Building AI Apps with Vercel AI SDK
Vercel’s AI SDK has quickly become a go‑to toolkit for developers who want to add intelligent features to their web apps without wrestling with complex infrastructure. In this guide we’ll walk through the core concepts, set up a simple chat‑bot, and then extend it with real‑world features such as image generation. By the end you’ll have a solid foundation you can harden for production and adapt to any AI‑driven product.
Why Choose Vercel AI SDK?
The SDK abstracts away the boilerplate of calling large language models (LLMs) and multimodal APIs. It ships with built‑in streaming helpers, client‑side React hooks, and edge‑function support, which means low latency without extra infrastructure work. It also integrates cleanly with Next.js Route Handlers and Vercel’s serverless platform.
Another hidden gem is the type‑safe request/response schema that lets you catch mismatches during development rather than at runtime. This safety net is especially valuable when you start chaining multiple AI services together.
Pro tip: Keep API keys in Vercel environment variables so they never reach the client bundle, and reserve Edge Config for non‑secret runtime configuration (model names, feature flags) that you want to read at the edge with sub‑millisecond lookups.
Getting Started: Project Setup
First, create a fresh Next.js app and install the SDK:
npx create-next-app@latest my-ai-app
cd my-ai-app
npm i ai openai
Next, add your OpenAI (or Anthropic, Cohere, etc.) API key to Vercel’s environment variables. In the Vercel dashboard, go to Project Settings → Environment Variables and set OPENAI_API_KEY. For local development, put the same key in a .env.local file; in both cases the server code reads it from process.env.
Creating a Basic Chat Endpoint
Inside app/api/chat/route.js (or .ts if you prefer TypeScript), we’ll define an edge function that streams responses back to the client.
import OpenAI from "openai";
import { OpenAIStream, StreamingTextResponse } from "ai";

// Create the client once at module scope so warm invocations reuse it
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const runtime = "edge";

export async function POST(req) {
  const { messages } = await req.json();

  // Ask OpenAI for a streaming chat completion
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    stream: true,
  });

  // Convert the OpenAI stream into a web ReadableStream in the
  // format the useChat hook understands
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
This endpoint expects a JSON payload with a messages array (the same shape you’d send to OpenAI directly). OpenAIStream handles the streaming protocol, so each token is forwarded to the client as soon as the model generates it.
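If you want to sanity‑check the endpoint before building any UI, a plain fetch call is enough. The sketch below assumes the dev server is running on localhost:3000 and reads the streamed reply chunk by chunk:

// Quick manual test of the chat endpoint (e.g. from the browser console)
const res = await fetch("http://localhost:3000/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Say hello in five words." }],
  }),
});

// The body is a text stream, so decode it as chunks arrive
const reader = res.body.getReader();
const decoder = new TextDecoder();
let reply = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  reply += decoder.decode(value);
}
console.log(reply);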
Client‑Side Hook
On the front end, Vercel provides a handy React hook that abstracts the fetch and streaming logic. Create components/ChatBox.jsx:
"use client";

import { useChat } from "ai/react";

export default function ChatBox() {
  const { messages, input, setInput, handleSubmit, isLoading } = useChat({
    api: "/api/chat",
  });

  return (
    <div className="chat-container">
      <ul className="messages">
        {messages.map((m) => (
          <li key={m.id} className={m.role}>{m.content}</li>
        ))}
      </ul>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask anything..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? "Thinking…" : "Send"}
        </button>
      </form>
    </div>
  );
}
The hook automatically appends the user’s message to the messages array, sends it to the edge endpoint, and streams the assistant’s reply back into the UI. All you need is a minimal CSS file to make it look decent.
Pro tip: Load the ChatBox lazily (for example with next/dynamic, which builds on React.lazy and Suspense) and give it a fallback that reserves its space. This prevents layout shift on slower connections while the chat bundle loads.
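A minimal sketch of that setup, assuming the default "@/…" import alias that create-next-app configures:

// app/page.jsx
import dynamic from "next/dynamic";

// Defer the chat bundle and show a placeholder that reserves its space
const ChatBox = dynamic(() => import("@/components/ChatBox"), {
  loading: () => <div className="chat-container">Loading chat…</div>,
});

export default function Page() {
  return (
    <main>
      <h1>My AI App</h1>
      <ChatBox />
    </main>
  );
}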
Beyond Text: Adding Image Generation
Many modern AI apps combine text and visuals—think product mockups, AI‑generated memes, or design assistants. Image models like OpenAI’s dall-e-3 drop into the same edge route pattern without any extra plumbing.
Server‑Side Image Route
Let’s add an endpoint that receives a prompt and returns a generated image URL.
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const runtime = "edge";

export async function POST(req) {
  const { prompt } = await req.json();

  const result = await openai.images.generate({
    model: "dall-e-3",
    prompt,
    n: 1,
    size: "1024x1024",
  });

  // OpenAI returns an array of generated images with hosted URLs
  const imageUrl = result.data[0].url;

  return new Response(JSON.stringify({ imageUrl }), {
    headers: { "Content-Type": "application/json" },
  });
}
Notice the runtime = "edge" directive again. It keeps network overhead low, although the bulk of the wait is the image generation itself. OpenAI responds with plain JSON containing a hosted URL, so there is no binary payload to handle in the route.
Integrating with the UI
We’ll extend ChatBox to support a special /image command. When the user types /image sunset over mountains, the client calls the new endpoint and displays the picture inline.
"use client";

import { useChat } from "ai/react";
import { useState } from "react";

export default function ChatBox() {
  const { messages, input, setInput, handleSubmit, isLoading } = useChat({
    api: "/api/chat",
  });
  const [imageUrl, setImageUrl] = useState(null);

  async function handleImageCommand(prompt) {
    const res = await fetch("/api/image", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    });
    const { imageUrl } = await res.json();
    setImageUrl(imageUrl);
  }

  async function onSubmit(e) {
    e.preventDefault();
    if (input.startsWith("/image")) {
      const prompt = input.replace("/image", "").trim();
      await handleImageCommand(prompt);
      setInput("");
    } else {
      await handleSubmit(e);
    }
  }

  return (
    <div className="chat-container">
      <ul className="messages">
        {messages.map((m) => (
          <li key={m.id} className={m.role}>{m.content}</li>
        ))}
        {imageUrl && (
          <li className="assistant">
            <img src={imageUrl} alt="AI generated" width={256} />
          </li>
        )}
      </ul>
      <form onSubmit={onSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask or /image prompt..."
          disabled={isLoading}
        />
        <button type="submit" disabled={isLoading}>
          {isLoading ? "Thinking…" : "Send"}
        </button>
      </form>
    </div>
  );
}
This pattern keeps the chat flow intact while allowing users to request visual content on demand. Expect image generation to take several seconds: the edge runtime keeps routing overhead minimal, but the model itself dominates the wait, so show a loading indicator while the request is in flight.
Pro tip: Cache generated images by prompt. OpenAI’s hosted image URLs expire after a short time, so either persist the image yourself (for example to Vercel Blob) or cache the prompt‑to‑URL mapping in Vercel KV with a TTL shorter than the URL lifetime. Repeat prompts then resolve instantly and save API credits.
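Here’s a minimal sketch of that caching layer, assuming Vercel KV is provisioned and the @vercel/kv package is installed. getOrGenerateImage and its generate callback are illustrative helpers for this article, not SDK APIs:

import { kv } from "@vercel/kv";

// Hash the prompt into a short, deterministic cache key
async function cacheKey(prompt) {
  const bytes = new TextEncoder().encode(prompt.trim().toLowerCase());
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return (
    "img:" +
    Array.from(new Uint8Array(digest))
      .map((b) => b.toString(16).padStart(2, "0"))
      .join("")
  );
}

export async function getOrGenerateImage(prompt, generate) {
  const key = await cacheKey(prompt);

  // Serve a recent identical prompt straight from KV
  const cached = await kv.get(key);
  if (cached) return cached;

  // Otherwise generate, then cache for 55 minutes, just under the URL lifetime
  const imageUrl = await generate(prompt);
  await kv.set(key, imageUrl, { ex: 55 * 60 });
  return imageUrl;
}

In the image route, pass the OpenAI call as the generate callback so cache misses fall through to dall-e-3.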
Advanced Patterns: Chaining Multiple AI Services
Real‑world applications often need more than a single model. A common scenario is using an LLM to parse user intent, then delegating to a specialized model (e.g., a summarizer, a code generator, or a sentiment analyzer). Because each step is just another call inside an edge route handler, the orchestration stays simple.
Intent Detection + Summarization
Suppose you’re building a knowledge‑base assistant. First, the LLM decides whether the user asks a factual question or wants a summary of a long article. If it’s a summary request, we forward the article text to a dedicated summarizer model.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = "edge";

export async function POST(req) {
  const { messages } = await req.json();

  // Step 1: Intent detection (the classifier prompt goes in a system message)
  const intentRes = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    max_tokens: 5,
    messages: [
      {
        role: "system",
        content: "You are an intent classifier. Reply with ONE word: SUMMARY or QUESTION.",
      },
      ...messages,
    ],
  });
  const intent = intentRes.choices[0].message.content.trim().toUpperCase();

  if (intent.includes("SUMMARY")) {
    // Step 2: Summarize the last user message (assumed to be the article)
    const article = messages[messages.length - 1].content;
    const summaryRes = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "Summarize the following text in 3 sentences." },
        { role: "user", content: article },
      ],
    });
    const summary = summaryRes.choices[0].message.content.trim();
    return new Response(JSON.stringify({ reply: summary }), {
      headers: { "Content-Type": "application/json" },
    });
  }

  // Fallback: regular answer
  const answerRes = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });
  const answer = answerRes.choices[0].message.content.trim();
  return new Response(JSON.stringify({ reply: answer }), {
    headers: { "Content-Type": "application/json" },
  });
}
The flow runs entirely on the edge, keeping latency low even though we make two separate calls to the LLM. Because the OpenAI client is created once at module scope, warm invocations reuse it instead of re‑initializing it on every request.
Streaming Combined Responses
If you want to stream the final answer while still performing the intent check, run the classification first as a small non‑streaming call, then open a streaming completion for whichever branch wins. OpenAIStream accepts the streaming response and forwards partial results to the client as soon as they’re ready.
import OpenAI from "openai";
import { OpenAIStream, StreamingTextResponse } from "ai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = "edge";

export async function POST(req) {
  const { messages } = await req.json();

  // Intent detection (non-streaming, deterministic)
  const intentRes = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    max_tokens: 5,
    messages: [
      { role: "system", content: "Reply with ONE word: SUMMARY or QUESTION." },
      ...messages,
    ],
  });
  const intent = intentRes.choices[0].message.content.trim().toUpperCase();

  // Stream whichever branch the classifier picked
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    stream: true,
    messages: intent.includes("SUMMARY")
      ? [
          { role: "system", content: "Summarize in 3 sentences." },
          messages[messages.length - 1],
        ]
      : messages,
  });

  return new StreamingTextResponse(OpenAIStream(response));
}
Clients will see a seamless typing effect regardless of which branch the logic took. This pattern scales nicely when you add more branches (e.g., code generation, translation, or data extraction).
Pro tip: Use temperature: 0 for classification steps. Greedy decoding makes the output close to deterministic, which keeps downstream branching reliable.
Deploying and Monitoring
With the code ready, push it to your Git repository and connect the project to Vercel. The platform automatically detects the runtime = "edge" flag and provisions edge functions for you. A single click on “Deploy” spins up a globally distributed instance.
Monitoring AI usage is crucial for cost control. Vercel’s built‑in analytics show request counts per edge region, while the OpenAI dashboard provides token usage. To tie the two together, log request metadata from a lightweight middleware and record model and token counts inside the route handlers, then ship both to a Vercel Log Drain or an external observability tool like Datadog.
// middleware.js (project root)
import { NextResponse } from "next/server";

// Only run on the AI API routes
export const config = { matcher: "/api/:path*" };

export function middleware(req, event) {
  // Fire-and-forget so logging never blocks the request
  event.waitUntil(
    fetch("https://logs.example.com/ai", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        path: req.nextUrl.pathname,
        method: req.method,
        requestId: req.headers.get("x-vercel-id"),
        timestamp: Date.now(),
      }),
    })
  );

  return NextResponse.next();
}
Middleware runs before your route handler, so it’s the right place for request metadata, but it can’t observe the model’s output or token usage. Log those from inside the route handler itself, as sketched below.
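For example, here’s a minimal sketch of per‑request usage logging in a non‑streaming route handler; it posts to the same placeholder logging endpoint as the middleware above and relies on the usage object that the OpenAI API returns for non‑streaming completions:

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = "edge";

export async function POST(req) {
  const { messages } = await req.json();

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });

  // usage is returned on non-streaming completions
  const { prompt_tokens, completion_tokens, total_tokens } = completion.usage;

  // Fire-and-forget so logging never delays the reply
  fetch("https://logs.example.com/ai", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      requestId: req.headers.get("x-vercel-id"),
      model: "gpt-4o-mini",
      prompt_tokens,
      completion_tokens,
      total_tokens,
    }),
  });

  return new Response(JSON.stringify({ reply: completion.choices[0].message.content }), {
    headers: { "Content-Type": "application/json" },
  });
}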
Best Practices for Production‑Ready AI Apps
- Cache wisely: Use Edge Config or Vercel KV to store frequent prompts and their responses.
- Rate limit per user: Prevent abuse by throttling calls based on IP or auth token (see the sketch after this list).
- Sanitize user input: Even though LLMs are tolerant, malicious prompts can cause unwanted model behavior.
- Graceful fallback: Provide a static answer or a “try again later” message when the AI service is unavailable.
- Version your prompts: Store prompt templates in a separate file or CMS so you can iterate without redeploying.
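To make the rate‑limit item concrete, here’s a minimal sketch using Vercel KV together with the @upstash/ratelimit package; the limits and the x-user-id header are illustrative choices, not requirements:

import { kv } from "@vercel/kv";
import { Ratelimit } from "@upstash/ratelimit";

// Allow 20 AI calls per identifier per minute; tune to your cost budget
const ratelimit = new Ratelimit({
  redis: kv,
  limiter: Ratelimit.slidingWindow(20, "1 m"),
});

export async function enforceRateLimit(req) {
  // Prefer an authenticated user id, fall back to the caller's IP
  const identifier =
    req.headers.get("x-user-id") ??
    req.headers.get("x-forwarded-for") ??
    "anonymous";

  const { success } = await ratelimit.limit(identifier);
  if (!success) {
    return new Response(JSON.stringify({ error: "Too many requests" }), {
      status: 429,
      headers: { "Content-Type": "application/json" },
    });
  }
  return null; // within the limit, let the route continue
}

Call enforceRateLimit at the top of each AI route handler and return early whenever it yields a response.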
Pro tip: Keep your system messages short and focused. Overly verbose system prompts increase token usage and can dilute the model’s attention on the actual user query.
Real‑World Use Cases
Customer Support Chatbot – Combine intent detection with a knowledge‑base lookup. The edge function first classifies the request, then either pulls an FAQ answer from a CMS or hands off to a human agent via Slack webhook.
Design Assistant – Users describe a UI component, the app calls the image generation endpoint, and then an LLM refines the description into Tailwind CSS code. This loop can be fully streamed, giving designers instant visual feedback.
Content Summarizer for Newsletters – A daily cron job fetches the top headlines, sends each article to the summarizer endpoint, and compiles the results into an email. Because the summarization calls can run in parallel at the edge, the entire pipeline finishes within minutes, even for dozens of articles.
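A rough sketch of what that cron‑triggered route could look like, assuming a hypothetical getTopHeadlines() helper and reusing the /api/chat route from earlier; email delivery is left to your provider, and the route would be scheduled via the crons field in vercel.json:

// app/api/daily-digest/route.js
import { getTopHeadlines } from "@/lib/news"; // hypothetical helper returning [{ title, body }]

export const runtime = "edge";

export async function GET(req) {
  const articles = await getTopHeadlines();

  // Summarize every article in parallel to keep the run short
  const summaries = await Promise.all(
    articles.map(async (article) => {
      const res = await fetch(new URL("/api/chat", req.url), {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: [
            { role: "user", content: `Summarize this article in 3 sentences:\n\n${article.body}` },
          ],
        }),
      });
      // The chat route streams plain text; res.text() buffers the full reply
      return { title: article.title, summary: await res.text() };
    })
  );

  // Hand the compiled digest to your email provider here
  return Response.json({ count: summaries.length, summaries });
}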
Conclusion
Vercel AI SDK bridges the gap between powerful generative models and the fast, scalable world of edge computing. By leveraging built‑in streaming, sensible caching, and type safety, you can ship AI features that feel native to your product.