Building a Chatbot with OpenAI API

Welcome to the world of conversational AI! In this guide, we’ll walk you through building a fully functional chatbot powered by the OpenAI API, using Python and a few handy libraries. By the end, you’ll have a reusable codebase, understand how to handle streaming responses, and be ready to embed your bot into web apps, Slack, or even voice assistants.

Understanding the OpenAI API Landscape

The OpenAI API offers several models—gpt‑3.5‑turbo, gpt‑4, and specialized embeddings—that differ in cost, latency, and capability. For most chat applications, gpt‑3.5‑turbo strikes a sweet balance between speed and price, while gpt‑4 shines when you need deeper reasoning or nuanced language.

All requests are HTTP POST calls to https://api.openai.com/v1/chat/completions. The payload includes a messages array that mimics a conversation: each entry has a role (system, user, or assistant) and content. The API returns a JSON object with the model’s reply, token usage, and optional metadata.
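
To make the shape of a request concrete, here is a minimal sketch that posts the payload directly with the requests library; it assumes OPENAI_API_KEY is already set in your environment (we cover key handling properly below).

import os
import requests

# Minimal raw request to the chat completions endpoint.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello."}
        ]
    },
)
print(response.json()["choices"][0]["message"]["content"])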

Key Concepts to Keep in Mind

  • Tokens: Roughly 4 characters per token for English text. Knowing token limits helps you truncate history and stay within model constraints (a rough character-based estimate is sketched just after this list).
  • System Prompt: Sets the bot’s persona and behavior. A well‑crafted system prompt reduces the need for repetitive instructions.
  • Temperature & Top‑P: Control randomness. Lower temperature makes responses deterministic; higher values encourage creativity.
Pro tip: Start with a concise system prompt (1‑2 sentences). You can always refine it later based on user feedback.
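
As a quick sanity check before reaching for a proper tokenizer, you can estimate token counts from character length. This is only a rule of thumb and under-counts for code or non-English text; we switch to an exact count with tiktoken later.

def rough_token_estimate(text: str) -> int:
    # Rule of thumb only: roughly 4 characters per token for English prose.
    return max(1, len(text) // 4)

print(rough_token_estimate("How do I reverse a list in Python?"))  # ~8 tokens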

Setting Up Your Development Environment

First, ensure you have Python 3.9+ installed. We’ll use pip to install openai and python-dotenv for secure API key handling. The examples in this guide use the library’s legacy ChatCompletion interface, so pin the openai package below version 1.0.

# Install required packages (pin openai below 1.0 for the legacy ChatCompletion interface)
pip install "openai<1.0" python-dotenv

Create a .env file at the root of your project and store your secret key as OPENAI_API_KEY=sk-…. This keeps credentials out of source control.

# .env
OPENAI_API_KEY=sk-your-secret-key-here

Load the environment variables in your script using dotenv:

import os
from dotenv import load_dotenv

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

Authenticating and Making a Simple Request

With the key loaded, initialize the OpenAI client and send a minimal chat request. This example demonstrates a basic “echo” bot that repeats what the user says.

import openai

openai.api_key = openai_api_key

def simple_echo(user_input: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that repeats the user's input verbatim."},
            {"role": "user", "content": user_input}
        ],
        temperature=0.0  # deterministic output
    )
    return response.choices[0].message["content"].strip()

# Demo
print(simple_echo("Hello, OpenAI!"))

Run the script and you should see the exact same phrase echoed back. While trivial, this pattern—building a messages list and extracting choices[0].message—is the foundation for every chatbot you’ll create.

Handling Errors Gracefully

  • Catch openai.error.RateLimitError to back off when you hit request limits.
  • Use try/except blocks around API calls to surface useful debug information.
from openai.error import OpenAIError

def safe_chat(messages):
    try:
        return openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    except OpenAIError as e:
        print(f"OpenAI API error: {e}")
        return None
Pro tip: Implement exponential backoff (e.g., 1s, 2s, 4s) for rate‑limit retries to avoid hammering the endpoint.
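
Here is a minimal sketch of that backoff pattern using the pre-1.0 openai package, which exposes RateLimitError; the retry count and delays are arbitrary starting points you should tune for your traffic.

import time
import openai
from openai.error import RateLimitError

def chat_with_backoff(messages, max_retries=3):
    # Retry with doubling delays (1s, 2s, 4s) when the API rate-limits us.
    delay = 1
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay)
            delay *= 2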

Building a Stateful Chatbot

A real chatbot must remember the conversation history. We’ll store messages in a list, prune old entries when we approach the token limit, and expose a simple REPL loop for interactive testing.

import tiktoken  # pip install tiktoken

MAX_TOKENS = 4096
RESERVED_TOKENS = 500  # buffer for the model's reply

def token_count(messages):
    # Approximate: counts content tokens only, ignoring the small per-message overhead.
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    total = 0
    for msg in messages:
        total += len(enc.encode(msg["content"]))
    return total

def trim_history(messages):
    # Drop the oldest user/assistant pair (never the system prompt) until we fit.
    # Stop trimming once only the system prompt and the latest user message remain.
    while token_count(messages) > (MAX_TOKENS - RESERVED_TOKENS) and len(messages) > 3:
        messages.pop(1)  # remove first user message after system
        messages.pop(1)  # remove the corresponding assistant reply
    return messages

def chat_loop():
    history = [
        {"role": "system", "content": "You are a friendly AI assistant that helps users with programming questions."}
    ]
    while True:
        user_input = input("You: ")
        if user_input.lower() in {"exit", "quit"}:
            break
        history.append({"role": "user", "content": user_input})
        history = trim_history(history)

        response = safe_chat(history)
        if response:
            assistant_msg = response.choices[0].message["content"]
            print(f"Bot: {assistant_msg}")
            history.append({"role": "assistant", "content": assistant_msg})

Run chat_loop() and you’ll have a minimal, stateful chatbot that respects token limits. The tiktoken library gives an accurate token count, which is crucial for avoiding “max token” errors.

Streaming Responses for Real‑Time Interaction

Waiting for the entire response can feel sluggish, especially with longer answers. OpenAI supports stream=True, which yields partial tokens as they’re generated. Let’s adapt the previous example to stream text to the console.

def stream_chat(messages):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.7,
        stream=True
    )
    collected = ""
    for chunk in response:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            token = delta["content"]
            print(token, end="", flush=True)
            collected += token
    print()  # newline after streaming ends
    return collected

def chat_with_stream():
    history = [
        {"role": "system", "content": "You are a witty assistant that explains concepts in plain English."}
    ]
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in {"exit", "quit"}:
            break
        history.append({"role": "user", "content": user_input})
        history = trim_history(history)

        print("Bot: ", end="", flush=True)
        assistant_reply = stream_chat(history)
        history.append({"role": "assistant", "content": assistant_reply})

Now the bot prints each token as it arrives, creating a more natural, “typing” experience. This pattern works well for web sockets, Discord bots, or any UI where you want to show progress.

Pro tip: When streaming to a web UI, buffer tokens in groups of 5‑10 before pushing to the client to reduce network chatter.
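
One sketch of that buffering idea: a generator-style helper that groups streamed tokens into small batches before handing them to whatever transport you use. The send_to_client callback here is a placeholder for your websocket or SSE send function, not a real API.

import openai

def stream_in_batches(messages, send_to_client, batch_size=8):
    # Collect streamed tokens into small batches before pushing them to the UI.
    buffer = []
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages, stream=True
    ):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            buffer.append(delta["content"])
            if len(buffer) >= batch_size:
                send_to_client("".join(buffer))
                buffer = []
    if buffer:  # flush whatever is left at the end
        send_to_client("".join(buffer))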

Advanced Feature: Function Calling

OpenAI’s function calling lets the model invoke predefined Python functions, turning vague user requests into structured data. Imagine a bot that can fetch the current weather or look up a stock price without you writing custom parsing logic.

Defining Functions for the Model

import json
import requests

def get_weather(city: str) -> dict:
    """Return a mock weather payload for the given city."""
    # In production, replace with a real API call.
    return {
        "city": city,
        "temperature_c": 22,
        "description": "Partly cloudy"
    }

# Describe the function to the model
weather_function_schema = {
    "name": "get_weather",
    "description": "Fetches current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "Name of the city"}
        },
        "required": ["city"]
    }
}

When you send a request, include the functions field. If the model decides to call the function, it returns a function_call object instead of a regular message.

def chat_with_function(user_query):
    messages = [
        {"role": "system", "content": "You are a helpful assistant that can provide weather updates."},
        {"role": "user", "content": user_query}
    ]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=[weather_function_schema],
        function_call="auto"  # let the model decide
    )

    # Check if a function call was suggested
    if response.choices[0].message.get("function_call"):
        function_name = response.choices[0].message["function_call"]["name"]
        arguments = json.loads(response.choices[0].message["function_call"]["arguments"])
        if function_name == "get_weather":
            result = get_weather(**arguments)
            # Feed the result back to the model
            messages.append(response.choices[0].message)  # original function call
            messages.append({
                "role": "function",
                "name": function_name,
                "content": json.dumps(result)
            })
            final_resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo-0613",
                messages=messages
            )
            return final_resp.choices[0].message["content"]
    else:
        return response.choices[0].message["content"]

# Demo
print(chat_with_function("What's the weather like in Berlin?"))

The model first decides it needs the weather, calls get_weather, and then uses the returned JSON to craft a natural language answer. This two‑step flow eliminates fragile string parsing on your side.

When to Use Function Calling

  • Fetching live data (stock prices, weather, sports scores).
  • Performing CRUD operations on a database without exposing raw SQL to the model.
  • Generating structured outputs like JSON schemas, calendars, or itineraries.
Pro tip: Keep function schemas as small as possible. Overly complex schemas confuse the model and increase the chance of malformed calls.
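
For instance, a hypothetical create_ticket function for the support-bot scenario below needs only a couple of fields; anything more elaborate can usually live on your side of the call.

# Hypothetical schema for a support-bot function; kept deliberately small.
create_ticket_schema = {
    "name": "create_ticket",
    "description": "Opens a support ticket with a short summary",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "One-line issue summary"},
            "priority": {"type": "string", "enum": ["low", "normal", "high"]}
        },
        "required": ["summary"]
    }
}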

Real‑World Use Cases

Customer Support Bot: Integrate with a ticketing system (e.g., Zendesk) by exposing functions like create_ticket or search_articles. The bot can ask clarifying questions, fetch relevant KB articles, and automatically open tickets when needed.

Learning Companion: Pair the chatbot with a spaced‑repetition backend. The model can quiz users, evaluate answers, and call a schedule_review function to store the next review date.

Code Review Assistant: Use the model to suggest improvements, then call an apply_patch function that writes a diff to a Git repository. This creates a seamless “AI‑powered pull request” workflow.

Deploying Your Bot to a Web Service

For production, you’ll likely expose the chatbot via an HTTP endpoint. Flask is a lightweight choice; FastAPI offers async support and automatic OpenAPI docs. Below is a minimal Flask app that wraps the chat_with_stream logic into a JSON API.

from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow cross‑origin requests for front‑ends

# Shared conversation store (in‑memory for demo)
conversation_history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

@app.route("/chat", methods=["POST"])
def chat_endpoint():
    global conversation_history  # needed because we reassign it after trimming
    data = request.get_json()
    user_msg = data.get("message")
    if not user_msg:
        return jsonify({"error": "No message provided"}), 400

    conversation_history.append({"role": "user", "content": user_msg})
    conversation_history = trim_history(conversation_history)

    # Use streaming version but collect into a string for JSON response
    assistant_reply = ""
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation_history,
        stream=True
    ):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            assistant_reply += delta["content"]

    conversation_history.append({"role": "assistant", "content": assistant_reply})
    return jsonify({"reply": assistant_reply})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

Deploy this Flask app to a platform like Render, Railway, or a Docker container on AWS ECS. Remember to set OPENAI_API_KEY as an environment variable in your hosting environment.
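
To smoke-test the endpoint locally, you can post a message with the requests library; the URL below assumes the default host and port from the snippet above.

import requests

# Quick local smoke test of the /chat endpoint defined above.
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "Explain list comprehensions in one sentence."},
)
print(resp.json()["reply"])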

Testing, Monitoring, and Cost Management

Testing should cover both happy‑path conversations and edge cases where the model may hallucinate or return empty responses. Unit tests can mock openai.ChatCompletion.create using unittest.mock.

from types import SimpleNamespace
from unittest.mock import patch

@patch("openai.ChatCompletion.create")
def test_simple_echo(mock_create):
    # The fake response mirrors the attribute/dict access pattern simple_echo uses.
    mock_create.return_value = SimpleNamespace(
        choices=[SimpleNamespace(message={"content": "Hello, world!"})]
    )
    assert simple_echo("Hello, world!") == "Hello, world!"

Monitoring involves tracking token usage (available in the API response) and alerting when daily spend exceeds a threshold. You can push metrics to Prometheus or a simple CloudWatch alarm.

Pro tip: Set a hard budget in the OpenAI dashboard and enable usage alerts. Combine this with programmatic checks that halt requests once a daily quota is hit.
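
A minimal sketch of such a programmatic check: accumulate the usage field returned by each non-streaming response and refuse new requests once a self-imposed daily token budget is exceeded. The budget value is arbitrary, and you would also reset the counter on a daily schedule.

import openai

DAILY_TOKEN_BUDGET = 200_000  # arbitrary self-imposed cap
tokens_used_today = 0

def budgeted_chat(messages):
    global tokens_used_today
    if tokens_used_today >= DAILY_TOKEN_BUDGET:
        raise RuntimeError("Daily token budget exhausted; try again tomorrow.")
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    # Non-streaming responses report usage; add it to today's running total.
    tokens_used_today += response["usage"]["total_tokens"]
    return response.choices[0].message["content"]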

Security and Privacy Considerations

Never log raw user inputs or model outputs in production logs unless you have explicit consent. If you store conversation history for personalization, encrypt it at rest and purge it after a reasonable retention period.

When using function calling, validate arguments before executing any code. Even though the model is well‑behaved, a malicious user could craft a prompt that tricks the model into calling a function with unexpected parameters.
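
For example, before executing get_weather from the earlier section, you might check the arguments the model produced. This is only a sketch; real validation should match whatever your function actually accepts.

def validate_weather_args(arguments: dict) -> str:
    # Reject anything other than a short, plausible city name before calling get_weather.
    city = arguments.get("city")
    if not isinstance(city, str) or not (1 <= len(city) <= 80):
        raise ValueError("Invalid 'city' argument from function call")
    return city.strip()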

Scaling Your Chatbot

For high‑traffic bots, consider the following strategies:

  1. Connection Pooling: Reuse HTTP sessions with requests.Session to reduce TLS handshake overhead.
  2. Parallel Requests: If multiple users send messages simultaneously, issue their API calls concurrently (async workers or a thread pool) rather than serially, but keep each conversation’s context isolated.
  3. Cache Frequent Responses: Cache answers to common FAQs for a few minutes to cut down on token usage (a small cache sketch follows this list).
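
Here is a minimal sketch of that FAQ cache, assuming exact-match questions and a simple time-based expiry; fetch_answer stands in for whatever function actually calls the API.

import time

_faq_cache = {}  # question -> (timestamp, answer)
CACHE_TTL_SECONDS = 300  # five minutes

def cached_answer(question, fetch_answer):
    # Return a recent cached answer if we have one; otherwise call the API and store it.
    now = time.time()
    hit = _faq_cache.get(question)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    answer = fetch_answer(question)
    _faq_cache[question] = (now, answer)
    return answer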

Serverless platforms (AWS Lambda, Vercel) are great for bursty traffic, but be mindful of cold‑start latency.
