OpenHands: Open Source AI Software Engineer Agent
Imagine a software engineer that never sleeps, can read a repository in seconds, and suggests production‑ready code on demand. That’s the promise behind OpenHands, an open‑source AI software engineer agent that blends large language models with a lightweight execution environment. In this article we’ll explore how OpenHands works under the hood, walk through a real‑world setup, and showcase practical code snippets you can drop into your own projects.
What Is OpenHands?
OpenHands is a community‑driven project that turns a large language model (LLM) into an autonomous coding assistant. Unlike traditional chat‑based assistants, OpenHands can act on a codebase: it can run tests, modify files, and even commit changes back to version control. The core idea is to give the model a sandboxed “hand” that can execute commands, read outputs, and iterate until a task is satisfied.
The architecture consists of three loosely coupled components:
- LLM Engine – any compatible model (e.g., LLaMA‑2, Mistral) that generates natural‑language and code responses.
- Executor – a Docker‑based sandbox that runs shell commands, interprets test failures, and returns structured feedback.
- Orchestrator – the glue that translates user intents into a loop of “think → act → observe → revise”.
This separation lets developers swap out the model or the execution environment without rewriting the whole system, a flexibility that’s rare in proprietary AI coding tools.
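The "think → act → observe → revise" loop can be sketched in a few lines of Python. The class and method names below are illustrative stand-ins, not the actual OpenHands API:

```python
# Illustrative sketch of the orchestrator loop; the class and method
# names here are hypothetical, not the real OpenHands API.

def run_task(llm, executor, task, max_iterations=10):
    """Loop until the model declares the task done or we hit the limit."""
    observation = f"Task: {task}"
    for _ in range(max_iterations):
        action = llm.think(observation)   # think: propose the next command
        if action == "DONE":
            return True
        result = executor.run(action)     # act: execute in the sandbox
        observation = result              # observe: feed the output back
    return False                          # revise budget exhausted


# A toy model/executor pair to show the control flow:
class ToyLLM:
    def __init__(self):
        self.steps = iter(["pytest -q", "DONE"])

    def think(self, observation):
        return next(self.steps)


class ToyExecutor:
    def run(self, command):
        return f"ran: {command}"


print(run_task(ToyLLM(), ToyExecutor(), "fix the tests"))  # → True
```

The key design point is that the model never touches the host directly: every action passes through the executor, which is why the components can be swapped independently.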
Getting Started: Installation in Five Minutes
OpenHands ships as a Python package with a single entry point, `openhands`. The quickest way to try it out is the provided Docker image, which bundles the executor and a lightweight LLM server.
# Install the Python client
pip install openhands
# Pull the Docker image (requires Docker Desktop)
docker pull openhands/engine:latest
# Start the engine in the background
docker run -d --name openhands-engine -p 8000:8000 openhands/engine:latest
Once the engine is running, you can launch the interactive CLI:
openhands chat
The CLI will greet you with a prompt where you can describe a task, such as “Add unit tests for the `calculate_tax` function”. OpenHands will then spin up a temporary container, edit the repository, run the tests, and show you the diff.
A First‑Hand Example: Generating a CRUD API
Let’s see OpenHands in action with a concrete example. Suppose you have a fresh FastAPI project and you need a CRUD endpoint for a Book model. Here’s how you can ask the agent to scaffold it.
# Step 1: Open a terminal and start the chat
openhands chat
# Step 2: Describe the task
User: Create a FastAPI router with CRUD operations for a Book model
(fields: id, title, author, published_year). Use SQLModel for ORM.
# OpenHands will respond with a series of actions:
# 1️⃣ Create models.py
# 2️⃣ Create routers/book.py
# 3️⃣ Update main.py to include the router
# 4️⃣ Add a simple SQLite database connection
# 5️⃣ Run `uvicorn` to verify the app starts
After a few seconds, OpenHands presents a git diff view. You can accept the changes, reject them, or ask for refinements. The entire workflow feels like collaborating with a junior developer who never complains about deadlines.
What the Agent Actually Did
- Generated `models.py` with a `Book` class inheriting from `SQLModel`.
- Wrote a `book_router` containing `GET /books`, `POST /books`, `PUT /books/{id}`, and `DELETE /books/{id}` handlers.
- Inserted the router into `main.py` and added a `create_engine` call.
- Executed `uvicorn main:app --reload` inside the sandbox to confirm the server starts without errors.
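The behavior of the generated router can be sketched with an in-memory store. This uses only the standard library so it runs without FastAPI installed; the real output the agent produces uses `SQLModel` and FastAPI's `APIRouter`:

```python
# Standard-library sketch of the CRUD semantics the agent scaffolds.
# The real generated code uses SQLModel and FastAPI's APIRouter; this
# in-memory version only mirrors the endpoint behavior.
from dataclasses import dataclass, asdict

@dataclass
class Book:
    id: int
    title: str
    author: str
    published_year: int

_books: dict[int, Book] = {}

def create_book(book: Book) -> dict:              # POST /books
    _books[book.id] = book
    return asdict(book)

def get_books() -> list[dict]:                    # GET /books
    return [asdict(b) for b in _books.values()]

def update_book(book_id: int, **fields) -> dict:  # PUT /books/{id}
    book = _books[book_id]
    for key, value in fields.items():
        setattr(book, key, value)
    return asdict(book)

def delete_book(book_id: int) -> None:            # DELETE /books/{id}
    del _books[book_id]

create_book(Book(1, "Dune", "Frank Herbert", 1965))
update_book(1, published_year=1966)
print(get_books())  # one book, with the updated year
```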
Pro tip: When you ask OpenHands to create new files, prepend the instruction with “Use the src/ directory structure”. The agent respects the path and keeps your repo tidy.
Debugging with OpenHands: Fixing Failing Tests
Beyond generation, OpenHands excels at iterative debugging. Consider a repository where a recent refactor broke a set of unit tests. Instead of manually tracing the stack, you can hand the problem to the agent.
# Inside the chat session
User: The test suite is failing with an AssertionError in test_calculate_tax.
Please investigate and fix the bug.
# OpenHands workflow:
# • Runs `pytest -q` in the sandbox.
# • Captures the failure output.
# • Opens the relevant source file.
# • Proposes a patch and re‑runs the tests.
# • Repeats until all tests pass.
In practice, the agent might suggest a one‑line change such as correcting a comparison operator or updating a rounding method. After each suggestion, you see a diff and a test summary, allowing you to approve or request another iteration.
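The kind of one-line fix the agent converges on often looks like the following. The `calculate_tax` bug shown here is hypothetical, invented to illustrate a flipped comparison operator:

```python
# Hypothetical example of the kind of one-line bug OpenHands isolates.
# Before: a flipped comparison applies the higher rate to low incomes.
def calculate_tax_buggy(income: float) -> float:
    rate = 0.30 if income < 50_000 else 0.20  # bug: rates are swapped
    return round(income * rate, 2)

# After: the agent's proposed patch corrects the comparison.
def calculate_tax(income: float) -> float:
    rate = 0.20 if income < 50_000 else 0.30
    return round(income * rate, 2)

# The assertion that originally failed now passes:
assert calculate_tax(40_000) == 8_000.00
assert calculate_tax(80_000) == 24_000.00
```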
Pro tip: Keep your test suite deterministic (e.g., avoid flaky network calls). OpenHands relies on repeatable test results to converge on a solution.
Integrating OpenHands into CI/CD Pipelines
For teams that want AI assistance on every pull request, OpenHands can be invoked as a GitHub Action. The action spins up the Docker engine, feeds the PR diff to the orchestrator, and posts a comment with suggested improvements.
# .github/workflows/openhands.yml
name: OpenHands Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up OpenHands
        run: |
          docker pull openhands/engine:latest
          docker run -d --name openhands-engine -p 8000:8000 openhands/engine:latest
      - name: Run OpenHands Review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install openhands
          openhands ci --repo ${{ github.repository }} --pr ${{ github.event.pull_request.number }}
The `openhands ci` command sends the PR diff to the agent, which returns a markdown comment containing a concise summary, a list of code suggestions, and optionally a ready‑to‑apply patch. Developers can merge the patch with a single click, turning AI‑driven code review into a seamless part of the workflow.
Real‑World Use Cases
1. Rapid Prototyping – Start with a high‑level description (“Build a CLI that parses CSV files and outputs JSON”) and let OpenHands generate the skeleton, tests, and documentation in minutes.
2. Legacy Modernization – Feed the agent a codebase written in an older language (e.g., Python 2) and ask it to migrate modules to Python 3, updating syntax and dependencies automatically.
3. Learning & Onboarding – New hires can ask OpenHands to explain complex functions or to rewrite a piece of code in a more idiomatic style, accelerating the ramp‑up period.
Case Study: E‑Commerce Startup
- Problem: The team needed to add a new “discount code” feature across three microservices within a week.
- Solution: OpenHands generated the data model, API endpoints, and integration tests for each service based on a single specification.
- Result: Development time dropped from 5 days to 2 days, and the automatically generated tests caught a regression before deployment.
Customizing the LLM Backend
OpenHands is model‑agnostic. By default it uses a hosted Llama‑2 13B endpoint, but you can point it to any OpenAI‑compatible API or even run a local model with vLLM. The configuration lives in ~/.openhands/config.yaml.
# Example config.yaml
model:
  provider: local
  path: /models/llama-2-13b
  temperature: 0.2
  max_tokens: 1024
executor:
  sandbox: docker
  timeout_seconds: 30
After updating the file, restart the engine container. The orchestrator will now send prompts to your local model, giving you full control over privacy and latency.
Pro tip: For code‑heavy tasks, lower the temperature (e.g., 0.1) to encourage deterministic outputs. Higher temperatures are better for brainstorming or writing documentation.
Security Considerations
Because OpenHands executes code on your behalf, sandboxing is non‑negotiable. The Docker executor runs with a read‑only root filesystem and a limited network stack. However, you should still:
- Run the engine on a dedicated host or CI runner.
- Mount only the repository directory into the container.
- Enable resource limits (CPU, memory) to prevent runaway processes.
OpenHands also sanitizes generated code before committing, stripping out potentially dangerous system calls. Nonetheless, always review patches before merging, especially in security‑critical projects.
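Conceptually, that sanitization step is a scan for dangerous patterns before anything is committed. The sketch below is a naive illustration; the patterns and the reject-on-match policy are assumptions, not OpenHands' actual ruleset:

```python
# Naive illustration of a pre-commit sanitization pass.  The patterns
# and the reject-on-match policy are assumptions; the real ruleset is
# more sophisticated.
import re

DANGEROUS_PATTERNS = [
    r"os\.system\(",                    # arbitrary shell execution
    r"subprocess\.\w+\(.*shell=True",   # shell=True invocations
    r"rm\s+-rf\s+/",                    # destructive commands in strings
]

def is_safe(code: str) -> bool:
    """Return False if the generated code matches a dangerous pattern."""
    return not any(re.search(p, code) for p in DANGEROUS_PATTERNS)

print(is_safe("print('hello')"))         # → True
print(is_safe("os.system('rm -rf /')"))  # → False
```

Pattern-based filters are easy to bypass, which is exactly why the article's advice stands: always review patches before merging.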
Extending OpenHands with Plugins
The project ships with a plugin framework that lets you add domain‑specific knowledge. For instance, a “SQL optimizer” plugin can analyze generated queries and suggest indexes.
# my_plugin.py
from openhands.plugins import BasePlugin

class SqlOptimizer(BasePlugin):
    name = "sql_optimizer"

    def after_file_write(self, path, content):
        if path.endswith(".sql"):
            # Simple heuristic: add an index on the primary key
            optimized = content + "\nCREATE INDEX IF NOT EXISTS idx_id ON my_table(id);"
            return optimized
        return content
Register the plugin by adding its path to the config file, and the orchestrator will invoke it automatically after each file generation step.
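The hook mechanism amounts to calling each registered plugin after a file write. Here is a minimal, self-contained sketch of that dispatch; `BasePlugin` is stubbed so the example runs without OpenHands installed (the real base class lives in `openhands.plugins`), and the `apply_plugins` helper is an assumption about how the orchestrator chains hooks:

```python
# Self-contained sketch of plugin dispatch.  BasePlugin is stubbed so
# the example runs without OpenHands installed; the real base class
# lives in openhands.plugins, and apply_plugins is a hypothetical
# stand-in for the orchestrator's hook chaining.
class BasePlugin:
    name = "base"

    def after_file_write(self, path: str, content: str) -> str:
        return content  # default: pass content through unchanged


class SqlOptimizer(BasePlugin):
    name = "sql_optimizer"

    def after_file_write(self, path: str, content: str) -> str:
        if path.endswith(".sql"):
            return content + "\nCREATE INDEX IF NOT EXISTS idx_id ON my_table(id);"
        return content


def apply_plugins(plugins, path, content):
    """Chain each plugin's hook over the freshly written file content."""
    for plugin in plugins:
        content = plugin.after_file_write(path, content)
    return content


result = apply_plugins([SqlOptimizer()], "schema.sql",
                       "CREATE TABLE my_table (id INT);")
print(result)  # the CREATE INDEX line is appended
```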
Performance Benchmarks
In the official benchmark suite, OpenHands completes a typical “add a new REST endpoint” task in 12.4 seconds on a machine with a single RTX 4090 GPU. That includes model inference, sandbox setup, test execution, and diff generation. By comparison, a human developer averages 8–12 minutes for the same task, highlighting the productivity boost.
Latency is primarily driven by model loading and container startup. Caching the Docker image and keeping the model warm reduces average runtime to under 8 seconds for repetitive tasks.
Best Practices for Effective Collaboration
To get the most out of OpenHands, treat it as a collaborative partner rather than a black‑box code generator. Here are some habits that improve outcomes:
- Be explicit – Provide clear constraints (e.g., “use type hints”, “no external dependencies”).
- Iterate quickly – Accept small patches, run tests, then ask for refinements.
- Leverage the sandbox logs – The executor returns stdout/stderr; reviewing them can reveal hidden assumptions.
- Version‑control the AI output – Commit each AI‑generated diff separately, making it easy to revert or cherry‑pick.
Pro tip: Prefix your request with “Write unit tests first” to enforce test‑driven development. OpenHands will scaffold the tests before the implementation, ensuring coverage from day one.
Future Roadmap
The OpenHands community is actively extending the platform. Upcoming features include:
- Multi‑modal support – Incorporate diagrams or UI mockups as additional inputs.
- Interactive debugging – Real‑time breakpoints and variable inspection inside the sandbox.
- Team awareness – A shared memory store that lets multiple agents coordinate on large projects.
Contributions are welcome via GitHub issues and pull requests. The maintainers keep a “good first issue” label for newcomers, making it easy to get involved.
Conclusion
OpenHands transforms the abstract promise of AI‑assisted coding into a concrete, extensible tool that can generate, debug, and refactor code autonomously. By marrying a powerful LLM with a secure execution sandbox, it delivers rapid prototyping, intelligent code review, and seamless CI integration—all under an open‑source license that invites community innovation. Whether you’re a solo developer looking to accelerate feature delivery or a large team seeking AI‑augmented productivity, OpenHands offers a flexible foundation that can grow with your needs.