Typesense: Open Source Algolia Alternative
Typesense is a fast, typo‑tolerant search engine that you can run on your own infrastructure. It offers an API that feels familiar to anyone who has used Algolia, but without the per‑search cost and vendor lock‑in. In this article we’ll walk through the core concepts, spin up a local instance, index some data, and explore the features that make Typesense a compelling open‑source alternative.
What Sets Typesense Apart?
At its core, Typesense is a **document‑oriented** search engine written in C++ for performance. It keeps the search index in memory for fast reads and persists data to disk so it survives restarts. The API is JSON over HTTP, and its concepts (collections, search parameters, API keys) map closely onto Algolia’s, so a migration is mostly a matter of swapping the client library and adapting query parameters rather than redesigning your search layer.
Key differentiators include:
- Open source license (GPL‑3.0) – you can self‑host, modify, or contribute.
- Predictable pricing – run it on any cloud or on‑premise server without per‑search fees.
- Built‑in typo tolerance – out of the box fuzzy matching with configurable thresholds.
- Faceting & filtering – powerful, expressive filters that work on any field.
- Multi‑tenant collections – isolate data for different applications or customers.
Getting Started: Installing Typesense
The quickest way to try Typesense is with Docker. The official image bundles everything you need, including a health‑check endpoint.
docker run -d \
  -p 8108:8108 \
  -v "$(pwd)/data:/data" \
  --name typesense-server \
  typesense/typesense:0.25.1 \
  --data-dir /data \
  --api-key=xyz123
Once the container is running, you can verify it with a simple curl request:
curl -H "X-TYPESENSE-API-KEY: xyz123" http://localhost:8108/health
The response {"ok":true} confirms that the server is ready to accept requests.
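In scripts and CI jobs it helps to wait for that response before sending any other request. The following is a minimal sketch using only the standard library; the helper names `fetch_health` and `wait_until_healthy`, the retry count, and the delay are assumptions for illustration, not part of any Typesense client.

```python
import json
import time
import urllib.request


def fetch_health(url):
    """Return the parsed JSON body of the health endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body
        return None


def wait_until_healthy(url, fetch=fetch_health, attempts=10, delay=0.5):
    """Poll the endpoint until it reports {"ok": true}; return True on success."""
    for _ in range(attempts):
        body = fetch(url)
        if isinstance(body, dict) and body.get("ok") is True:
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8108/health")` right after `docker run` avoids racing the server's startup. The `fetch` parameter exists only so the polling loop can be exercised without a running server.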
Creating a Collection
A collection in Typesense is analogous to an index in Algolia. Define the schema to tell Typesense how to store and search each field.
import typesense

client = typesense.Client({
    'nodes': [{
        'host': 'localhost',
        'port': '8108',
        'protocol': 'http'
    }],
    'api_key': 'xyz123',
    'connection_timeout_seconds': 2
})

schema = {
    'name': 'books',
    'fields': [
        {'name': 'id', 'type': 'string'},
        {'name': 'title', 'type': 'string'},
        {'name': 'author', 'type': 'string'},
        {'name': 'genres', 'type': 'string[]', 'facet': True},
        {'name': 'rating', 'type': 'float', 'facet': True},
        {'name': 'published_year', 'type': 'int32', 'facet': True},
        {'name': 'description', 'type': 'string'}
    ],
    'default_sorting_field': 'rating'
}

client.collections.create(schema)
Notice the facet flag – it tells Typesense that the field can be used for faceted navigation, a feature we’ll explore later.
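Because the schema is typed, documents whose fields do not match are rejected or coerced at import time. A local pre-check can surface such problems earlier. The sketch below is an illustration only: `validate_document` and `TYPE_CHECKS` are hypothetical helpers, they cover just the four field types used in the schema above, and the server's own validation rules may be stricter or more lenient.

```python
# Maps the field types used in the books schema to local Python checks.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "string[]": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
    "int32": lambda v: (isinstance(v, int) and not isinstance(v, bool)
                        and -2**31 <= v < 2**31),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate_document(doc, schema):
    """Return a list of human-readable problems; an empty list means the doc looks fine."""
    errors = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in doc:
            # Fields are required unless the schema marks them optional
            if not field.get("optional", False):
                errors.append(f"missing field: {name}")
            continue
        check = TYPE_CHECKS.get(ftype)
        if check is not None and not check(doc[name]):
            errors.append(f"{name}: expected {ftype}")
    return errors
```

Running this over a batch before import turns a server-side rejection into a local error message that names the offending field.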
Indexing Documents
With the collection ready, you can bulk‑import data. Typesense expects NDJSON (newline‑delimited JSON) for efficient ingestion.
import json

books = [
    {
        "id": "1",
        "title": "The Pragmatic Programmer",
        "author": "Andrew Hunt",
        "genres": ["programming", "software engineering"],
        "rating": 4.7,
        "published_year": 1999,
        "description": "A practical guide to software craftsmanship."
    },
    {
        "id": "2",
        "title": "Clean Code",
        "author": "Robert C. Martin",
        "genres": ["programming", "best practices"],
        "rating": 4.5,
        "published_year": 2008,
        "description": "Even bad code can be cleaned up."
    },
    # Add more books as needed…
]

# Convert to NDJSON: one JSON document per line
ndjson = "\n".join(json.dumps(b) for b in books)
client.collections['books'].documents.import_(ndjson)
After the import finishes, you can verify the document count:
print(client.collections['books'].retrieve()['num_documents'])
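The import endpoint reports success or failure per document rather than failing the whole batch, so it is worth inspecting the response. The sketch below assumes each result is an object with a `success` flag and, on failure, an `error` message, delivered either as a list of dicts or as an NDJSON string depending on how the client was called; `parse_import_response` and `summarize_import` are hypothetical helper names.

```python
import json


def parse_import_response(raw):
    """Normalize an import response to a list of dicts.

    Accepts either a list of dicts or an NDJSON string (one result per line).
    """
    if isinstance(raw, str):
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    return list(raw)


def summarize_import(raw):
    """Count imported documents and collect (position, error) for each failure."""
    results = parse_import_response(raw)
    failures = [
        (index, result.get("error", "unknown error"))
        for index, result in enumerate(results)
        if not result.get("success", False)
    ]
    return {"imported": len(results) - len(failures), "failed": failures}
```

Passing the return value of `import_` to `summarize_import` gives a quick answer to "did everything land?" together with the positions of the documents that need attention.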
Basic Search Queries
Searching is as simple as sending a GET request to the /search endpoint. Let’s find books that mention “code”.
search_params = {
    'q': 'code',
    'query_by': 'title,description',
    'sort_by': 'rating:desc',
    'page': 1,
    'per_page': 5
}
results = client.collections['books'].documents.search(search_params)

for hit in results['hits']:
    print(hit['document']['title'], hit['document']['rating'])
Typesense automatically applies typo tolerance, so a query for “pragmatic progrmmr” will still surface “The Pragmatic Programmer”.
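Typo tolerance is usually described in terms of edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The function below is a textbook Levenshtein implementation, included only to illustrate the idea; it is not Typesense's internal matching code.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def within_typo_budget(query_word, indexed_word, num_typos=2):
    """True if the query word is at most num_typos edits from the indexed word."""
    return edit_distance(query_word, indexed_word) <= num_typos
```

Here “progrmmr” is two deletions away from “programmer”, so it falls inside a budget of two typos, while a budget of one would exclude it.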
Filtering and Faceting
Filters let you narrow results based on field values, while facets return aggregated counts for those fields. Here’s a query that returns only books published after 2000, and also shows how many books belong to each genre.
search_params = {
    'q': '*',  # wildcard: match every document, then filter
    'query_by': 'title',
    'filter_by': 'published_year:>2000',
    'facet_by': 'genres',
    'max_facet_values': 10
}
results = client.collections['books'].documents.search(search_params)
print("Facets:", results['facet_counts'])
The facet_counts array contains objects like {"field_name":"genres","counts":[{"value":"programming","count":12},...]}, which you can render as a sidebar in a UI.
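Turning that array into something a template can render takes only a few lines. The sketch below assumes the response shape shown above (`field_name`, `counts`, `value`, `count`); `facets_to_sidebar` is a hypothetical helper name.

```python
def facets_to_sidebar(facet_counts):
    """Map each faceted field to (value, count) pairs, most frequent first."""
    sidebar = {}
    for facet in facet_counts:
        # Sort by descending count, breaking ties alphabetically for stable output
        ordered = sorted(facet["counts"], key=lambda c: (-c["count"], c["value"]))
        sidebar[facet["field_name"]] = [(c["value"], c["count"]) for c in ordered]
    return sidebar
```

A UI layer can then iterate over `sidebar["genres"]` and render each pair as a checkbox label with its count.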
Advanced Features
Synonyms
Synonyms improve recall by mapping multiple terms to a single concept. Typesense supports two kinds: multi‑way synonyms, where every term in the set matches every other, and one‑way synonyms, where a root term expands to a list of alternatives. Below is a multi‑way example that treats “js” and “javascript” as equivalent.
synonym = {
    "synonyms": ["js", "javascript"]
}
client.collections['books'].synonyms.upsert('js_synonym', synonym)
After adding the synonym, a search for “js” will also match documents containing “javascript”.
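Typesense's documentation describes two synonym shapes: a multi-way set, in which all terms are interchangeable, and a one-way rule, in which a `root` term expands to a list of alternatives. The builders below are a small sketch of those two payloads; the function names are hypothetical, and the example terms are made up for illustration.

```python
def multi_way_synonym(terms):
    """Payload for a set of interchangeable terms."""
    terms = list(terms)
    if len(terms) < 2:
        raise ValueError("a multi-way synonym needs at least two terms")
    return {"synonyms": terms}


def one_way_synonym(root, terms):
    """Payload where searching for `root` also matches `terms`, but not the reverse."""
    terms = list(terms)
    if not terms:
        raise ValueError("a one-way synonym needs at least one target term")
    return {"root": root, "synonyms": terms}
```

For example, `one_way_synonym("smartphone", ["iphone", "android"])` would let a search for “smartphone” surface documents mentioning either platform without making “iphone” match “android”.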
Geo‑Search
For location‑aware applications, Typesense can index latitude/longitude pairs and perform radius searches. Define a geolocation field in your schema:
{'name': 'location', 'type': 'geopoint', 'facet': False}
Then query for points within a 10‑kilometer radius of a given coordinate:
search_params = {
    'q': '*',
    'query_by': 'title',
    # <field>:(lat, lng, radius) with an explicit unit; NYC center
    'filter_by': 'location:(40.7128, -74.0060, 10 km)',
    'sort_by': 'location(40.7128, -74.0060):asc'
}
results = client.collections['places'].documents.search(search_params)
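When testing a radius query it is handy to compute the expected distances independently. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km; it is a client-side sanity check and makes no claim about how Typesense computes distances internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_km):
    """True if `point` lies within `radius_km` of `center`; both are (lat, lng) tuples."""
    return haversine_km(*center, *point) <= radius_km
```

Checking a handful of known points against the hits returned by the radius filter is a quick way to catch swapped latitude and longitude values, a common mistake when indexing geopoints.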
Custom Ranking Rules
Beyond the default rating sort, you can combine multiple criteria using the sort_by parameter. For an e‑commerce catalog you might prioritize “in_stock” then “price”.
search_params = {
    'q': 'sneakers',
    'query_by': 'name,description',
    'sort_by': 'in_stock:desc,price:asc'
}
Because sorting is performed on indexed fields, it remains fast even on large datasets.
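Since `sort_by` is a plain comma-separated string, a small builder can catch malformed criteria before they reach the server. The sketch below enforces the limit of three sort fields that Typesense documents for this parameter; `build_sort_by` is a hypothetical helper name.

```python
MAX_SORT_FIELDS = 3  # documented upper limit for sort_by


def build_sort_by(criteria):
    """Build a sort_by string from (field, direction) pairs, in priority order."""
    criteria = list(criteria)
    if not criteria:
        raise ValueError("at least one sort criterion is required")
    if len(criteria) > MAX_SORT_FIELDS:
        raise ValueError(f"at most {MAX_SORT_FIELDS} sort fields are allowed")
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"invalid direction for {field}: {direction!r}")
        parts.append(f"{field}:{direction}")
    return ",".join(parts)
```

`build_sort_by([("in_stock", "desc"), ("price", "asc")])` yields the same string used in the sneakers query, with the first pair taking precedence and later pairs acting as tie-breakers.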
Scaling Typesense
While a single node is often enough in terms of raw throughput, production workloads usually require high availability. Typesense offers a clustered mode built on the Raft consensus algorithm: an elected leader handles writes, and every node in the cluster can serve reads.
- Leader‑follower architecture – writes are serialized on the leader and committed once a majority of nodes has acknowledged them.
- Automatic failover – if the leader crashes, the remaining nodes elect a new one, provided a majority of the cluster is still reachable.
- Horizontal scaling – add more replicas to increase read throughput.
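Typesense's clustering is based on Raft, which needs a majority of nodes (a quorum) to be available, and that determines how many failures a cluster can absorb. The arithmetic is simple enough to sketch; the function names are illustrative.

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority remains available."""
    return nodes - quorum_size(nodes)
```

A three-node cluster tolerates one failure and a five-node cluster tolerates two. A four-node cluster still tolerates only one, which is why odd cluster sizes are the usual recommendation.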
Deploying a cluster is straightforward with Docker Compose or Kubernetes. Below is a minimal docker-compose.yml that spins up a three‑node cluster. Note that --nodes expects the path to a file listing the peers, not the list itself, so the example mounts a shared ./nodes file into each container.
version: "3.8"
services:
  typesense-0:
    image: typesense/typesense:0.25.1
    command: ["--data-dir", "/data", "--api-key", "xyz123", "--nodes", "/config/nodes"]
    ports:
      - "8108:8108"
    volumes:
      - ./data0:/data
      - ./nodes:/config/nodes
  typesense-1:
    image: typesense/typesense:0.25.1
    command: ["--data-dir", "/data", "--api-key", "xyz123", "--nodes", "/config/nodes"]
    volumes:
      - ./data1:/data
      - ./nodes:/config/nodes
  typesense-2:
    image: typesense/typesense:0.25.1
    command: ["--data-dir", "/data", "--api-key", "xyz123", "--nodes", "/config/nodes"]
    volumes:
      - ./data2:/data
      - ./nodes:/config/nodes
The ./nodes file holds a single comma‑separated line of host:peering_port:api_port entries, here typesense-0:8107:8108,typesense-1:8107:8108,typesense-2:8107:8108. Each node reads its peers from that file, and the cluster automatically elects a leader. Depending on your version and network setup, you may also need to set --peering-address explicitly on each node.
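According to the Typesense clustering documentation, `--nodes` takes the path to a file whose content is a comma-separated list of `host:peering_port:api_port` entries. Generating that line from a list of hosts avoids typos when the cluster grows. The sketch below assumes the default ports 8107 (peering) and 8108 (API); whether hostnames rather than IP addresses are accepted depends on the server version, so treat the hostnames here as an assumption.

```python
def nodes_file_content(hosts, peering_port=8107, api_port=8108):
    """Build the single-line content of a Typesense nodes file."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("at least one host is required")
    return ",".join(f"{host}:{peering_port}:{api_port}" for host in hosts)


if __name__ == "__main__":
    # Print the line for the three services in the compose file above
    print(nodes_file_content(["typesense-0", "typesense-1", "typesense-2"]))
```

Redirecting the output to a file and mounting that file into every container keeps the peer list identical across nodes.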
Real‑World Use Cases
e‑Commerce storefronts benefit from instant product search, faceted navigation, and relevance tuning. A retailer can store product titles, categories, price, stock status, and use the sort_by clause to surface in‑stock items first.
Documentation portals require fuzzy matching and synonym handling to surface the right article even when users misspell technical terms. Typesense’s typo tolerance and synonym engine reduce the “no results” frustration common in static search solutions.
Job boards often need geo‑search combined with filters like salary range, remote‑friendly flag, and required skills. By indexing location as a geopoint and skills as an array field, you can deliver highly targeted listings without a separate GIS service.
Comparing Typesense with Algolia
- Cost – Algolia charges per search operation and per indexed record; Typesense is free to run on your own hardware.
- Latency – Both claim sub‑50 ms responses, but Typesense can achieve lower latency when co‑located with your application.
- Feature parity – Algolia offers advanced A/B testing and analytics out of the box. Typesense provides core search, faceting, and synonyms, while analytics must be built separately.
- Vendor lock‑in – Switching away from Algolia requires data export and re‑indexing. Any search migration involves that work, but with Typesense the server, the data, and the source code stay under your control.
In short, if you need a fully managed solution with premium UI widgets, Algolia remains a solid choice. If you prioritize cost control, data sovereignty, or want to embed search directly into a private network, Typesense is the clear winner.
Pro tip: Use the num_typos parameter to fine‑tune typo tolerance per field. For brand names, set num_typos=0 to avoid false positives, while keeping the default (2) for free‑form text.
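Typesense accepts `num_typos` as a comma-separated list whose entries line up with the fields in `query_by`, which is how the per-field tuning works in practice. The builder below is a sketch of that pairing; `build_typo_params` and the `brand` and `description` field names are hypothetical.

```python
def build_typo_params(q, field_typos):
    """Build search params from (field, num_typos) pairs, keeping both lists aligned."""
    field_typos = list(field_typos)
    if not field_typos:
        raise ValueError("at least one field is required")
    for field, typos in field_typos:
        if typos not in (0, 1, 2):
            raise ValueError(f"num_typos for {field} must be 0, 1, or 2")
    return {
        "q": q,
        "query_by": ",".join(field for field, _ in field_typos),
        "num_typos": ",".join(str(typos) for _, typos in field_typos),
    }
```

`build_typo_params("nike air", [("brand", 0), ("description", 2)])` produces `query_by` and `num_typos` strings in matching order, so the exact-match rule cannot accidentally land on the wrong field.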
Best Practices for Production Deployments
- Secure the API key – Treat the master key as a secret. For client‑side access, generate a scoped search‑only key with limited permissions.
- Back up data regularly – Typesense stores data on disk, so schedule snapshots or use volume snapshots in your cloud provider.
- Monitor latency and queue depth – Poll the /metrics.json and /stats.json endpoints, feed the values into your monitoring system (for example Prometheus, via an exporter), and set alerts for spikes.
- Batch imports – Use the bulk import API for initial loads; avoid per‑document HTTP calls for large datasets.
- Plan for schema evolution – Adding a new field requires a collection update, which blocks reads and writes to that collection while it runs, so schedule it for off‑peak hours on large collections.
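For the batch-import advice above, very large datasets are usually split into fixed-size chunks so that no single request grows unbounded. The generator below is a minimal sketch; `chunked` is a hypothetical helper, and a sensible batch size depends on document size and available memory.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch can then be passed to the bulk import call in turn, for example `for batch in chunked(all_books, 1000): ...`, which also gives a natural place to log progress and collect per-batch errors.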
Putting It All Together: A Mini Search Service
Below is a concise Flask app that exposes a /search endpoint backed by Typesense. The example demonstrates request validation, scoped API key generation, and returning results in a UI‑friendly shape.
from flask import Flask, request, jsonify
import os
import time
import uuid

import typesense

app = Flask(__name__)
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': os.getenv('TYPESENSE_MASTER_KEY', 'xyz123')
})

# Generate a short‑lived search‑only key (valid for 5 minutes)
def generate_search_key():
    return client.keys.create({
        'description': f'Search key for session {uuid.uuid4()}',
        'actions': ['documents:search'],
        'collections': ['books'],
        'expires_at': int(time.time()) + 300
    })['value']

@app.route('/search')
def search():
    query = request.args.get('q', '')
    # Reject non‑numeric paging values with a 400 instead of an unhandled error
    try:
        page = int(request.args.get('page', 1))
        per_page = int(request.args.get('per_page', 10))
    except ValueError:
        return jsonify({'error': 'page and per_page must be integers'}), 400
    params = {
        'q': query,
        'query_by': 'title,author,description',
        'page': page,
        'per_page': per_page,
        'facet_by': 'genres',
        'max_facet_values': 5
    }
    results = client.collections['books'].documents.search(params)
    return jsonify({
        'hits': [h['document'] for h in results['hits']],
        'facets': results.get('facet_counts', []),
        'search_key': generate_search_key()
    })

if __name__ == '__main__':
    app.run(debug=True)  # debug mode is for local development only
The endpoint returns a JSON payload that includes the matching books, facet data for UI filters, and a short‑lived search key that client‑side code can use for subsequent direct Typesense queries.
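The paging parameters in the handler come straight from the query string, so it is worth clamping them to sane bounds before forwarding them. The helper below is one possible approach: it falls back to defaults on unparsable input and caps `per_page` at 250, the maximum Typesense documents for that parameter. `parse_pagination` is a hypothetical name, and it works with any mapping that has a `get` method, such as Flask's `request.args`.

```python
MAX_PER_PAGE = 250  # documented upper limit for per_page


def parse_pagination(args, default_per_page=10):
    """Return a (page, per_page) tuple clamped to valid ranges."""
    def to_int(value, fallback):
        try:
            return int(value)
        except (TypeError, ValueError):
            # Missing (None) or non-numeric input falls back to the default
            return fallback

    page = max(1, to_int(args.get("page"), 1))
    per_page = to_int(args.get("per_page"), default_per_page)
    per_page = min(MAX_PER_PAGE, max(1, per_page))
    return page, per_page
```

Whether to clamp silently, as here, or to reject bad input with a 400 response is a design choice; clamping is friendlier to hand-edited URLs, while rejecting makes client bugs visible sooner.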
Common Pitfalls and How to Avoid Them
- Over‑indexing large text fields – Storing full article bodies can bloat memory. Use a separate collection for heavy content and retrieve it only when needed.
- Neglecting facet limits – By default Typesense returns up to 10 facet values per field. For high‑cardinality fields, set
max_f