QuestDB: Fastest Time-Series Database
QuestDB has been making waves in the time‑series world by promising sub‑millisecond query latency while keeping the developer experience simple. If you’ve ever wrestled with write‑heavy workloads or struggled to get real‑time insights from sensor data, you’ll find QuestDB’s design refreshingly pragmatic. In this post we’ll dive deep into why QuestDB is often called the “fastest time‑series database,” walk through practical Python integrations, and explore real‑world scenarios where it truly shines.
What Makes QuestDB Different?
At its core, QuestDB is a column‑oriented relational database built specifically for time‑series workloads. Unlike traditional row stores, it stores each column in contiguous memory, which means scans over a single field (like a timestamp) are lightning fast. The engine also leverages vectorized SIMD instructions and lock‑free data structures to eliminate bottlenecks during high‑frequency ingest.
Another key differentiator is the **Influx Line Protocol (ILP)** support. QuestDB can consume data in the same format used by InfluxDB, letting you switch or hybridize without rewriting your data pipelines. Coupled with a PostgreSQL‑compatible SQL dialect, you get the best of both worlds: familiar query syntax and ultra‑low latency.
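To make that concrete, here is a minimal sketch of what a single ILP message looks like on the wire, pushed at QuestDB's default line-protocol port (9009) using nothing but the standard library; the table and field names are illustrative and match the IoT example later in the post.
import socket
import time

# One InfluxDB line protocol record: table, tag (symbol), field, nanosecond timestamp
line = f"sensor_data,device_id=device_1 temperature=22.5 {time.time_ns()}\n"

with socket.create_connection(("localhost", 9009)) as sock:
    sock.sendall(line.encode("utf-8"))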
Zero‑Copy Architecture
QuestDB stores data in memory‑mapped files, allowing the operating system to handle paging directly. This “zero‑copy” approach reduces the number of context switches and eliminates unnecessary data copies between user space and kernel space. The result is that reads can be served directly from the page cache, often bypassing disk I/O altogether.
Because the data layout is columnar, each column lives in its own contiguous file, which both scans quickly and compresses well; in practice, storage savings usually come from filesystem-level compression (such as ZFS) rather than from an in-engine codec, so the read path stays simple and fast.
Getting Started: Installing QuestDB
QuestDB can run as a Docker container, a native binary, or even on Kubernetes. For most developers, the Docker route is the quickest way to spin up a sandbox environment.
docker run -p 9000:9000 -p 8812:8812 -p 9009:9009 questdb/questdb:latest
The web console becomes available at http://localhost:9000, the PostgreSQL wire protocol listens on port 8812, and the ILP ingestion endpoint listens on port 9009. Once the container is up, you can start sending data via ILP or using the standard SQL interface.
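Before wiring up any pipelines, you can confirm the instance is reachable with a one-off request against the REST /exec endpoint; a minimal sketch:
import requests

# Run a trivial query against QuestDB's SQL-over-HTTP endpoint
resp = requests.get("http://localhost:9000/exec", params={"query": "SELECT 1;"})
print(resp.status_code, resp.json()["dataset"])  # expect 200 and [[1]]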
Python Client Setup
QuestDB provides an official Python client that streams data to the ILP endpoint over TCP. Install it via pip:
pip install questdb
With the client installed, you can begin streaming sensor readings in just a few lines of code. The example below uses the `questdb.ingress.Sender` API from the 1.x client; newer releases also accept a configuration string via `Sender.from_conf`, so check the client docs for your installed version.
Ingesting Data – A Real‑Time IoT Example
Imagine you have a fleet of temperature sensors reporting every second. Each reading includes a device ID, temperature, and a timestamp. QuestDB’s ILP endpoint can ingest millions of rows per second, making it perfect for this use case.
import time
import random
from questdb.ingress import Sender, TimestampNanos

TABLE_NAME = "sensor_data"

def generate_reading():
    device = f"device_{random.randint(1, 50)}"
    temp = round(random.uniform(15.0, 30.0), 2)
    ts_ns = time.time_ns()  # nanosecond precision for the designated timestamp
    return device, temp, ts_ns

# Connect to QuestDB's ILP endpoint over TCP (port 9009).
with Sender("localhost", 9009) as sender:
    for _ in range(10_000):
        device_id, temperature, ts_ns = generate_reading()
        sender.row(
            TABLE_NAME,
            symbols={"device_id": device_id},
            columns={"temperature": temperature},
            at=TimestampNanos(ts_ns),
        )
        time.sleep(0.001)  # simulate ~1k rows/sec
    sender.flush()

print("Ingestion complete.")
The code streams 10,000 rows into a table called sensor_data at roughly 1 kHz, and QuestDB creates the table and its schema automatically on the first write. The client buffers rows in memory and ships them over TCP in batches, so per-row overhead stays minimal and throughput remains high.
Pro tip: If you ingest over ILP's UDP flavour at extreme rates (>1 M rows/sec), consider increasing the UDP receive buffer (`net.core.rmem_max`) on the host OS to avoid packet loss; over TCP, the socket itself provides back-pressure.
Querying Time‑Series Data with Python
After data lands in QuestDB, you’ll likely want to run analytical queries. QuestDB exposes a REST endpoint (`/exec`) for executing SQL over HTTP, which you can call with the requests library.
import requests
import pandas as pd
SQL_ENDPOINT = "http://localhost:9000/exec"
QUERY = """
SELECT device_id,
       avg(temperature) AS avg_temp,
       datediff('s', min(ts), max(ts)) AS duration_s
FROM sensor_data
WHERE ts > dateadd('h', -1, now())
GROUP BY device_id
ORDER BY avg_temp DESC
LIMIT 10;
"""
response = requests.get(SQL_ENDPOINT, params={"query": QUERY})
data = response.json()
df = pd.DataFrame(data["dataset"], columns=[col["name"] for col in data["columns"]])
print(df)
This query computes the average temperature per device over the last hour, orders the results, and returns the top 10 hottest devices. The response is JSON, which we easily load into a pandas DataFrame for downstream analysis or visualization.
Querying from the Command Line
If you prefer an interactive shell, any PostgreSQL client can connect to QuestDB over the wire protocol on port 8812; `psql` is handy for ad-hoc debugging (the default credentials are user `admin`, password `quest`, database `qdb`):
psql -h localhost -p 8812 -U admin -d qdb
qdb=> SELECT count(*) FROM sensor_data;
qdb=> \q
Real‑World Use Cases
Financial Tick Data: High‑frequency trading firms need to store millions of price updates per second and query them with sub‑millisecond latency. QuestDB’s columnar storage and vectorized aggregations make it ideal for calculating moving averages, VWAP, and other indicators on the fly.
Infrastructure Monitoring: Metrics like CPU usage, network throughput, and error rates are naturally time‑indexed. QuestDB can ingest logs from tools like Prometheus or Fluent Bit via ILP and serve dashboards that refresh in real time.
IoT Fleet Management: As demonstrated earlier, sensor streams from vehicles, wearables, or smart cities can be persisted with minimal schema management. The built‑in SQL functions (e.g., `date_trunc`, `first`, `last`) simplify pattern detection such as “detect devices that have been idle for more than 10 minutes.”
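Putting that last pattern into practice takes one query against the REST API. The sketch below assumes the sensor_data table from earlier; the 10-minute threshold is illustrative.
import requests

# Find devices whose most recent reading is older than 10 minutes
QUERY = """
SELECT device_id, last_seen FROM (
    SELECT device_id, max(ts) AS last_seen
    FROM sensor_data
    GROUP BY device_id
)
WHERE last_seen < dateadd('m', -10, now());
"""
resp = requests.get("http://localhost:9000/exec", params={"query": QUERY})
for device_id, last_seen in resp.json()["dataset"]:
    print(f"{device_id} has been idle since {last_seen}")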
Performance Benchmarks
QuestDB’s developers frequently publish benchmarks comparing it against InfluxDB, TimescaleDB, and ClickHouse. In a typical 10 M row insert test on a 4‑core Intel i7, QuestDB achieved ~2.3 M rows/sec using ILP, while InfluxDB peaked at ~1.1 M rows/sec under the same hardware.
On the query side, a 1‑hour window aggregation over 100 M rows returned in under 150 ms on QuestDB, versus 620 ms on TimescaleDB. The speed gains stem from two factors: (1) the zero‑copy, memory‑mapped storage engine, and (2) SIMD‑accelerated aggregate functions that process 8‑16 values per instruction.
Pro tip: For best read performance, align your queries with the table’s partitioning strategy (e.g., daily partitions). QuestDB can prune entire partitions, turning a potential full‑scan into a micro‑second lookup.
Advanced Features
Partitioning Strategies
QuestDB supports several partitioning modes, including NONE, HOUR, DAY, MONTH, and YEAR. When you create a table, you can specify the desired mode:
CREATE TABLE trades (
symbol SYMBOL,
price DOUBLE,
size LONG,
ts TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY;
With daily partitions, QuestDB stores each day’s data in a separate file segment. Queries that filter on the timestamp can skip irrelevant partitions, dramatically reducing I/O.
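To see pruning in action, filter on the designated timestamp. The sketch below uses QuestDB's interval syntax, where a bare date string passed to IN matches the whole day; the date itself is just an example.
import requests

# Only the single daily partition for this date is touched; all others are skipped.
QUERY = "SELECT count(*) FROM trades WHERE ts IN '2024-01-15';"
resp = requests.get("http://localhost:9000/exec", params={"query": QUERY})
print(resp.json()["dataset"])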
Symbol Columns for High‑Cardinality Data
Symbol columns are QuestDB’s answer to dictionary encoding. When you insert a new string into a symbol column, QuestDB stores it once in a global dictionary and references it by an integer ID in the data file. This reduces storage and speeds up equality filters.
In the IoT example, device_id is a symbol column, allowing you to filter on a specific device without scanning the entire table.
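If you know roughly how many distinct values a symbol column will hold, you can size its dictionary up front instead of relying on ILP auto-creation. A minimal sketch, run against the /exec endpoint; the capacity figure is illustrative:
import requests

DDL = """
CREATE TABLE IF NOT EXISTS sensor_data (
    device_id SYMBOL CAPACITY 50000 CACHE,  -- pre-sized dictionary, kept in memory
    temperature DOUBLE,
    ts TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY;
"""
requests.get("http://localhost:9000/exec", params={"query": DDL})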
Built‑In Time‑Series Functions
QuestDB ships with a rich set of tools tailored for time‑series analysis: the first() and last() aggregates, and the SAMPLE BY clause (optionally combined with FILL for gap interpolation). For instance, to down‑sample the per‑second readings into 5‑minute buckets, you can write:
SELECT
    ts,
    avg(temperature) AS avg_temp
FROM sensor_data
WHERE ts > dateadd('d', -1, now())
SAMPLE BY 5m;
SAMPLE BY groups rows into fixed time buckets keyed on the table’s designated timestamp, making it trivial to generate rolling aggregates for dashboards.
Integrations with the Data Ecosystem
QuestDB can act as both a source and a sink in modern data pipelines. Its PostgreSQL wire protocol lets you connect BI tools like Metabase, Superset, or Tableau directly. Meanwhile, the ILP endpoint can be fed by Kafka Connect, Logstash, or custom Go/Java producers.
For batch processing, you can export query results as CSV through the REST `/exp` endpoint and hand the file to downstream tools:
curl -G "http://localhost:9000/exp" --data-urlencode "query=SELECT * FROM sensor_data WHERE ts > dateadd('d', -7, now())" -o /tmp/sensor_last_week.csv
The exported file can then be loaded into Spark, Presto, or pandas without additional ETL steps.
Streaming to Grafana
Grafana’s QuestDB data source (available as a plugin) allows you to build real‑time dashboards with minimal configuration. Point the data source at your QuestDB instance, typically over the PostgreSQL wire protocol on port 8812, reuse a query like the one shown earlier, and set the refresh interval to 1 s for live monitoring.
Pro Tips for Production Deployments
Tip 1 – Reserve Huge Pages: Enabling huge pages on Linux (e.g., sysctl -w vm.nr_hugepages=1024) reduces TLB misses for the memory‑mapped column files, giving you a noticeable latency boost under heavy load.
Tip 2 – Use Dedicated Disks for WAL: QuestDB’s write‑ahead log (WAL) can be placed on a separate SSD to isolate write amplification from the main data files, improving durability without compromising read speed.
Tip 3 – Monitor Back‑Pressure: When ingesting via ILP over UDP, watch the socket’s drop count (`netstat -su`). If you see drops, either increase the socket buffer or scale out by adding more QuestDB nodes behind a load balancer.
Scaling QuestDB Horizontally
Out of the box, QuestDB is a single‑node database. However, you can achieve horizontal scalability using sharding at the application level. Partition your time series by a logical key (e.g., device region) and route writes to different QuestDB instances. A lightweight proxy (like Nginx or HAProxy) can then aggregate query results across shards.
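A minimal sketch of that routing layer, assuming two QuestDB instances and a region-based key; the host names, port, and hashing scheme are illustrative, and the Sender call follows the 1.x client API.
import time
import zlib
from questdb.ingress import Sender, TimestampNanos

# Hypothetical shard map: each entry is one QuestDB instance (host, ILP port)
SHARDS = [("questdb-eu.internal", 9009), ("questdb-us.internal", 9009)]

def shard_for(region: str) -> Sender:
    # crc32 gives a hash that is stable across processes, unlike Python's built-in hash()
    host, port = SHARDS[zlib.crc32(region.encode()) % len(SHARDS)]
    return Sender(host, port)  # newer client releases use Sender.from_conf instead

with shard_for("eu-west") as sender:
    sender.row("sensor_data",
               symbols={"device_id": "device_1"},
               columns={"temperature": 21.7},
               at=TimestampNanos(time.time_ns()))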
Future releases are expected to include native clustering, but even today, the combination of fast single‑node performance and simple sharding makes QuestDB a pragmatic choice for many real‑time workloads.
Conclusion
QuestDB delivers on its promise of being the fastest time‑series database by marrying a columnar, memory‑mapped engine with modern SIMD optimizations and an intuitive SQL interface. Whether you’re handling IoT telemetry, financial tick data, or infrastructure metrics, QuestDB’s low‑latency ingest and sub‑millisecond query response can dramatically reduce the time from data arrival to insight.
With straightforward Docker deployment, native ILP support, and seamless Python integration, getting started is frictionless. The advanced features—symbol columns, flexible partitioning, and built‑in time‑series functions—provide the scalability and analytical power needed for production workloads.
Give QuestDB a spin in your next real‑time project; you’ll likely be surprised how little code is required to achieve blazing‑fast performance.