picking the wrong TSDB for hardware sensor data costs you twice: once in raw $/GB, once in queries that should have been milliseconds turning into seconds. by the time you notice, you’ve got 100 GB in the wrong shape and migrating is a quarter of work.
most TSDB comparisons online are written for web/observability workloads — http traces, metrics scraped at 15s intervals, structured logs. hardware data has different characteristics, and the standard takes miss them. this post benchmarks four databases on a workload that actually looks like sensor data, and reports the numbers.
hardware sensor data has four properties that matter:
the benchmark generates 1,000,000 rows: 1,000 simulated devices each emitting 1,000 timestamps of temperature (float, slow random walk around 22°C). schema is (timestamp, source_id, metric, value) — the smallest fair representation for fleet sensor data. 5,000-row batches, the same shape a production gateway sends.
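to make the dataset shape concrete, here is a minimal sketch of a generator that matches the description above. it is not the repo's generator: the 1 Hz sample rate, the start date, the random-walk step size, the pull back toward 22°C, and the names `generate_rows` and `batched` are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone
from itertools import islice

N_DEVICES = 1_000    # simulated devices
N_STEPS = 1_000      # timestamps per device
BATCH_SIZE = 5_000   # rows per ingest batch


def generate_rows(n_devices=N_DEVICES, n_steps=N_STEPS, seed=0):
    """Yield (timestamp, source_id, metric, value) tuples, one per reading."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start (assumption)
    temps = [22.0] * n_devices                         # every device starts at 22 °C
    for step in range(n_steps):
        ts = start + timedelta(seconds=step)           # assumed 1 Hz sample rate
        for dev in range(n_devices):
            # slow random walk: small Gaussian step plus a weak pull back toward 22 °C
            temps[dev] += rng.gauss(0.0, 0.05) + 0.01 * (22.0 - temps[dev])
            yield (ts, f"esp32-{dev + 1:04d}", "temperature", round(temps[dev], 3))


def batched(rows, size=BATCH_SIZE):
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch
```

with the defaults, `batched(generate_rows())` yields 200 batches of 5,000 rows each, which a driver can hand to its bulk-insert call one batch at a time.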
four databases, all open-source, all in their canonical configuration. no tuning beyond what their docs recommend for time-series:
(source_id, metric, timestamp), monthly partitions. native protocol via clickhouse-connect.
(source_id, timestamp DESC). ingest via COPY FROM STDIN.
what’s not in the comparison: cassandra (overkill for this workload), prometheus (designed for scraped metrics, not raw sensor data with high cardinality + retention), DuckDB (embedded, not a server). also out of scope: clustering, replication, multi-node, ingest/query contention. everything’s localhost on a single machine.
each database runs in its own docker container, one at a time. the benchmark code lives at the tsdb-comparison guide. clone it, run it, change the constants if you want a different scale.
clickhouse-connect, timescaledb COPY FROM STDIN via psycopg2, influxdb 2.x’s batched write API.
Q1 single-source time window: "esp32-0001's last 5 minutes"
Q2 distinct metrics per source: "what does esp32-0001 report?"
Q3 fleet aggregate: "average temp across all devices, last 1 min"
Q4 count over range: "how many points in the last 10 minutes"
Q5 last-N per source: "last 10 readings for esp32-0001"
these are the shapes a sensor dashboard or alert engine actually hits. they’re also what plexus issues against its own clickhouse instance, so the post benchmarks something close to a real workload, not a synthetic microbenchmark.
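for readers who think in SQL, the five shapes can be sketched against the (timestamp, source_id, metric, value) schema. the example below uses python's built-in sqlite3 purely as a stand-in so it runs anywhere; it is not the dialect or the exact text of the queries in bench.py. the table name `telemetry`, the integer epoch-second timestamps, and the three-device fixture are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry (timestamp INTEGER, source_id TEXT, metric TEXT, value REAL)"
)

# tiny fixture: 3 devices, one reading per second for 15 minutes (epoch seconds 0..899)
NOW = 900
rows = [
    (t, f"esp32-{d:04d}", "temperature", 22.0 + d)
    for d in (1, 2, 3)
    for t in range(NOW)
]
conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?, ?)", rows)

QUERIES = {
    # Q1: one device, last 5 minutes
    "q1_window": (
        "SELECT timestamp, value FROM telemetry "
        "WHERE source_id = ? AND timestamp >= ? ORDER BY timestamp",
        ("esp32-0001", NOW - 300),
    ),
    # Q2: which metrics does this device report?
    "q2_distinct_metrics": (
        "SELECT DISTINCT metric FROM telemetry WHERE source_id = ?",
        ("esp32-0001",),
    ),
    # Q3: fleet-wide average temperature, last 1 minute
    "q3_fleet_avg": (
        "SELECT AVG(value) FROM telemetry "
        "WHERE metric = 'temperature' AND timestamp >= ?",
        (NOW - 60,),
    ),
    # Q4: how many points in the last 10 minutes
    "q4_count_range": (
        "SELECT COUNT(*) FROM telemetry WHERE timestamp >= ?",
        (NOW - 600,),
    ),
    # Q5: last 10 readings for one device
    "q5_last_n": (
        "SELECT timestamp, value FROM telemetry "
        "WHERE source_id = ? ORDER BY timestamp DESC LIMIT 10",
        ("esp32-0001",),
    ),
}

results = {
    name: conn.execute(sql, params).fetchall()
    for name, (sql, params) in QUERIES.items()
}
```

Q1, Q2, and Q5 filter on a single source_id, while Q3 and Q4 scan a time range across every device. that split is worth keeping in mind when reading the latency numbers.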
# full harness + drivers: docs.plexus.company/guides/tsdb-comparison
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# one DB at a time:
docker compose up -d clickhouse
python bench.py --db clickhouse --out results/clickhouse.json
docker compose down -v
# repeat with --db timescaledb / influxdb
we tried QuestDB (8.1.1 and latest) under the same harness. ILP ingest into a WAL-enabled table hit reproducible writer-pool contention: the broker rejecting batches with table busy [reason=telemetry], and the official python ILP client timing out or emitting broken pipe. we don’t have confidence in any number we’d publish, so we cut it from the table. the runner script is in the repo if you want to debug it on different hardware. results below are clickhouse, timescaledb, and influxdb only.
ingest, storage, and median query latency on a 1M-row dataset. machine: M3 MacBook Pro, 8 cores, Docker Desktop 4 GiB allocation, each DB run in isolation.
| metric | ClickHouse | TimescaleDB | InfluxDB |
|---|---|---|---|
| ingest (rows/sec) | 669,220 | 212,407 | 168,930 |
| storage on disk | 4.6 MiB | 136.0 MiB | 41.6 MiB |
| Q1 window (median) | 4.44 ms | 1.11 ms | 5.40 ms |
| Q2 distinct metrics | 2.14 ms | 0.95 ms | 3.30 ms |
| Q3 fleet avg | 9.38 ms | 12.63 ms | 28.94 ms |
| Q4 count over range | 2.94 ms | 52.60 ms | 27.66 ms |
| Q5 last-N per source | 1.70 ms | 0.35 ms | 2.31 ms |
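the ingest and storage rows are easier to compare once normalised per row. the sketch below only re-derives numbers from the table (bytes on disk per row and wall-clock time to ingest the 1M rows); nothing in it is newly measured, and the helper names are made up for illustration.

```python
MIB = 1024 * 1024
ROWS = 1_000_000

# values copied from the results table above
results = {
    "ClickHouse": {"ingest_rows_per_s": 669_220, "storage_mib": 4.6},
    "TimescaleDB": {"ingest_rows_per_s": 212_407, "storage_mib": 136.0},
    "InfluxDB": {"ingest_rows_per_s": 168_930, "storage_mib": 41.6},
}


def bytes_per_row(storage_mib, rows=ROWS):
    """On-disk footprint divided by the number of rows stored."""
    return storage_mib * MIB / rows


def ingest_seconds(rows_per_s, rows=ROWS):
    """Time to load the whole dataset at the measured ingest rate."""
    return rows / rows_per_s


summary = {
    name: {
        "bytes_per_row": round(bytes_per_row(r["storage_mib"]), 1),
        "ingest_s": round(ingest_seconds(r["ingest_rows_per_s"]), 2),
    }
    for name, r in results.items()
}

for name, s in summary.items():
    print(f"{name:12s} {s['bytes_per_row']:7.1f} B/row  {s['ingest_s']:5.2f} s to ingest")
```

put that way, ClickHouse stores each row in roughly 5 bytes, against about 44 for InfluxDB and about 143 for TimescaleDB in this configuration, and the whole dataset loads in between about 1.5 and 6 seconds on all three.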
a few things stand out:
devices table with location, model, firmware version, joined onto telemetry), or if your team already runs Postgres and you want one less moving piece. excellent at “give me this device’s recent N points.” costs disk space without a compression policy.
TTL clauses on the table. the SQL surface is approachable. less obvious for point-and-narrow lookups against a single device: sort keys aren’t free indices.
what isn’t a tradeoff: at 1M rows, all three databases that finished the benchmark answer every query in under 60 ms p95, so unless you’re at 100× this scale or have queries we didn’t measure, the answer is “any of them is fine; pick on operational fit.”
ClickHouse. the reasons that matter for our specific workload — high-cardinality source_id with thousands of devices, mostly numeric telemetry, fleet-wide aggregations on dashboards, retention that maps cleanly to a TTL clause, and SQL that anyone who’s seen Postgres can read — all line up. it isn’t a universally right pick; TimescaleDB would be a better default if our reads were dominated by “this one device, last hour” lookups instead of fleet aggregates. the benchmark above shouldn’t tell you what to use. it should tell you how to find out.
bench.py, the four drivers, docker compose, and the synthetic generator live in the docs.