Benchmark · Week 4 of 13

time-series storage for sensor data — comparing options

picking the wrong TSDB for hardware sensor data costs you twice: once in raw $/GB, and again in queries that should take milliseconds but take seconds. by the time you notice, you’ve got 100 GB in the wrong shape, and migrating off it is a quarter’s worth of work.

most TSDB comparisons online are written for web/observability workloads: http traces, metrics scraped at 15s intervals, structured logs. hardware data has different characteristics, and the standard takes miss them. this post benchmarks four databases on a workload that actually looks like sensor data and reports the numbers for the three that completed cleanly.

by ann schulte · ~12 min read
The workload

hardware sensor data has four properties that matter:

  • mostly numeric. float64s, no JSON, no strings of significant size. compressors love this.
  • high source cardinality, narrow per-source. thousands of devices, each emitting one to a few metrics. cardinality lives in source_id, not in tag explosion.
  • batched ingest. gateways and pi-class observers collect for a few seconds and write thousands of rows in one shot. single-row inserts are the exception, not the path that matters.
  • queries are time-windowed. "last 5 minutes for device X," "average across the fleet for the last hour," "count of points stored this billing period." sensor queries are rarely unbounded scans.

the benchmark generates 1,000,000 rows: 1,000 simulated devices each emitting 1,000 timestamps of temperature (float, slow random walk around 22°C). schema is (timestamp, source_id, metric, value) — the smallest fair representation for fleet sensor data. 5,000-row batches, the same shape a production gateway sends.
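a minimal sketch of a generator with that shape, for readers who want to reproduce the data without the harness. the 1-second sample interval, the random-walk step size, and the weak pull back toward 22°C are assumptions (the post only fixes the device count, timestamp count, schema, and batch size), and `generate_rows` / `batches` are hypothetical names, not the repo’s actual generator.

```python
import random
from datetime import datetime, timedelta, timezone

# Assumptions: the post fixes 1,000 devices x 1,000 timestamps and 5,000-row
# batches; the 1 s sample interval, step size, and pull toward 22 °C are guesses.
N_DEVICES = 1_000
N_STEPS = 1_000
BATCH_SIZE = 5_000
INTERVAL = timedelta(seconds=1)


def generate_rows(n_devices=N_DEVICES, n_steps=N_STEPS, seed=42, end=None):
    """Yield (timestamp, source_id, metric, value) rows, one timestamp at a time."""
    rng = random.Random(seed)
    end = end or datetime.now(timezone.utc)
    # data ends just before `end`, so "last N minutes" queries have rows to hit
    start = end - n_steps * INTERVAL
    temps = [22.0] * n_devices  # every device starts at 22 °C
    for step in range(n_steps):
        ts = start + step * INTERVAL
        for dev in range(n_devices):
            # slow random walk: a small Gaussian step plus a weak pull back to 22 °C
            temps[dev] += rng.gauss(0.0, 0.05) + 0.01 * (22.0 - temps[dev])
            yield (ts, f"esp32-{dev + 1:04d}", "temperature", round(temps[dev], 3))


def batches(rows, size=BATCH_SIZE):
    """Group an iterable of rows into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

with the defaults, `batches(generate_rows())` yields 200 batches of 5,000 rows. because rows are emitted timestamp-major, each batch holds 5 consecutive timestamps from all 1,000 devices, which is roughly what a gateway flushing every few seconds would send.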

The contenders

four databases, all open-source, all in their canonical configuration. no tuning beyond what their docs recommend for time-series:

  • clickhouse 24.8. column store, MergeTree engine, sort key (source_id, metric, timestamp), monthly partitions. native protocol via clickhouse-connect.
  • timescaledb (postgres 16). postgres extension. hypertable with 1-day chunks, btree index on (source_id, timestamp DESC). ingest via COPY FROM STDIN.
  • influxdb 2.7. time-structured merge tree (TSM) engine. measurement + tags + field. ingest via the python write API in 5,000-row batches.
  • questdb 8.x. column store, partition by day, WAL-enabled. ingest via the official ILP client. (we couldn’t get publishable numbers — see the note below.)
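for reference, a minimal sketch of schemas matching the three configurations that produced numbers. the table name `telemetry`, the column types, and the InfluxDB mapping (metric as measurement, source_id as tag, value as field) are assumptions; the post specifies the sort key, partitioning, chunking, and index, not the exact DDL the harness runs.

```python
# Hypothetical DDL: table name and column types are assumptions; the sort key,
# monthly partitions, 1-day chunks, and btree index come from the list above.
CLICKHOUSE_DDL = """
CREATE TABLE IF NOT EXISTS telemetry (
    timestamp DateTime64(3, 'UTC'),
    source_id String,
    metric    String,
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (source_id, metric, timestamp)
"""

TIMESCALE_DDL = [
    """CREATE TABLE IF NOT EXISTS telemetry (
        timestamp TIMESTAMPTZ NOT NULL,
        source_id TEXT NOT NULL,
        metric    TEXT NOT NULL,
        value     DOUBLE PRECISION
    )""",
    # turn the plain table into a hypertable with 1-day chunks
    "SELECT create_hypertable('telemetry', 'timestamp', "
    "chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE)",
    # composite btree index for single-device, newest-first lookups
    "CREATE INDEX IF NOT EXISTS telemetry_source_ts "
    "ON telemetry (source_id, timestamp DESC)",
]


def to_line_protocol(ts_ns: int, source_id: str, metric: str, value: float) -> str:
    """Format one row as InfluxDB line protocol: measurement,tag=... field=... timestamp.

    Assumed mapping: metric -> measurement, source_id -> tag, value -> float field.
    Tag values with spaces, commas, or '=' would need escaping; esp32-NNNN ids don't.
    """
    return f"{metric},source_id={source_id} value={float(value)} {ts_ns}"
```

with clickhouse-connect, the DDL would go through `client.command(CLICKHOUSE_DDL)` and each batch through `client.insert("telemetry", batch, column_names=["timestamp", "source_id", "metric", "value"])`; the TimescaleDB statements run in order on a psycopg2 cursor before the COPY.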

what’s not in the comparison: cassandra (overkill for this workload), prometheus (designed for scraped metrics, not raw sensor data with high cardinality + retention), DuckDB (embedded, not a server). also out of scope: clustering, replication, multi-node, ingest/query contention. everything’s localhost on a single machine.

Benchmark setup

each database runs in its own docker container, one at a time. the benchmark code lives in the tsdb-comparison guide (docs.plexus.company/guides/tsdb-comparison): clone it, run it, and change the constants if you want a different scale.

what we measure

  1. ingest throughput: 1M rows in 5,000-row batches, total seconds. each DB uses its recommended high-throughput path: clickhouse native protocol via clickhouse-connect, timescaledb COPY FROM STDIN via psycopg2, influxdb 2.x’s batched write API.
  2. query latency: five queries representative of what fleet sensor dashboards actually ask. each runs five times against a warm cache; we report median + p95.
  3. storage on disk: bytes used by the table after ingest, measured from inside the container so we capture WAL files, indices, and partition metadata, not just the active data file.
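the post doesn’t say which percentile definition the harness uses. a minimal sketch of the measurement loop, assuming one unmeasured warm-up run and nearest-rank percentiles (`time_query`, `nearest_rank`, and `ingest_rate` are hypothetical helpers), also surfaces a quirk worth knowing: with only five samples, nearest-rank p95 is simply the slowest run.

```python
import math
import statistics
import time


def nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_query(run, warmups=1, runs=5):
    """Time a zero-argument callable; return (median_ms, p95_ms).

    `run` stands in for "execute one query and fetch all rows" in a given driver.
    """
    for _ in range(warmups):
        run()  # warm caches; not measured
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1000)
    return statistics.median(samples), nearest_rank(samples, 95)


def ingest_rate(n_rows, total_seconds):
    """Rows per second from one end-to-end ingest run."""
    return n_rows / total_seconds
```

with `runs=5`, `ceil(0.95 * 5)` is 5, so the reported p95 equals the maximum; more runs per query would make the tail number more meaningful.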

the five queries

Q1   single-source time window:    "esp32-0001's last 5 minutes"
Q2   distinct metrics per source:  "what does esp32-0001 report?"
Q3   fleet aggregate:              "average temp across all devices, last 1 min"
Q4   count over range:             "how many points in the last 10 minutes"
Q5   last-N per source:            "last 10 readings for esp32-0001"
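one plausible SQL rendering of the five queries for the two SQL databases, as a sketch: it assumes a hypothetical `telemetry` table with the (timestamp, source_id, metric, value) schema and windows measured back from `now()`, so it only returns rows if the ingested data ends near the current time. InfluxDB would need Flux equivalents, which aren’t shown.

```python
DEVICE = "esp32-0001"


def interval(dialect: str, minutes: int) -> str:
    """Interval literal: ClickHouse takes INTERVAL 5 MINUTE, Postgres INTERVAL '5 minutes'."""
    if dialect == "clickhouse":
        return f"INTERVAL {minutes} MINUTE"
    return f"INTERVAL '{minutes} minutes'"


def queries(dialect: str) -> dict:
    """Q1-Q5 as SQL strings; %(dev)s is a pyformat placeholder bound by the driver."""

    def iv(m):
        return interval(dialect, m)

    return {
        # Q1: one device, last 5 minutes
        "Q1": "SELECT timestamp, value FROM telemetry WHERE source_id = %(dev)s "
              f"AND timestamp >= now() - {iv(5)} ORDER BY timestamp",
        # Q2: which metrics does one device report?
        "Q2": "SELECT DISTINCT metric FROM telemetry WHERE source_id = %(dev)s",
        # Q3: fleet-wide average temperature, last minute
        "Q3": "SELECT avg(value) FROM telemetry WHERE metric = 'temperature' "
              f"AND timestamp >= now() - {iv(1)}",
        # Q4: points stored in the last 10 minutes
        "Q4": f"SELECT count(*) FROM telemetry WHERE timestamp >= now() - {iv(10)}",
        # Q5: newest 10 readings for one device
        "Q5": "SELECT timestamp, value FROM telemetry WHERE source_id = %(dev)s "
              "ORDER BY timestamp DESC LIMIT 10",
    }
```

the same `%(dev)s` placeholder works with psycopg2 (`cur.execute(sql, {"dev": DEVICE})`) and with clickhouse-connect’s client-side binding (`client.query(sql, parameters={"dev": DEVICE})`), which keeps the two drivers’ query text nearly identical.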

these are the shapes a sensor dashboard or alert engine actually hits. they’re also what plexus issues against its own clickhouse instance, so the post benchmarks something close to a real workload, not a synthetic microbenchmark.

running it

```bash
# full harness + drivers: docs.plexus.company/guides/tsdb-comparison
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# one DB at a time:
docker compose up -d clickhouse
python bench.py --db clickhouse --out results/clickhouse.json
docker compose down -v
# repeat with --db timescaledb / influxdb
```

what we couldn’t get clean numbers for

we tried QuestDB (8.1.1 and latest) under the same harness. ILP ingest into a WAL-enabled table hit reproducible writer-pool contention: the server rejected batches with table busy [reason=telemetry], and the official python ILP client timed out or failed with broken pipe errors. we don’t have confidence in any number we’d publish, so we cut it from the table. the runner script is in the repo if you want to debug it on different hardware. results below are clickhouse, timescaledb, and influxdb only.

Results

ingest, storage, and median query latency on a 1M-row dataset. machine: M3 MacBook Pro, 8 cores, Docker Desktop 4 GiB allocation, each DB run in isolation.

metric                      ClickHouse   TimescaleDB    InfluxDB
ingest (rows/sec)              669,220       212,407     168,930
storage on disk                4.6 MiB     136.0 MiB    41.6 MiB
Q1 window                      4.44 ms       1.11 ms     5.40 ms
Q2 distinct metrics            2.14 ms       0.95 ms     3.30 ms
Q3 fleet avg                   9.38 ms      12.63 ms    28.94 ms
Q4 count over range            2.94 ms      52.60 ms    27.66 ms
Q5 last-N per source           1.70 ms       0.35 ms     2.31 ms

a few things stand out:

  • ClickHouse compresses brutally well. 4.6 MiB for 1M rows of numeric time-series — roughly 5 bytes per row. TimescaleDB at defaults (no compression policy) uses 30× more space. InfluxDB's TSM engine sits in the middle.
  • ClickHouse leads on ingest by roughly 3× over TimescaleDB and 4× over InfluxDB. the native protocol + MergeTree's append-friendly design dominate here. TimescaleDB's COPY is also fast in absolute terms (200k+ rows/sec is plenty for most sensor fleets).
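  • TimescaleDB wins point-and-narrow queries (Q1, Q2, Q5). the btree index on (source_id, timestamp) makes single-device lookups trivially fast. ClickHouse's primary index is sparse (one entry per 8,192-row granule by default), so even a 10-row answer means reading whole granules; a sort key loses to a real index when the answer is small.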
  • ClickHouse wins aggregate queries (Q3, Q4). column scans + late materialization beat row-store + index-or-scan once you're touching most of the rows.
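the headline ratios can be rechecked from the table. a small sketch (hypothetical variable names, values copied from the table) also models why Q5 favors a btree: under the sort key (source_id, metric, timestamp), one device’s rows are contiguous, but ClickHouse still reads at least one full granule per column. this is a simplification that assumes one fully merged part and the default 8,192-row granularity.

```python
# Figures copied from the results table above (1M rows, one M3 MacBook Pro).
ROWS = 1_000_000
MIB = 1024 * 1024

storage_mib = {"clickhouse": 4.6, "timescaledb": 136.0, "influxdb": 41.6}
ingest_rps = {"clickhouse": 669_220, "timescaledb": 212_407, "influxdb": 168_930}

# On-disk bytes per row: ~4.8 (ClickHouse), ~142.6 (TimescaleDB), ~43.6 (InfluxDB).
bytes_per_row = {db: mib * MIB / ROWS for db, mib in storage_mib.items()}

# Implied wall-clock time to ingest the 1M rows: ~1.5 s, ~4.7 s, ~5.9 s.
ingest_seconds = {db: ROWS / rps for db, rps in ingest_rps.items()}


def granules_touched(first_row: int, n_rows: int, granularity: int = 8192) -> int:
    """Granules spanned by a contiguous run of sorted rows in one MergeTree part.

    8192 is ClickHouse's default index_granularity; adaptive granularity and
    multiple parts per partition are ignored in this simplified model.
    """
    first = first_row // granularity
    last = (first_row + n_rows - 1) // granularity
    return last - first + 1
```

at ~4.8 bytes per row, ClickHouse stores the whole row in less space than one uncompressed float64 (8 bytes), which is what dense, slowly varying numeric columns buy you. the flip side: a 10-row Q5 answer still spans one or two 8,192-row granules, while TimescaleDB’s btree walks straight to those 10 entries.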

Tradeoffs: when each one wins

  • TimescaleDB if you also need relational joins on device metadata (a devices table with location, model, firmware version, joined onto telemetry), or if your team already runs Postgres and you want one less moving piece. excellent at “give me this device’s recent N points.” costs disk space without a compression policy.
  • InfluxDB if you want a TSDB with a strong opinion baked in — buckets, retention policies, downsampling tasks all built-in via Flux. great as a self-contained metrics store. tag explosion is still its enemy at high cardinality (thousands+ unique tag combos).
  • ClickHouse if your queries are aggregation-heavy and fleet-wide, or if storage cost matters at scale. dense numeric data compresses spectacularly. retention via TTL clauses on the table. the SQL surface is approachable. less obvious for point-and-narrow lookups against a single device — sort keys aren’t free indices.
  • QuestDB we don’t have a confident take from this benchmark. community reports suggest it’s strong on raw ingest throughput; our setup hit writer contention we couldn’t shake. worth its own post.
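two of the bullets above hinge on features the benchmark left off: TimescaleDB compression and ClickHouse TTL retention. a sketch of what enabling them could look like, assuming the hypothetical `telemetry` table; the 7-day compression delay and 90-day retention are illustrative values, and none of this was active in the numbers above, so compressed TimescaleDB storage would differ from the 136 MiB shown.

```python
def timescale_policies(table="telemetry", compress_after_days=7, retain_days=90):
    """TimescaleDB 2.x statements: native compression plus a retention policy."""
    return [
        # segment by device + metric so per-device filters can skip other
        # devices' compressed batches; order within a segment by time
        f"ALTER TABLE {table} SET (timescaledb.compress, "
        "timescaledb.compress_segmentby = 'source_id, metric', "
        "timescaledb.compress_orderby = 'timestamp DESC')",
        # compress chunks once they are older than the delay
        f"SELECT add_compression_policy('{table}', INTERVAL '{compress_after_days} days')",
        # drop whole chunks past the retention window
        f"SELECT add_retention_policy('{table}', INTERVAL '{retain_days} days')",
    ]


def clickhouse_ttl(table="telemetry", retain_days=90):
    """ClickHouse retention: rows are removed once the TTL expression is in the past."""
    # toDateTime() keeps the TTL expression a plain DateTime over the DateTime64 column
    return f"ALTER TABLE {table} MODIFY TTL toDateTime(timestamp) + INTERVAL {retain_days} DAY"
```

the difference in shape mirrors the bullets: TimescaleDB manages retention per chunk through background policies, while ClickHouse attaches it to the table and applies it during merges.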

what isn’t a tradeoff: at 1M rows, all three databases that finished the benchmark answer every query in under 60ms p95 — so unless you’re at 100× this scale or have queries we didn’t measure, the answer is “any of them is fine; pick on operational fit.”

What we ended up using at plexus, honestly

ClickHouse. the reasons that matter for our specific workload — high-cardinality source_id with thousands of devices, mostly numeric telemetry, fleet-wide aggregations on dashboards, retention that maps cleanly to a TTL clause, and SQL that anyone who’s seen Postgres can read — all line up. it isn’t a universally right pick; TimescaleDB would be a better default if our reads were dominated by “this one device, last hour” lookups instead of fleet aggregates. the benchmark above shouldn’t tell you what to use. it should tell you how to find out.

Get the code

clone, run, file an issue if a number changes by >10%.

bench.py, the four drivers, the docker compose file, and the synthetic generator live in the tsdb-comparison guide at docs.plexus.company/guides/tsdb-comparison.
