Benchmarking
Performance is numbers, not vibes. pnpm bench runs vitest benchmarks; scripts/check-bench-budget.mjs compares each p99 to bench-budgets.json in CI. Over budget = CI red.
Three steps
- Write a bench in *.bench.ts using bench().
- Run it locally: pnpm bench.
- Check the budget with node scripts/check-bench-budget.mjs bench-results.json. Over budget? Improve the algorithm or adjust the budget.
Command cheat-sheet
pnpm bench # stable gate mode, single worker
pnpm bench:fast # local run using all workers
pnpm bench src/core/workers # subset
pnpm bench --outputJson bench-results.json # JSON output
node scripts/check-bench-budget.mjs bench-results.json # budget guard
pnpm bench passes --maxWorkers=1 explicitly for CI and local budget gates, avoiding p99 jitter from multiple benchmark files competing for CPU. bench:fast passes --maxWorkers=100% and is useful for quick local smoke runs, but do not use its p99 values to decide whether budgets pass.
CI steps (.github/workflows/ci.yml, check job):
- name: Benchmarks
  run: pnpm bench --outputJson bench-results.json

- name: Perf budget guard
  run: node scripts/check-bench-budget.mjs bench-results.json
Writing a bench
// src/core/geometry/offsetPolyline.bench.ts
import { bench } from 'vitest';
import { offsetPolyline } from './offsetPolyline';

const points: [number, number][] = Array.from({ length: 100 }, (_, i) => [i, 0]);

bench('offsetPolyline 100 points / 1m offset', () => {
  offsetPolyline(points, 1);
});
bench() warms up and measures; Vitest 4 reports mean / p50 / p99.
Stable inputs
Don't call Math.random() inside a bench: different inputs each round inflate p99 noise. Fix the input or seed it:
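const seed = 12345;
let rng = seed;
// Linear congruential generator. The >>> 0 must wrap the stored state,
// not just the return value, or rng grows without bound.
function next() {
  rng = (rng * 1664525 + 1013904223) >>> 0;
  return rng;
}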
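As a sketch of how a seeded generator might feed a bench, the helper below builds the same pseudo-random polyline on every call. The name makeSeededPoints and its parameters are hypothetical (not part of the repo); the LCG constants are the ones shown above.

```typescript
type Point = [number, number];

// Hypothetical helper: returns an identical point list for an identical seed.
function makeSeededPoints(seed: number, count: number): Point[] {
  let state = seed >>> 0;
  const next = (): number => {
    // Wrap the stored state to unsigned 32 bits on every step.
    state = (state * 1664525 + 1013904223) >>> 0;
    return state;
  };
  const result: Point[] = [];
  for (let i = 0; i < count; i++) {
    // x advances by one unit per point; y jitters within [0, 1).
    result.push([i, next() / 0x100000000]);
  }
  return result;
}

// Build the input once at module scope, outside the bench callback,
// so every iteration measures the same data.
const points = makeSeededPoints(12345, 100);
```

Passing points into the bench callback keeps run-to-run differences down to machine noise rather than input differences.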
Budget file (bench-budgets.json)
{
  "budgets": {
    "offsetPolyline 100 points / 1m offset": { "p99Ms": 0.5 },
    "spatial.worker SYNC 1k entities": { "p99Ms": 35 },
    "laneJunctionGraph rebuild 500 lanes": { "p99Ms": 12 }
  }
}
Schema:
- key = bench name (exact match, including spaces and punctuation).
- p99Ms = upper bound on the 99th percentile (ms).
Why p99, not p100
p100 = max is too jittery; CI flakes randomly. p99 covers the tail without letting clear regressions through.
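To make the difference concrete, here is a minimal sketch using a nearest-rank percentile over synthetic samples. The percentile function is illustrative and not necessarily the method Vitest uses internally; the sample values are made up.

```typescript
// Nearest-rank percentile over a list of durations in ms.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// 1000 synthetic samples: steady 1 ms work plus a single 50 ms stall.
const samples: number[] = [];
for (let i = 0; i < 1000; i++) samples.push(1);
samples[500] = 50;

const p99 = percentile(samples, 99);  // 1: the lone outlier does not reach p99
const max = percentile(samples, 100); // 50: p100 is the outlier itself

// A regression that slows 5% of iterations to 3 ms does move p99.
const regressed = samples.map((ms, i) => (i % 20 === 1 ? 3 : ms));
const regressedP99 = percentile(regressed, 99); // 3
```

A single stall changes the max by 50× but leaves p99 untouched, while a slowdown that hits a few percent of iterations shows up in p99.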
CI budget flow
Unregistered benches fail closed. A new bench must add a matching budget in the same change, otherwise CI cannot detect regressions.
Setting a budget
First time
- Write the *.bench.ts.
- Run pnpm bench --outputJson bench-results.json.
- Read the name and p99 numbers.
- Add a line to scripts/bench-budgets.json with 30% headroom:
"my new bench / 1k items": { "p99Ms": 13 }
Don't pin to the observed value
An observed p99 of 10 ms with a budget of 10 ms makes CI flake immediately. Leave 30% headroom for runner variance.
Commit
chore(bench): seed budget for offsetPolyline / 1k items

Initial p99 = 9.8 ms on Apple M1; budget = 13 ms (~30% headroom)
to absorb GitHub runner variance.
Updating a budget
Algorithm got faster → tighten
"existing bench": { "p99Ms": 5 } // was 10
Commit:
perf(geometry): vectorize offset computation

Bench p99 drops from 9.2 ms to 3.1 ms.
Budget tightened from 10 ms to 5 ms.
Tighter is better
A low-water mark catches regressions. Don't keep the budget loose "just in case it gets slower later".
Algorithm has to become slower → relax + explain
perf(workers): switch to bellman-ford over a*

Bench p99 rises from 22 ms to 35 ms because the graph contains negative
edge weights now (PNCJunction). Budget raised from 25 ms → 40 ms with
a comment in bench-budgets.json explaining the trade-off.
Don't sneak budget bumps
"Algorithm unchanged but budget went 10 → 50" is a smell. Reviewers must ask: did you patch a regression away by relaxing the guard? Justify or reject.
Existing bench areas
| Area | Contract |
|---|---|
| offset polyline geometry | p99 ceilings for 10 / 100 / 1000 point offsets |
| lane junction derivation | full stitch and 1 / 3 lane incremental decoration budgets |
| lane topology reconcile | full / single-dirty topology derivation across several sizes |
| overlap reconcile | full recompute scales linearly; dirty edit is near-constant |
| spatial index syncDirty | single-dirty sync does not grow with whole-map entity count |
| interaction geometry | snap, hit-test distance, and polygon validation |
| lane boundary brush | boundary-brush drag hit scan and boundaryType normalization |
| spatial worker pipeline | sync, cold feature rebuild, delta, and hit-test protocol |
| cold/hot/overlay/grid | main-thread source diff/update and preview construction |
| entityOps/mapStore | reference cleanup, reparent scans, and store write txns |
| worker/IO chunking | main-thread 2k chunk slice / progress loops |
| proto pipeline | bridge, bounds, projection, roundtrip, and codecs |
See scripts/bench-budgets.json (source).
Budget file structure
scripts/check-bench-budget.mjs walks the vitest JSON tree:
{ files: [{ groups: [{ benchmarks: [{ name, p99, ... }] }] }] }
It collects (name, p99) leaves and compares them to the budgets. The budget file can grow new fields (p50Ms, meanMs), but today only p99 is enforced.
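The walk and the fail-closed rule can be sketched roughly as follows. This is not the actual scripts/check-bench-budget.mjs: the type names and the functions collectLeaves and checkBudgets are assumptions for illustration, and the sketch assumes the reported p99 is in milliseconds so it can be compared to p99Ms directly.

```typescript
// Assumed shapes, based on the tree described above; real files may carry more fields.
interface BenchLeaf { name: string; p99: number }
interface BenchReport { files: { groups: { benchmarks: BenchLeaf[] }[] }[] }
interface BudgetFile { budgets: Record<string, { p99Ms: number }> }

// Flatten files -> groups -> benchmarks into (name, p99) pairs.
function collectLeaves(report: BenchReport): BenchLeaf[] {
  const leaves: BenchLeaf[] = [];
  for (const file of report.files) {
    for (const group of file.groups) {
      for (const b of group.benchmarks) leaves.push({ name: b.name, p99: b.p99 });
    }
  }
  return leaves;
}

// Returns one message per problem; an empty array means the gate passes.
function checkBudgets(report: BenchReport, budgetFile: BudgetFile): string[] {
  const failures: string[] = [];
  for (const leaf of collectLeaves(report)) {
    const registered = Object.prototype.hasOwnProperty.call(budgetFile.budgets, leaf.name);
    if (!registered) {
      // Fail closed: a bench without a budget is an error, not a skip.
      failures.push(`unregistered bench: "${leaf.name}"`);
    } else if (leaf.p99 > budgetFile.budgets[leaf.name].p99Ms) {
      const limit = budgetFile.budgets[leaf.name].p99Ms;
      failures.push(`over budget: "${leaf.name}" p99 ${leaf.p99} ms > ${limit} ms`);
    }
  }
  return failures;
}

// Made-up results: one within budget, one over, one unregistered.
const failures = checkBudgets(
  { files: [{ groups: [{ benchmarks: [
    { name: 'offsetPolyline 100 points / 1m offset', p99: 0.42 },
    { name: 'spatial.worker SYNC 1k entities', p99: 41 },
    { name: 'brand new bench', p99: 1 },
  ] }] }] },
  { budgets: {
    'offsetPolyline 100 points / 1m offset': { p99Ms: 0.5 },
    'spatial.worker SYNC 1k entities': { p99Ms: 35 },
  } },
);
```

A CI script built this way would print the messages and exit non-zero whenever failures is non-empty.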
Cross-platform variance
CI runs on GitHub ubuntu-latest (VM, ±20% variance). Your budget must:
- Carry 30% headroom.
- For sub-1 ms benches, expand to 50% (noise dominates).
When a bench fails, a regression that reproduces across PRs is real; a one-off failure is noise, so re-run.
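The headroom rules reduce to a little arithmetic. The helper below is a hypothetical sketch (suggestBudgetMs is not a repo function); rounding up to whole milliseconds, or to 0.01 ms for sub-millisecond benches, is an assumption chosen so the output matches the 9.8 ms → 13 ms example above.

```typescript
// Hypothetical helper: turns an observed p99 into a budget with headroom.
function suggestBudgetMs(observedP99Ms: number): number {
  if (!(observedP99Ms > 0)) throw new Error('observed p99 must be positive');
  const subMillisecond = observedP99Ms < 1;
  // 50% headroom when noise dominates (sub-1 ms), otherwise 30%.
  const headroom = subMillisecond ? 0.5 : 0.3;
  const raw = observedP99Ms * (1 + headroom);
  // Assumed rounding: up to 0.01 ms steps below 1 ms, whole ms above.
  const scale = subMillisecond ? 100 : 1;
  // The small epsilon keeps float error (e.g. 60.00000000000001) from bumping a step.
  return Math.ceil(raw * scale - 1e-9) / scale;
}

const seeded = suggestBudgetMs(9.8); // 13
const tiny = suggestBudgetMs(0.4);   // 0.6
```

Run it against CI numbers rather than workstation numbers, since the budget is enforced on the CI runner.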
Don't disable a flaky bench
If a bench is occasionally flaky but sometimes catches real regressions, adjust the budget, don't delete the bench. Deleting is blindness.
When to bench
| Change | Bench? |
|---|---|
| New geometry algorithm | ✅ |
| New worker pipeline | ✅ |
| Import/export codec changes | ✅ |
| Cold-layer compile changes | ✅ |
| UI component | ❌ (use React Profiler) |
| Docs | ❌ |
Decision rule
"If this code became 10× slower, would the user feel it?" Yes ⇒ bench.
Separation from unit tests
*.bench.ts uses bench(), *.test.ts uses it().
import { bench, expect, it } from 'vitest';
// You can import both, but don't mix them in one file.
pnpm test does not run benches; pnpm bench does not run tests.
Profiler complements bench
Bench gives numbers. The browser Profiler shows flame graphs.
DevTools Performance:
- pnpm dev to start.
- Performance panel → Record 5s.
- Run the slow path.
- Bottom-Up view → find the most expensive function.
- Anchor it with a bench.
Common hot spots in flame graphs
- JSON.parse / structuredClone on big payloads: use transferables.
- Array.prototype.push in tight loops: preallocate.
- Spread / Object.assign: mutate directly inside an immer producer.
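As an illustration of the preallocation point, the two functions below compute the same result, one by growing an array with push and one by writing into a buffer of known size. Both function names are made up for this sketch, and whether the second is actually faster depends on the engine and the workload, so anchor the comparison with a bench before relying on it.

```typescript
// Grows with push: the array may be resized several times as it fills.
function scaleWithPush(xs: Float64Array, k: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < xs.length; i++) out.push(xs[i] * k);
  return out;
}

// Preallocated: one typed-array allocation of known size, written by index.
function scalePreallocated(xs: Float64Array, k: number): Float64Array {
  const out = new Float64Array(xs.length);
  for (let i = 0; i < xs.length; i++) out[i] = xs[i] * k;
  return out;
}
```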
Common pitfalls
Bench reports nothing
The bench() name does not match bench-budgets.json exactly — treated as unregistered and failed. Copy the name verbatim (spaces included).
p99 dwarfs p50
Input jitter is too large. Fix the input or seed. Or boost samples:
bench('foo', () => foo(input), { iterations: 10000 });
"Fast locally, slow on CI"
GitHub runners run at roughly 50% of the speed of M1 / Ryzen workstations. Set budgets against CI numbers, not local ones.
Bench hangs
A bench that takes > 30s points to algorithm degeneration or an infinite loop. Set vi.setConfig({ testTimeout: 5000 }) and locate the offending case.
Source links
- scripts/check-bench-budget.mjs
- scripts/bench-budgets.json
- vite.config.ts (vitest bench config)
- Vitest bench docs
Advanced
Trend tracking
Upload each CI bench-results.json as an artifact and chart trends. Not yet enabled.
bench diff in PR
Add a before/after table to the PR template:
| Bench                                  | Before p99 (ms) | After p99 (ms) |
| -------------------------------------- | --------------- | -------------- |
| offsetPolyline 100 points / 1m         | 0.5             | 0.3            |
| spatial.worker SYNC 1k entities        | 32              | 23             |
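If filling the table by hand gets tedious, it could be generated from two sets of (name, p99) pairs. The sketch below assumes those pairs have already been extracted from a before and an after bench-results.json; renderBenchDiff and the BenchP99 type are hypothetical names, not existing tooling.

```typescript
interface BenchP99 { name: string; p99: number }

// Renders a markdown before/after table. Benches missing from `after`
// show "n/a"; benches that exist only in `after` are not listed.
function renderBenchDiff(before: BenchP99[], after: BenchP99[]): string {
  const afterByName = new Map<string, number>();
  for (const b of after) afterByName.set(b.name, b.p99);

  const rows: string[] = [
    '| Bench | Before p99 (ms) | After p99 (ms) |',
    '| --- | --- | --- |',
  ];
  for (const b of before) {
    const next = afterByName.get(b.name);
    rows.push(`| ${b.name} | ${b.p99} | ${next === undefined ? 'n/a' : next} |`);
  }
  return rows.join('\n');
}

const table = renderBenchDiff(
  [
    { name: 'offsetPolyline 100 points / 1m', p99: 0.5 },
    { name: 'spatial.worker SYNC 1k entities', p99: 32 },
  ],
  [
    { name: 'offsetPolyline 100 points / 1m', p99: 0.3 },
    { name: 'spatial.worker SYNC 1k entities', p99: 23 },
  ],
);
```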
One sentence
Perf changes need numbers. Bench is truth, feeling is noise. Get the numbers, then debate trade-offs.