Performance
Quick legend
Bear in mind that even Slow here can still be 2-4x faster than `postMessage` (depending on payload and workload).
- Best: near “header-only” cost, best path
- Fast: still very cheap per call
- Good: fine for real workloads
- Fair: watch frequency
- Slow: avoid in hot loops, consider alternatives
Rating thresholds (easy to tweak later)
These tiers are intentionally simple. If you re-run benchmarks on a new CPU/runtime, edit these numbers and re-label rows as needed.
For a single call (1 value):
- Best: < 1 µs
- Fast: < 2 µs
- Good: < 4 µs
- Fair: < 9 µs
- Slow: ≥ 9 µs
For a batch (100 values):
- Best: < 50 µs
- Fast: < 100 µs
- Good: < 200 µs
- Fair: < 450 µs
- Slow: ≥ 450 µs
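The tiers above can be captured in a small helper for re-labeling rows after a new benchmark run. This is a sketch, not part of the library; the threshold numbers mirror the two lists above:

```typescript
type Tier = "Best" | "Fast" | "Good" | "Fair" | "Slow";

// Upper bounds in microseconds, mirroring the lists above.
// Edit these after re-running benchmarks on a new CPU/runtime.
const thresholds: Record<"single" | "batch", Array<[number, Tier]>> = {
  single: [[1, "Best"], [2, "Fast"], [4, "Good"], [9, "Fair"]],
  batch: [[50, "Best"], [100, "Fast"], [200, "Good"], [450, "Fair"]],
};

function tierFor(us: number, kind: "single" | "batch"): Tier {
  for (const [limit, tier] of thresholds[kind]) {
    if (us < limit) return tier;
  }
  return "Slow";
}
```

For example, `tierFor(3, "single")` lands on Good, and anything at or above 450 µs in a batch falls through to Slow.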
Benchmark environment
- clk: ~3.86 GHz
- cpu: Apple M3 Ultra
- runtime: node 24.12.0 (arm64-darwin)
Mental model (what makes things fast)
1) Header-only / “no payload”: values fit into the call header, so encode/decode cost is tiny.
2) Static payload: reuses part of the header plus a small buffer; only for small payloads.
3) Dynamic payload (allocator path): needs allocation, copying, and bookkeeping. Still fast, but you’ll feel it in hot loops.
Single call categories (1 value)
| Case | Batch | Tier |
|---|---|---|
| Primitives: `boolean`, `undefined`, `null` | (1) | Best |
| Numbers: `number` | (1) | Best |
| Time/IDs: `Date` | (1) | Best |
| Strings: small string | (1) | Best |
| Symbols: `Symbol.for` | (1) | Fast |
| BigInt: small `bigint` | (1) | Best |
| BigInt: large `bigint` | (1) | Fast |
| Binary: Typed Arrays | (1) | Best |
| Views: `DataView` | (1) | Good |
| Structured: JSON object | (1) | Good |
| Structured: JSON array | (1) | Good |
| Errors: `Error` | (1) | Slow |
Batch categories (100 values)
Batching amortizes per-call overhead when you are making repeated calls.
| Case | Batch | Tier |
|---|---|---|
| Primitives: `boolean`, `undefined`, `null` | (100) | Best |
| Numbers: `number` | (100) | Best |
| Strings: small string | (100) | Best |
| BigInt: small `bigint` | (100) | Best |
| BigInt: large `bigint` | (100) | Fast |
| Binary: Typed Arrays | (100) | Best |
| Structured: JSON object, JSON array | (100) | Good |
| Errors: `Error` | (100) | Slow |
| Time/IDs: `Date` | (100) | Best |
| Symbols: `Symbol.for` | (100) | Best |
Configuration effects on performance
Thread count
More threads means more lanes for the balancer to distribute work across, but
each thread has a fixed memory cost (payload buffers, shared lock regions,
abort signal pool). Returns diminish past the number of physical cores. For
compute-only workloads, `threads: os.availableParallelism() - 1` is a
reasonable starting point.
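Deriving that count with Node’s `os` module looks like this (the `threads` option name comes from the text above; wiring the value into your pool configuration is left to your Knitting setup):

```typescript
import os from "node:os";

// Leave one lane for the main thread; never go below 1
// on single-core machines.
const threads = Math.max(1, os.availableParallelism() - 1);
```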
Inliner
The inliner skips encode/decode entirely, so header-only payloads on the
inline lane are effectively free. For tiny math tasks, adding
`inliner: { position: "last", batchSize: 64 }` can improve throughput
noticeably. See the Inliner guide.
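As a sketch, that option slots into a pool configuration object like this. Only the `inliner` fields come from the text above; the surrounding object shape is an assumption, so check the Inliner guide for the exact placement:

```typescript
// Hypothetical pool options object; only `inliner` is taken from the docs text.
const poolOptions = {
  inliner: {
    position: "last", // run the inline lane after the worker lanes
    batchSize: 64,    // how many tiny calls to inline per flush
  },
};
```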
Permissions
Enabling `permission: "strict"` adds startup cost per worker (flag generation,
lock file resolution) but has no measurable per-call overhead once workers are
running. The cost is one-time and small.
Payload sizing
`payload.payloadInitialBytes`, `payload.payloadMaxByteLength`, and
`payload.maxPayloadBytes` control the shared buffer each worker allocates.
Larger initial buffers avoid runtime growth at the cost of upfront memory.
If your payloads are consistently small (primitives, short strings), the
defaults (4 MiB initial, 64 MiB max length, 8 MiB hard dynamic cap) are
usually enough.
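Spelled out, the quoted defaults look like this. The `payload.*` option names come from the text above; the wrapping object shape is an assumption:

```typescript
const MiB = 1024 * 1024;

// Defaults quoted above: 4 MiB initial, 64 MiB max length, 8 MiB dynamic cap.
const payloadOptions = {
  payload: {
    payloadInitialBytes: 4 * MiB,
    payloadMaxByteLength: 64 * MiB,
    maxPayloadBytes: 8 * MiB,
  },
};
```

Raising `payloadInitialBytes` trades upfront memory per worker for fewer buffer growths at runtime.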
Batching
Batching many calls amortizes header and dispatch overhead. Even types rated
Slow in the single-call table become acceptable when batched. If you’re
calling the same task in a loop, fire all calls concurrently with
`Promise.all` rather than awaiting each one sequentially.
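A minimal sketch of the difference, using a stub async function in place of a real `handlers.call.*` task:

```typescript
// Stub standing in for a Knitting task call; any promise-returning function works.
const double = async (n: number): Promise<number> => n * 2;

// Sequential: each await blocks before the next call is even dispatched.
async function sequential(xs: number[]): Promise<number[]> {
  const out: number[] = [];
  for (const x of xs) out.push(await double(x));
  return out;
}

// Concurrent: all calls are in flight at once, so dispatch overlaps.
const concurrent = (xs: number[]): Promise<number[]> =>
  Promise.all(xs.map((x) => double(x)));
```

With a real pool, the concurrent form lets the balancer fill every lane instead of idling between awaits.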
Promise inputs
`call.*()` can also accept `Promise<supported>` inputs, not just already
resolved values. In HTTP handlers this lets you pass the body read directly:
```ts
app.post("/jwt", async (c) => {
  const responseJson = await handlers.call.issueJwt(c.req.arrayBuffer());
  return c.body(responseJson ?? "Bad request", responseJson ? 200 : 400, {
    "content-type": "application/json; charset=utf-8",
  });
});
```

The Promise itself is not somehow faster on its own. The win is that the main
thread no longer needs to stop and materialize the payload before handing it
to Knitting.
Why it helps in the Hono `arrayBuffer()` case:
- `c.req.arrayBuffer()` already returns a promise, so forwarding it directly avoids one extra `await` step in the route handler.
- If the worker task accepts `ArrayBuffer` and does UTF-8 decode / JSON parsing inside the task, that decode/parse work moves off the request thread.
- `ArrayBuffer` stays on the binary fast path, which is usually cheaper than first converting to `string` or building a JSON object on the host.
The same pattern also works when the task expects an `Envelope`. You can shape
the value with `.then(...)` and still avoid awaiting the body first:
```ts
import { Envelope } from "@vixeny/knitting";

app.post("/upload", async (c) => {
  const result = await handlers.call.storeUpload(
    c.req.arrayBuffer().then(
      (body) =>
        new Envelope(
          { contentType: c.req.header("content-type") ?? "application/octet-stream" },
          body,
        ),
    ),
  );
  return c.json(result);
});
```

Here the host does not await the request body up front. The envelope is created
only when the promise resolves, and Knitting receives the resolved
`Envelope<header, payload>` value.
This is most useful when the route is basically a transport layer and the worker is supposed to own parsing anyway, like SSR or JWT issuance. If you need to inspect the body on the main thread before dispatch, await it locally and validate there.