Performance
Quick legend
Bear in mind that even the Slow tier can still be 2-4x faster than postMessage, depending on payload and workload.
- Best: near “header-only” cost, best path
- Fast: still very cheap per call
- Good: fine for real workloads
- Fair: watch frequency
- Slow: avoid in hot loops, consider alternatives
Rating thresholds (per call, easy to tweak later)
These tiers are intentionally simple. If you re-run benchmarks on a new CPU/runtime, edit these numbers and re-label rows as needed.
- Best: < 1 us
- Fast: < 2 us
- Good: < 4 us
- Fair: < 9 us
- Slow: >= 9 us
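The thresholds above map onto a small lookup. The sketch below is illustrative only: `Tier`, `TIER_LIMITS_US`, and `tierFor` are hypothetical names, not part of Knitting, and a measurement of exactly 9 us is assigned to Slow here as an assumption.

```typescript
// Hypothetical helper: maps a measured per-call latency (microseconds) to a tier.
type Tier = "Best" | "Fast" | "Good" | "Fair" | "Slow";

// Exclusive upper bounds copied from the list above; edit after re-benchmarking.
const TIER_LIMITS_US: ReadonlyArray<readonly [Tier, number]> = [
  ["Best", 1],
  ["Fast", 2],
  ["Good", 4],
  ["Fair", 9],
];

function tierFor(perCallUs: number): Tier {
  for (const [tier, limit] of TIER_LIMITS_US) {
    if (perCallUs < limit) return tier;
  }
  return "Slow"; // assumption: 9 us and above counts as Slow
}
```

Keeping the bounds in one table means a re-run on a different CPU or runtime only needs the four numbers changed.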
Benchmark environment
- clk: ~3.86 GHz
- cpu: Apple M3 Ultra
- runtime: node 24.12.0 (arm64-darwin)
Mental model (what makes things fast)
1) Header-only / “no payload”: values fit into the call header, so encode/decode cost is tiny.
2) Static payload: reuses part of the header plus a small buffer; only for small payloads.
3) Dynamic payload (allocator path): needs allocation, copying, and bookkeeping. Still fast, but you’ll feel it in hot loops.
4) Pointer teleportation (zero-copy):
SharedArrayBuffer and ProcessSharedBuffer are never copied — only a
pointer/handle crosses the boundary and both sides map the same bytes. Size
stops mattering: a 1 KiB and a 64 MiB SharedArrayBuffer cost the same to pass.
BufferReference (knitting/unsafe) is the thread-only move variant: it
detaches the source and hands the bytes over without a copy.
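The difference between sharing and moving can be seen with plain JavaScript primitives, without Knitting. In this sketch, two views over one SharedArrayBuffer show that both sides see the same bytes, and `structuredClone` with a transfer list stands in for the "move" behaviour. That BufferReference behaves comparably is an assumption; the library's internals are not shown here.

```typescript
// Sharing: two views over one SharedArrayBuffer observe the same bytes.
const shared = new SharedArrayBuffer(8);
const writer = new Uint8Array(shared);
const reader = new Uint8Array(shared);
writer[0] = 42;
const sharedByteSeen = reader[0]; // same memory, nothing was copied

// Moving: transferring an ArrayBuffer hands the bytes over and detaches the source.
const source = new ArrayBuffer(1024);
new Uint8Array(source)[0] = 7;
const moved = structuredClone(source, { transfer: [source] });
const sourceLengthAfterMove = source.byteLength; // a detached buffer reports length 0
const movedFirstByte = new Uint8Array(moved)[0];
```

After a move the original buffer is unusable on the sending side, which is the trade-off for skipping the copy.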
Single call categories (1 value)

| Case | Tier |
|---|---|
| Primitives: boolean, undefined, null | Best |
| Numbers: number | Best |
| Time/IDs: Date | Best |
| Strings: small string | Best |
| Symbols: Symbol.for | Fast |
| BigInt: small bigint | Best |
| BigInt: large bigint | Fast |
| Binary: typed arrays | Best |
| Views: DataView | Good |
| Structured: JSON object | Good |
| Structured: JSON array | Good |
| Errors: Error | Slow |

Configuration effects on performance
Thread count
More threads means more lanes for the balancer to distribute work across, but
each thread has a fixed memory cost (payload buffers, shared lock regions,
abort signal pool). Returns diminish past the number of physical cores. For
compute-only workloads, threads: os.availableParallelism() - 1 is a
reasonable starting point.
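On a single-core machine or a tightly constrained container, `os.availableParallelism() - 1` evaluates to 0, so it helps to clamp the result. A minimal sketch follows; `defaultThreadCount` is a hypothetical helper name, and the value would be passed as the threads option mentioned above.

```typescript
import { availableParallelism } from "node:os";

// Leave one unit of parallelism for the host thread, but never go below one worker.
function defaultThreadCount(): number {
  return Math.max(1, availableParallelism() - 1);
}

const threads = defaultThreadCount();
```

Treat the result as a starting point and benchmark from there, since returns diminish past the number of physical cores.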
Inliner
The inliner skips encode/decode entirely, so header-only payloads on the
inline lane are effectively free. For tiny math tasks, adding
inliner: { position: "last", batchSize: 64 } can improve throughput
noticeably. See Inliner guide.
Permissions
Enabling permission: "strict" adds startup cost per worker (flag generation,
lock file resolution) but has no measurable per-call overhead once workers are
running. The cost is one-time and small.
Payload sizing
payload.payloadInitialBytes, payload.payloadMaxByteLength, and
payload.maxPayloadBytes control the shared buffer each worker allocates.
Larger initial buffers avoid runtime growth at the cost of upfront memory.
If your payloads are consistently small (primitives, short strings), the
defaults (4 MiB initial, 64 MiB max length, 8 MiB hard dynamic cap) are
usually enough.
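To get a feel for the upfront memory trade-off, the sketch below spells out the quoted defaults and estimates the initial allocation across a pool. The object nesting is inferred from the dotted option names, `upfrontBytes` is a hypothetical helper, and the estimate assumes one such buffer per worker, which may not match the library's actual layout.

```typescript
const MiB = 1024 * 1024;

// Values mirror the defaults quoted above; the nesting is an assumption
// inferred from the dotted option names.
const payload = {
  payloadInitialBytes: 4 * MiB,   // initial size
  payloadMaxByteLength: 64 * MiB, // max length
  maxPayloadBytes: 8 * MiB,       // hard dynamic cap
};

// Hypothetical estimate: assumes exactly one shared buffer per worker.
function upfrontBytes(workers: number, p: typeof payload): number {
  return workers * p.payloadInitialBytes;
}

const upfrontForEightWorkers = upfrontBytes(8, payload); // 8 * 4 MiB
```

Raising the initial size multiplies across every worker, so it is worth checking against the thread count before increasing it.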
Threads vs processes
The two worker runtimes isolate memory differently, so they expose different zero-copy tools:

| Runtime | Isolation | Zero-copy tools |
|---|---|---|
| thread (default) | shares the host address space | SharedArrayBuffer, BufferReference (move) |
| process | separate memory and permissions | ProcessSharedBuffer (OS shared memory) |

A SharedArrayBuffer or BufferReference cannot cross a process boundary —
reach for ProcessSharedBuffer when the worker is a process. See
Shared memory and
Buffer reference.
Because a process worker is just a child process, you can spawn it behind a
command prefix (worker.processCommandPrefix) so another tool launches it — a
sandbox like bwrap or a container like Docker:

    worker: {
      runtime: "process",
      processCommandPrefix: ["bwrap", "--unshare-all", "--ro-bind", "/", "/"],
    }

See Process workers for the full wrapper recipes.
HTTP request handling
call.*() accepts Promise<supported> inputs, not just already-resolved
values. In an HTTP handler this lets you forward the request body promise
straight into a task without stopping the request thread to materialize it
first:

    app.post("/jwt", async (c) => {
      // Forward the body promise as-is; the handler never awaits the body itself.
      const responseJson = await handlers.call.issueJwt(c.req.arrayBuffer());

      return c.body(responseJson ?? "Bad request", responseJson ? 200 : 400, {
        "content-type": "application/json; charset=utf-8",
      });
    });

The promise isn’t faster on its own; the win is that the request thread never stops to materialize the body before handing it to Knitting:
- c.req.arrayBuffer() already returns a promise, so forwarding it skips an await in the handler.
- UTF-8 decode / JSON parsing happens in the worker, not on the request thread.
- ArrayBuffer stays on the binary fast path.
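The calling convention can be illustrated without Knitting or an HTTP framework. In this sketch, `dispatch` and `parseBody` are hypothetical stand-ins: `dispatch` accepts either a value or a promise and resolves it at the dispatch boundary, so the caller never awaits. Everything here runs on one thread, so it shows only the shape of the API, not the offloading.

```typescript
type MaybePromise<T> = T | Promise<T>;

// Hypothetical stand-in for a dispatcher that accepts values or promises.
// The input is resolved here, so callers can pass a pending promise directly.
async function dispatch<T, R>(task: (input: T) => R, input: MaybePromise<T>): Promise<R> {
  const resolved = await input;
  return task(resolved);
}

// Stand-in task: decode and parse raw body bytes.
function parseBody(bytes: Uint8Array): { sub?: string } {
  return JSON.parse(new TextDecoder().decode(bytes));
}

// The caller forwards the promise without an intermediate await.
const bodyPromise: Promise<Uint8Array> = Promise.resolve(
  new TextEncoder().encode('{"sub":"user-1"}'),
);
const pending = dispatch(parseBody, bodyPromise);
```

The handler only awaits the final result, which is the one point where it actually needs a value.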
Head + body with Envelope
When you need both the request metadata (the head) and the raw body, wrap them
in an Envelope: the header carries the parsed metadata, the payload carries
the body bytes. Shape it from the body promise with .then(...) so you still
never await the body on the request thread:

    import { Envelope } from "knitting";

    app.post("/upload", async (c) => {
      const result = await handlers.call.storeUpload(
        c.req.arrayBuffer().then(
          (body) =>
            new Envelope(
              { contentType: c.req.header("content-type") ?? "application/octet-stream" },
              body,
            ),
        ),
      );

      return c.json(result);
    });

For a large binary body sent to a thread worker, make the body a BufferReference instead of an ArrayBuffer to move the bytes with no copy; only the body line changes:

    import { BufferReference } from "knitting/unsafe";

    new Envelope(
      { contentType: c.req.header("content-type") ?? "application/octet-stream" },
      new BufferReference(body), // moves the body bytes, zero-copy (thread workers)
    );

This is most useful when the route is basically a transport layer and the worker owns parsing anyway, like SSR or JWT issuance. If you need to inspect the body on the main thread before dispatch, await it locally and validate there.
Choosing a return type
The return path has the same costs as the input path, in reverse. There are many ways to return a result; pick by how big it is and what it already is:
- JSON object / array: serialized on the worker and parsed again on the host, so it pays a double pass over the data (stringify + parse). Fine for small results, heavy for large ones.
- SharedArrayBuffer / ProcessSharedBuffer: pointer teleportation; the cheapest way to hand bytes back when you can back the result with shared memory. ProcessSharedBuffer also works from process workers.
- BufferReference: for big binaries you cannot easily cast into a SharedArrayBuffer (for example a buffer produced by a library you don’t control). Zero-copy on Node; a single copy on Deno and Bun. Thread workers only.
See Payloads and Buffer reference for the full type list.
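The guidance above can be restated as a small decision function. This is a sketch only: `chooseReturnType` and its type names are hypothetical, it returns labels rather than constructing anything, and it treats any structured result as JSON regardless of size.

```typescript
type Runtime = "thread" | "process";
type ResultKind = "structured" | "shared-backed-bytes" | "foreign-bytes";
type ReturnChoice = "JSON" | "SharedArrayBuffer" | "ProcessSharedBuffer" | "BufferReference";

// Hypothetical helper; it only restates the guidance above.
function chooseReturnType(kind: ResultKind, runtime: Runtime): ReturnChoice {
  // Structured data pays stringify on the worker plus parse on the host.
  if (kind === "structured") return "JSON";
  // SharedArrayBuffer and BufferReference cannot cross a process boundary.
  if (runtime === "process") return "ProcessSharedBuffer";
  // Thread workers: share when the bytes already live in shared memory, else move.
  return kind === "shared-backed-bytes" ? "SharedArrayBuffer" : "BufferReference";
}
```

For a process worker holding bytes from a library it does not control, this implies getting the bytes into shared memory first, which the helper does not model.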