Architecture
Ever wondered what actually happens between call.myTask(args) and the result landing back in your hands? This page walks that path. You don’t need any of it to use Knitting — the Quick start and Creating pools guides have you covered — but it’s here for when you’re curious why things are fast, chasing a strange bug, or reaching for the advanced knobs.
One caveat up front: this is a simplified mental model. The real implementation has many more moving parts — fast paths, edge cases, platform quirks — that are deliberately left out so the big picture stays readable. Treat the numbers and steps below as the shape of things, not the full contract.
High-level picture
Knitting has three layers:
- API layer: `task()`, `createPool()`, `call.*()`, `shutdown()`. This is what your code touches.
- Dispatch layer: the host handler, which covers lane routing, the optional balancer strategy, and the inliner. This decides where a call runs.
- Transport layer: shared-memory mailboxes, payload buffers, wakeups. This moves data between threads without going through the runtime’s message queue.
Knitting runs thread workers by default. It can also run each worker as a separate process (for sandboxing or containers): the transport is the same shared-memory idea; a process worker just reaches the memory through a named mapping instead of an inherited handle. See Process workers and Shared memory.
Transport: shared-memory mailboxes
This is the core of Knitting’s speed advantage. Instead of using `postMessage` (which serializes data, queues it, and deserializes on the other side), Knitting writes directly to `SharedArrayBuffer` regions visible to both threads.
The mailbox model
Each worker has two independent mailboxes:
- Request mailbox (host -> worker): the host writes a call header here, the worker reads it.
- Response mailbox (worker -> host): the worker writes the result header here, the host reads it.
Each mailbox has 32 slots — the slot index is a 5-bit field, so 2⁵ = 32 slots per direction. A slot is a small fixed-size region that holds:
- Task function ID (`Uint16`, supports up to 65,536 tasks per pool)
- Payload type tag
- Small inline values (numbers, booleans, short strings fit directly in the header)
Slot ownership via a two-word lock
Slot state lives in two 32-bit atomic words: `hostBits` and `workerBits`. They sit on separate 64-byte cache lines, so the host and worker never fight over the same line. That avoids the false sharing that would otherwise bounce a cache line between cores and collapse throughput. A slot is busy when the two words disagree on its bit and free when they agree.
Publishing work is therefore a single atomic toggle of one bit — not a message copy. Because each direction has exactly one writer (the host writes requests, the worker writes responses), this is a single-producer/single-consumer queue per direction: no mutex, no critical section, just write the payload, then publish the bit. The reader acquires the bit before it trusts the bytes.
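The busy/free rule and the single-bit publish can be sketched in a few lines. This is a minimal model of one mailbox direction, not Knitting's actual layout: the word offsets, the function names, and the choice of `Atomics.xor` are assumptions made for illustration.

```typescript
// Hypothetical layout: each word sits at the start of its own 64-byte cache line.
const CACHE_LINE_BYTES = 64;
const HOST_WORD = 0; // Int32 index of hostBits
const WORKER_WORD = CACHE_LINE_BYTES / 4; // Int32 index of workerBits, one line later

const words = new Int32Array(new SharedArrayBuffer(2 * CACHE_LINE_BYTES));

/** Bits that differ between the two words mark busy slots. */
function busyMask(): number {
  return Atomics.load(words, HOST_WORD) ^ Atomics.load(words, WORKER_WORD);
}

/** A slot is busy when its bit differs between hostBits and workerBits. */
function isBusy(slot: number): boolean {
  return ((busyMask() >>> slot) & 1) === 1;
}

/** Host side: write the payload first, then publish by toggling one bit. */
function hostPublish(slot: number): void {
  Atomics.xor(words, HOST_WORD, 1 << slot);
}

/** Worker side: toggle the same bit in workerBits so the words agree again. */
function workerRelease(slot: number): void {
  Atomics.xor(words, WORKER_WORD, 1 << slot);
}

/** Lowest free slot index, or -1 when all 32 slots are busy. */
function findFreeSlot(): number {
  const free = ~busyMask();
  if (free === 0) return -1;
  // Isolate the lowest set bit, then convert it to an index.
  return 31 - Math.clz32(free & -free);
}
```

Because each word has exactly one writer, neither side ever needs a compare-and-swap loop here: the host only toggles `hostBits` and the worker only toggles `workerBits`.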
The lifecycle of a slot:
- Host atomically claims a free slot in the request mailbox.
- Host writes the call header (task ID, payload tag, inline data or payload offset).
- Host notifies the worker.
- Worker reads the slot, executes the task.
- Worker claims a slot in the response mailbox, writes the result.
- Worker notifies the host.
- Host reads the result, releases the response slot, and resolves the promise.
Wakeups
When the host writes a request, it needs to wake a potentially parked worker. When a worker writes a response, it needs to wake the host handler. Knitting uses `Atomics.notify` (futex-style wakeups) for this.
Workers don’t busy-wait by default. The idle cycle is a bounded spin, then a park:
- Spin: check the mailbox in a tight loop for `spinMicroseconds` (default: scales with lane count). Uses `Atomics.pause` to reduce power draw.
- Park: call `Atomics.wait` with a timeout of `parkMs`. The thread sleeps until notified or the timeout expires.
- Wake: `Atomics.notify` from the host breaks the park immediately.
This means idle workers consume near-zero CPU while still waking up within microseconds when work arrives.
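A rough sketch of that spin-then-park cycle follows. The function names and the `WakeReason` type are hypothetical, and the spin budget is counted in iterations rather than microseconds to keep the example free of timer dependencies. `Atomics.pause` is feature-detected because only newer runtimes ship it.

```typescript
type WakeReason = "spun" | "woken" | "timed-out";

// Atomics.pause is optional here: older runtimes simply skip the hint.
const atomicsWithPause = Atomics as unknown as {
  pause?: (iterations?: number) => void;
};

/**
 * Wait until signal[index] stops being `seen`.
 * spinIterations stands in for a time budget such as spinMicroseconds.
 */
function waitForChange(
  signal: Int32Array,
  index: number,
  seen: number,
  spinIterations: number,
  parkMs: number,
): WakeReason {
  // Phase 1: bounded spin, cheap when work arrives back to back.
  for (let i = 0; i < spinIterations; i++) {
    if (Atomics.load(signal, index) !== seen) return "spun";
    atomicsWithPause.pause?.();
  }
  // Phase 2: park. Atomics.wait returns "not-equal" at once if the value
  // already changed, "ok" when notified, and "timed-out" after parkMs.
  const result = Atomics.wait(signal, index, seen, parkMs);
  return result === "timed-out" ? "timed-out" : "woken";
}

/** Producer side: change the signal word, then wake any parked waiter. */
function wake(signal: Int32Array, index: number): number {
  Atomics.add(signal, index, 1);
  return Atomics.notify(signal, index);
}
```

Note that `Atomics.wait` blocks the calling thread, which is fine inside a worker but is rejected on a browser's main thread.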
Data path: payload buffers
Values that don’t fit in the mailbox header slot need a separate path.
Each worker allocates two SharedArrayBuffer regions:
- Request payload buffer — for arguments sent host -> worker
- Return payload buffer — for results sent worker -> host
Sizes are controlled by payloadInitialBytes (default 4 MiB) and payloadMaxByteLength (default 64 MiB). When growable SharedArrayBuffer is available in the runtime, buffers start small and grow on demand. Otherwise, they’re allocated at max size upfront.
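The start-small-or-allocate-max decision can be sketched with the standard growable `SharedArrayBuffer` API (`maxByteLength` option and `grow()`). The helper names and the doubling policy below are assumptions for illustration, not Knitting's internals.

```typescript
type MaybeGrowable = {
  growable?: boolean;
  maxByteLength?: number;
  grow?: (newByteLength: number) => void;
};
type SabCtor = new (
  byteLength: number,
  options?: { maxByteLength?: number },
) => SharedArrayBuffer;

/** Start small when the runtime can grow the buffer, else allocate the max upfront. */
function allocatePayloadBuffer(initialBytes: number, maxBytes: number): SharedArrayBuffer {
  const proto = SharedArrayBuffer.prototype as unknown as MaybeGrowable;
  const canGrow = typeof proto.grow === "function";
  const Ctor = SharedArrayBuffer as unknown as SabCtor;
  return canGrow
    ? new Ctor(initialBytes, { maxByteLength: maxBytes })
    : new Ctor(maxBytes);
}

/** Grow on demand (hypothetical doubling policy), never past the maximum. */
function ensureCapacity(buffer: SharedArrayBuffer, neededBytes: number): void {
  if (buffer.byteLength >= neededBytes) return;
  const target = buffer as unknown as MaybeGrowable;
  if (target.growable !== true || typeof target.grow !== "function") {
    throw new RangeError("payload does not fit and the buffer cannot grow");
  }
  const limit = target.maxByteLength ?? neededBytes;
  if (neededBytes > limit) {
    throw new RangeError("payload exceeds the maximum buffer size");
  }
  let next = Math.max(buffer.byteLength, 1);
  while (next < neededBytes) next *= 2;
  target.grow(Math.min(next, limit));
}
```

With the defaults from this page, the call would look like `allocatePayloadBuffer(4 * 1024 * 1024, 64 * 1024 * 1024)`.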
Three encoding paths (fast -> slow)
| Path | When it’s used | Cost |
|---|---|---|
| Header-only | Primitives (boolean, null, undefined, number, small string, small bigint, Date) | Near zero. Value fits in the slot header. |
| Static payload | Small typed arrays, short strings that overflow the header | Low. Copies into a fixed region of the payload buffer. |
| Dynamic payload | Objects, arrays, large strings, Error | Higher. Requires allocation, encoding (similar to JSON serialization), and copying. |
Shared-memory types (SharedArrayBuffer, ProcessSharedBuffer) and the ownership-move BufferReference skip all three: the bytes aren’t copied, only a small descriptor crosses. The Performance guide has exact tier ratings per type.
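As a reading aid, the table can be expressed as a small classifier. The thresholds below are placeholders, since this page does not state the real cut-offs, and treating oversized bigints and large typed arrays as dynamic is an assumption. Shared-memory types are left out because they bypass all three paths.

```typescript
type EncodingPath = "header-only" | "static" | "dynamic";

// Hypothetical thresholds, chosen only to make the sketch concrete.
const HEADER_STRING_CHARS = 8;
const STATIC_LIMIT = 256;

function pickPath(value: unknown): EncodingPath {
  switch (typeof value) {
    case "boolean":
    case "number":
    case "undefined":
      return "header-only";
    case "bigint":
      // "Small" is taken here to mean: fits in a signed 64-bit integer.
      return BigInt.asIntN(64, value) === value ? "header-only" : "dynamic";
    case "string":
      // String length is used as a rough stand-in for encoded byte size.
      if (value.length <= HEADER_STRING_CHARS) return "header-only";
      return value.length <= STATIC_LIMIT ? "static" : "dynamic";
  }
  if (value === null || value instanceof Date) return "header-only";
  if (ArrayBuffer.isView(value) && value.byteLength <= STATIC_LIMIT) return "static";
  // Objects, arrays, Error, and anything too large for the other two paths.
  return "dynamic";
}
```

The ordering matters: the cheapest checks run first, so primitives never pay for the object tests further down.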
Dispatch: lanes and routing
A lane is an execution target: either a worker thread or the inline lane. The total lane count is `threads + (inliner ? 1 : 0)`.
When you call call.myTask(args), the host handler:
- Resolves any promise arguments on the host.
- Picks a lane (using the balancer strategy when there’s more than one).
- Encodes the arguments and writes them to that lane’s mailbox (or queues them for the inliner).
- Returns a promise that resolves when the response arrives.
The handler runs on every call. Which lane it picks is the job of the balancer strategy — and that only matters when there’s more than one lane to choose between. With a single worker and no inliner there is nothing to balance, so the balancer is bypassed entirely.
Balancer strategies
These are the values of the `balancer` option on `createPool`:
| Strategy | Behavior |
|---|---|
| `roundRobin` (default) | Rotates through lanes in order. Simple, fair, predictable. |
| `firstIdle` | Picks the first lane with no in-flight work, falls back to round-robin. |
| `randomLane` | Picks a random lane. Good for uneven task durations. |
| `firstIdleOrRandom` | First idle lane, else random. Balances fairness with load distribution. |
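The four strategies are simple enough to model directly. The sketch below works on an array of in-flight counts, one entry per lane. `makeBalancer`, `laneCount`, and the injectable `random` source are hypothetical names, and the real balancer may track lane state differently.

```typescript
type Strategy = "roundRobin" | "firstIdle" | "randomLane" | "firstIdleOrRandom";

/** Total lanes: one per worker, plus one if the inliner is enabled. */
function laneCount(threads: number, inliner: boolean): number {
  return threads + (inliner ? 1 : 0);
}

function makeBalancer(strategy: Strategy, random: () => number = Math.random) {
  let cursor = 0;

  const roundRobin = (lanes: number): number => {
    const lane = cursor % lanes;
    cursor = (cursor + 1) % lanes;
    return lane;
  };
  const randomLane = (lanes: number): number => Math.floor(random() * lanes);
  const firstIdle = (inFlight: readonly number[]): number =>
    inFlight.findIndex((count) => count === 0);

  /** inFlight[i] is the number of unfinished calls on lane i. */
  return function pick(inFlight: readonly number[]): number {
    const lanes = inFlight.length;
    if (lanes === 1) return 0; // nothing to balance with a single lane
    switch (strategy) {
      case "roundRobin":
        return roundRobin(lanes);
      case "randomLane":
        return randomLane(lanes);
      case "firstIdle": {
        const idle = firstIdle(inFlight);
        return idle >= 0 ? idle : roundRobin(lanes);
      }
      case "firstIdleOrRandom": {
        const idle = firstIdle(inFlight);
        return idle >= 0 ? idle : randomLane(lanes);
      }
    }
  };
}
```

Passing the random source as a parameter keeps the random strategies deterministic under test.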
Handler backoff
The host handler has its own stall-avoidance logic:
- `stallFreeLoops` (default 128): how many immediate notify-check loops run before backoff starts.
- `maxBackoffMs` (default 10): ceiling for exponential backoff delay.
Under sustained high load, the handler stays in tight loops. Under intermittent load, it backs off to avoid burning CPU while idle.
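One way to picture how the two options interact is a delay function over the number of consecutive empty loops. The 1 ms base delay and the doubling factor are assumptions. Only the free-loop count and the ceiling come from the options described here.

```typescript
interface BackoffOptions {
  stallFreeLoops: number;
  maxBackoffMs: number;
}

const DEFAULT_BACKOFF: BackoffOptions = { stallFreeLoops: 128, maxBackoffMs: 10 };

// Hypothetical starting delay once backoff kicks in.
const BASE_DELAY_MS = 1;

/** Delay before the next check, given how many loops in a row found nothing. */
function backoffDelayMs(
  emptyLoops: number,
  options: BackoffOptions = DEFAULT_BACKOFF,
): number {
  // The first stallFreeLoops checks run back to back with no delay.
  if (emptyLoops < options.stallFreeLoops) return 0;
  // After that the delay doubles per loop until it hits the ceiling.
  const step = Math.min(emptyLoops - options.stallFreeLoops, 30);
  return Math.min(options.maxBackoffMs, BASE_DELAY_MS * 2 ** step);
}
```

Any loop that finds work would reset `emptyLoops` to zero, which is what keeps the handler in the tight path under sustained load.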
Inliner: the host as a lane
The optional inliner adds the host thread itself as an execution lane. Inline tasks skip the entire transport layer: no encode, no mailbox write, no decode. The task function runs directly on the main thread.
Inline execution is deferred to a macro-task boundary (via MessageChannel) so the handler can service worker sends/receives first, then the host drains inline work.
Key details:
- `position: "first" | "last"`: where the inline lane sits in the balancer’s lane order.
- `batchSize`: how many inline tasks run per event-loop tick.
- `dispatchThreshold`: minimum in-flight calls before the inline lane is eligible.
- Abort signals on inline tasks use a static toolkit where `hasAborted()` always returns `false` (inline tasks can’t be individually cancelled since they share the host thread).
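To make `batchSize` and `dispatchThreshold` concrete, here is a toy inline lane. The class and its method names are hypothetical. The scheduler is injected so the deferral mechanism stays out of the sketch: in the description above that role is played by a `MessageChannel` macro-task.

```typescript
type InlineTask = () => void;
type Scheduler = (drain: () => void) => void;

class InlineLane {
  private readonly queue: InlineTask[] = [];
  private scheduled = false;
  private readonly batchSize: number;
  private readonly dispatchThreshold: number;
  private readonly schedule: Scheduler;

  constructor(batchSize: number, dispatchThreshold: number, schedule: Scheduler) {
    this.batchSize = batchSize;
    this.dispatchThreshold = dispatchThreshold;
    this.schedule = schedule;
  }

  /** The balancer only considers this lane once enough calls are in flight. */
  isEligible(inFlightCalls: number): boolean {
    return inFlightCalls >= this.dispatchThreshold;
  }

  /** Queue a task. Tasks are assumed to catch their own errors. */
  enqueue(task: InlineTask): void {
    this.queue.push(task);
    this.requestDrain();
  }

  /** Ask for at most one pending drain at a time. */
  private requestDrain(): void {
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => this.drain());
  }

  /** Run up to batchSize tasks, then yield back to the event loop. */
  private drain(): void {
    this.scheduled = false;
    const batch = this.queue.splice(0, this.batchSize);
    for (const task of batch) task();
    if (this.queue.length > 0) this.requestDrain();
  }
}
```

Capping each drain at `batchSize` is what lets the handler service worker traffic between batches instead of being starved by a long inline queue.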
See Inliner guide for when to use it and when to avoid it.
Task lifecycle (end to end)
Once the handler has picked a lane, that lane’s tx-queue drives the round trip. Here’s a single `call.add([1, 2])`:
1. User code (host): `pool.call.add([1, 2])` returns a `Promise` immediately. The input isn’t a promise, so nothing is awaited first.
2. Host tx-queue: encodes the call header + payload into a free request slot. (`[1, 2]` is a small tuple, so it takes the static-payload path at a known offset.)
3. Host tx-queue: toggles `hostBits` to publish the slot.
4. Worker loop: had been spinning, then parked on the signal word. The signal word changed, so it wakes.
5. Worker loop: decodes the args and runs the task fn: `([a, b]) => a + b` returns `3`.
6. Worker loop: writes the result into a response slot and toggles `workerBits` to publish.
7. Host: drains the result from shared memory and toggles its bit to release the slot.
8. Host: resolves (or rejects) the `Promise` returned in step 1.
If the task throws, step 6 writes an error result instead, and step 8 rejects the promise.
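The host-side bookkeeping that ties the first step to the last two can be sketched as a table of pending promises keyed by slot. `PendingCalls` and the `SlotResponse` shape are hypothetical, and the transport is left out entirely.

```typescript
type Settlers = {
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
};
type SlotResponse = { ok: true; value: unknown } | { ok: false; error: unknown };

class PendingCalls {
  private readonly bySlot = new Map<number, Settlers>();

  /** Return a promise right away and remember how to settle it later. */
  register(slot: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.bySlot.set(slot, { resolve, reject });
    });
  }

  /** A response was drained from `slot`: resolve or reject the matching promise. */
  settle(slot: number, response: SlotResponse): boolean {
    const entry = this.bySlot.get(slot);
    if (entry === undefined) return false; // nothing was waiting on this slot
    this.bySlot.delete(slot);
    if (response.ok) entry.resolve(response.value);
    else entry.reject(response.error);
    return true;
  }

  get size(): number {
    return this.bySlot.size;
  }
}

// Walking the add([1, 2]) call through the table, with the worker simulated in place:
const pendingCalls = new PendingCalls();
const addResult = pendingCalls.register(0);
const add = ([a, b]: [number, number]): number => a + b;
pendingCalls.settle(0, { ok: true, value: add([1, 2]) });
```

After the last line `addResult` is resolved with `3`. An error response would take the `reject` branch instead, which is the throwing-task case described above.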
Pool lifecycle
Startup (createPool)
- Allocates shared memory regions (mailboxes + payload buffers) for each worker.
- Spawns `threads` workers: threads by default, or separate processes when configured. Each worker imports the task module to discover exported `task()` values.
- Workers enter their idle spin/park loop, waiting for work.
- If `permission` is set, generates runtime-specific CLI flags and passes them via `workerExecArgv`.
- Returns the pool: a typed `{ call, shutdown }` object that is also disposable, so a `using` declaration can close it for you.
Running
- Calls flow through the dispatch layer continuously.
- Workers spin briefly after completing work (in case more arrives), then park.
- The host handler manages backoff independently.
Shutdown
A `using` pool shuts down on its own when the scope ends. Calling `shutdown()` yourself runs the same teardown, just earlier, the moment you ask for it.
Either way:
- `shutdown()` signals all workers to stop.
- If `resolveAfterFinishingAll` is `true`, workers finish all pending promises before exiting.
- All in-flight `call.*()` promises for abort-aware tasks reject with `"Thread closed"`.
- Worker threads terminate.
Abort signal pool
Tasks defined with `abortSignal: true` or `abortSignal: { hasAborted: true }` use a shared-memory bitset to track cancellation state. The pool has a fixed capacity (default 258, tunable via `abortSignalCapacity`).
When the host calls .reject() on an abort-aware promise, it flips a bit in the shared bitset. The worker can poll toolkit.hasAborted() to check that bit and bail out early.
This is cooperative, not preemptive — the worker must check. If it doesn’t, the host promise still rejects immediately, but the worker task runs to completion in the background.
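A shared bitset with cooperative polling might look like the following. The id-to-bit mapping and every function name here are assumptions. Only the capacity default and the `hasAborted()` polling idea come from this page.

```typescript
const ABORT_CAPACITY = 258; // default capacity quoted above
const abortBits = new Int32Array(
  new SharedArrayBuffer(Math.ceil(ABORT_CAPACITY / 32) * 4),
);

/** Map a signal id to its 32-bit word index and bit mask. */
function wordAndMask(id: number): [number, number] {
  if (!Number.isInteger(id) || id < 0 || id >= ABORT_CAPACITY) {
    throw new RangeError(`abort signal id out of range: ${id}`);
  }
  return [id >>> 5, 1 << (id & 31)];
}

/** Host side: flip the bit when the caller rejects an abort-aware promise. */
function markAborted(id: number): void {
  const [word, mask] = wordAndMask(id);
  Atomics.or(abortBits, word, mask);
}

/** Worker side: what a hasAborted() poll could boil down to. */
function hasAborted(id: number): boolean {
  const [word, mask] = wordAndMask(id);
  return (Atomics.load(abortBits, word) & mask) !== 0;
}

/** Clear the bit so the id can be handed to a later call. */
function clearAborted(id: number): void {
  const [word, mask] = wordAndMask(id);
  Atomics.and(abortBits, word, ~mask);
}

/** A cooperative task: it checks between chunks and bails out early. */
function sumUntilAborted(id: number, chunks: number[][]): number | "aborted" {
  let total = 0;
  for (const chunk of chunks) {
    if (hasAborted(id)) return "aborted";
    for (const n of chunk) total += n;
  }
  return total;
}
```

The check sits between chunks, so a task that never reaches a check point keeps running, which is exactly the cooperative limitation described above.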
Memory layout (per worker)
| Region | Default size | Purpose |
|---|---|---|
| Request mailbox | Fixed (32 slots) | Call headers, host -> worker |
| Response mailbox | Fixed (32 slots) | Result headers, worker -> host |
| Request payload buffer | 4 MiB initial, 64 MiB max | Argument data |
| Return payload buffer | 4 MiB initial, 64 MiB max | Result data |
| Abort signal bitset | Scales with abortSignalCapacity | Cancellation flags |
With default settings and 4 workers, the shared memory footprint is roughly:
4 workers x (2 mailboxes + 2 x 4 MiB payload buffers) ~ 32 MiB initial
Payload buffers grow on demand up to payloadMaxByteLength if the runtime supports growable SharedArrayBuffer.
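The arithmetic is worth spelling out for both ends of the range. This small helper (a hypothetical name) counts payload buffers only and ignores the mailboxes and the abort bitset, which are small next to the buffers.

```typescript
const MiB = 1024 * 1024;

/** Payload memory for a pool: two buffers (request + return) per worker. */
function payloadFootprintBytes(workers: number, bytesPerBuffer: number): number {
  const buffersPerWorker = 2;
  return workers * buffersPerWorker * bytesPerBuffer;
}

const initialMiB = payloadFootprintBytes(4, 4 * MiB) / MiB; // 32
const ceilingMiB = payloadFootprintBytes(4, 64 * MiB) / MiB; // 512
```

The second figure is also the starting footprint on runtimes without growable `SharedArrayBuffer`, since buffers are then allocated at max size upfront.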
Safety layers
Workers run code the host may or may not trust, so isolation is a dial, not a switch. Knitting stacks four independent layers, cheapest and softest first, costliest and hardest last. They compose: trusted local tasks can stop at Layer 1, while an untrusted plugin can be pushed all the way to a sandboxed process. This is defence in depth; no single layer is assumed sufficient on its own.
- Layer 1: in-process guards (always on). Before any task module loads, the worker neutralizes the most dangerous calls: `process.exit`, `process.kill`, `process.abort`, and `Deno.exit` are redefined to throw, and the raw shared-memory handles are scrubbed from the data object visible to task code. Cheap, but a guardrail against accidental misuse, not a wall against a hostile co-resident.
- Layer 2: bootstrap hook. `worker.bootstrap` runs a privileged module once per worker, before task imports. It is the right place to strip env vars, install your own guards, or freeze globals.
- Layer 3: runtime permissions (strict by default). The policy is translated into each runtime’s native enforcement (Deno’s permission flags, Node’s permission model, Bun’s equivalents), so the boundary is the runtime’s, not a library check task code could bypass.
- Layer 4: process + real sandbox. The only OS-enforced boundary: `worker.runtime: "process"`, optionally launched through `bwrap`, Docker, or `systemd-run` via `processCommandPrefix`. Pair it with `importTask` so the isolated code never loads in the host at all.
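The "redefined to throw" idea from Layer 1 can be illustrated on a plain object. This is a sketch of the general technique using `Object.defineProperty`, with a hypothetical `neutralize` helper. It is not a description of how Knitting installs its guards.

```typescript
/** Replace the named functions on `target` with stubs that throw when called. */
function neutralize(target: Record<string, unknown>, names: readonly string[]): void {
  for (const name of names) {
    if (typeof target[name] !== "function") continue; // skip anything absent
    Object.defineProperty(target, name, {
      value: () => {
        throw new Error(`${name}() is disabled inside task code`);
      },
      // Lock the stub so task code cannot simply assign the original back.
      writable: false,
      configurable: false,
    });
  }
}

// Stand-in for a runtime global such as `process`, to keep the sketch portable.
const fakeProcess: Record<string, unknown> = {
  exit: () => "exited",
  pid: 1234,
};
neutralize(fakeProcess, ["exit", "kill"]);
```

A guard like this stops accidental calls, but code running in the same realm has other ways to reach the underlying capability, which is why the later layers exist.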
See Permissions and Process workers for the full configuration.
What Knitting does NOT do
Knowing where Knitting stops is as useful as knowing what it does:
- No message passing protocol. Knitting is task-call oriented. If you need pub/sub or event-style messaging, use `postMessage` / `MessagePort`.
- No preemption. A long-running task blocks its lane until it finishes. Use `abortSignal` with `hasAborted()` polling for cooperative cancellation.
- Same host only. Workers can be threads or separate processes on one machine, but the shared memory never crosses the network, so there is no cross-machine transport. Pair Knitting with one if you need to scale out.
- No automatic scaling. Worker count is fixed at pool creation. You choose the parallelism level upfront.
- Browser support is limited. There is a browser build, but shared memory there needs cross-origin isolation (`COOP`/`COEP` headers) because of Spectre-era constraints. Most Knitting work still happens server-side.