A simplified mental model — plenty of real-world detail is left out for the sake of sanity.

Architecture

Ever wondered what actually happens between call.myTask(args) and the result landing back in your hands? This page walks that path. You don’t need any of it to use Knitting — the Quick start and Creating pools guides have you covered — but it’s here for when you’re curious why things are fast, chasing a strange bug, or reaching for the advanced knobs.

One caveat up front: this is a simplified mental model. The real implementation has many more moving parts — fast paths, edge cases, platform quirks — that are deliberately left out so the big picture stays readable. Treat the numbers and steps below as the shape of things, not the full contract.

Knitting has three layers:

  1. API layer: task(), createPool(), call.*(), shutdown(). This is what your code touches.
  2. Dispatch layer — the host handler: lane routing, the optional balancer strategy, and the inliner. This decides where a call runs.
  3. Transport layer — shared-memory mailboxes, payload buffers, wakeups. This moves data between threads without going through the runtime’s message queue.

Knitting runs thread workers by default. It can also run each worker as a separate process (for sandboxing or containers): the transport is the same shared-memory idea; a process worker just reaches the memory through a named mapping instead of an inherited handle. See Process workers and Shared memory.

The transport layer is the core of Knitting’s speed advantage. Instead of using postMessage (which serializes data, queues it, and deserializes it on the other side), Knitting writes directly to SharedArrayBuffer regions visible to both threads.

Each worker has two independent mailboxes:

  • Request mailbox (host -> worker): the host writes a call header here, the worker reads it.
  • Response mailbox (worker -> host): the worker writes the result header here, the host reads it.

Each mailbox has 32 slots — the slot index is a 5-bit field, so 2⁵ = 32 slots per direction. A slot is a small fixed-size region that holds:

  • Task function ID (Uint16, supports up to 65,536 tasks per pool)
  • Payload type tag
  • Small inline values (numbers, booleans, short strings fit directly in the header)

Slot state lives in two 32-bit atomic words: hostBits and workerBits. They sit on separate 64-byte cache lines, so the host and worker never fight over the same line — the false sharing that would otherwise bounce a cache line between cores and collapse throughput. A slot is busy when the two words disagree on its bit and free when they agree:

$$\text{free} = \sim(H \oplus W)\ \&\ \text{MASK}$$

where H is hostBits and W is workerBits.

Publishing work is therefore a single atomic toggle of one bit — not a message copy. Because each direction has exactly one writer (the host writes requests, the worker writes responses), this is a single-producer/single-consumer queue per direction: no mutex, no critical section, just write the payload, then publish the bit. The reader acquires the bit before it trusts the bytes.
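A single-threaded TypeScript sketch of the bit protocol above follows. It is an illustration under assumptions, not Knitting's source: the word offsets, the one-Int32-per-slot data array, and the helper names (freeMask, pendingMask, hostPublish, workerConsume) are hypothetical, and MASK is assumed to cover all 32 bits.

```typescript
// Hypothetical layout: hostBits and workerBits sit 64 bytes apart so they
// never share a cache line (16 Int32 words = 64 bytes).
const INTS_PER_CACHE_LINE = 16;
const HOST_BITS = 0;
const WORKER_BITS = INTS_PER_CACHE_LINE;

const control = new Int32Array(new SharedArrayBuffer(2 * 64));
// Hypothetical stand-in for slot contents: one Int32 per slot.
const slotData = new Int32Array(new SharedArrayBuffer(32 * 4));

// free = ~(H ^ W) & MASK with MASK = all 32 bits; `>>> 0` keeps it unsigned.
function freeMask(): number {
  const h = Atomics.load(control, HOST_BITS);
  const w = Atomics.load(control, WORKER_BITS);
  return ~(h ^ w) >>> 0;
}

// Busy slots (published, not yet released): the two words disagree.
function pendingMask(): number {
  return (Atomics.load(control, HOST_BITS) ^ Atomics.load(control, WORKER_BITS)) >>> 0;
}

// Index of the lowest set bit, or -1 when the mask is empty.
function lowestBit(mask: number): number {
  return mask === 0 ? -1 : 31 - Math.clz32(mask & -mask);
}

// Host (sole writer of requests): claim a slot, write the bytes, then publish.
function hostPublish(value: number): number {
  const slot = lowestBit(freeMask());
  if (slot < 0) return -1; // all 32 slots busy
  slotData[slot] = value; // 1. write the payload
  Atomics.xor(control, HOST_BITS, 1 << slot); // 2. publish: one atomic toggle
  return slot;
}

// Worker: observe the bit first, only then trust the bytes, then release.
function workerConsume(): number | undefined {
  const slot = lowestBit(pendingMask());
  if (slot < 0) return undefined;
  const value = slotData[slot];
  Atomics.xor(control, WORKER_BITS, 1 << slot); // bits agree again: slot is free
  return value;
}
```

Toggling rather than setting is what removes any reset step: after one round trip both words hold 1 at that position, they agree again, and the slot reads as free. The Atomics calls are sequentially consistent, so a reader that observes the toggled bit also observes the payload written before it.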

The lifecycle of a slot:

  1. Host atomically claims a free slot in the request mailbox.
  2. Host writes the call header (task ID, payload tag, inline data or payload offset).
  3. Host notifies the worker.
  4. Worker reads the slot, executes the task.
  5. Worker claims a slot in the response mailbox, writes the result.
  6. Worker notifies the host.
  7. Host reads the result, releases the response slot, and resolves the promise.

When the host writes a request, it needs to wake a potentially parked worker. When a worker writes a response, it needs to wake the host handler. Knitting uses Atomics.notify (futex-style wakeups) for this.

Workers don’t busy-wait by default. The idle cycle is a bounded spin, then a park:

(Diagram: worker idle cycle with states Spin, Park, and Working; transitions labelled “budget elapsed”, “notify / timeout”, “signal changed”, “more work?”, and “shutdown”.)
  1. Spin: check the mailbox in a tight loop for spinMicroseconds (default: scales with lane count). Uses Atomics.pause to reduce power draw.
  2. Park: call Atomics.wait with a timeout of parkMs. The thread sleeps until notified or the timeout expires.
  3. Wake: Atomics.notify from the host breaks the park immediately.

This means idle workers consume near-zero CPU while still waking up within microseconds when work arrives.
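The spin-then-park cycle can be sketched with standard Atomics calls. The spinMicroseconds and parkMs parameters mirror the options named above; the single signal word, the function names idleUntilWork and signalWork, and the return labels are hypothetical. Atomics.pause is feature-detected because older runtimes lack it.

```typescript
type WakeReason = "spin" | "woken" | "timed-out";

// Feature-detect Atomics.pause (a spin-wait hint); missing in older runtimes.
const atomicsPause = (Atomics as unknown as { pause?: () => void }).pause;

// Wait until signal[0] moves away from `lastSeen`: spin first, then park.
function idleUntilWork(
  signal: Int32Array,
  lastSeen: number,
  spinMicroseconds: number,
  parkMs: number,
): WakeReason {
  // 1. Spin: poll the signal word for a bounded time budget.
  const spinUntil = performance.now() + spinMicroseconds / 1000;
  while (performance.now() < spinUntil) {
    if (Atomics.load(signal, 0) !== lastSeen) return "spin";
    atomicsPause?.call(Atomics);
  }
  // 2. Park: sleep until Atomics.notify or the timeout. "not-equal" means the
  //    word changed before we parked, which also counts as a wake.
  const result = Atomics.wait(signal, 0, lastSeen, parkMs);
  return result === "timed-out" ? "timed-out" : "woken";
}

// 3. Wake (publisher side): change the word, then notify any parked waiter.
function signalWork(signal: Int32Array): void {
  Atomics.add(signal, 0, 1);
  Atomics.notify(signal, 0);
}
```

Browsers reject Atomics.wait on the main thread, so a blocking park like this belongs on the worker side.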

Values that don’t fit in the mailbox header slot need a separate path.

Each worker allocates two SharedArrayBuffer regions:

  • Request payload buffer — for arguments sent host -> worker
  • Return payload buffer — for results sent worker -> host

Sizes are controlled by payloadInitialBytes (default 4 MiB) and payloadMaxByteLength (default 64 MiB). When growable SharedArrayBuffer is available in the runtime, buffers start small and grow on demand. Otherwise, they’re allocated at max size upfront.
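Grow-on-demand maps onto the standard growable SharedArrayBuffer API (a maxByteLength option plus grow()). The sketch below is an assumption about how such a buffer could be allocated and grown, not Knitting's allocator: allocatePayloadBuffer, ensureCapacity, and the doubling policy are hypothetical, while the defaults mirror payloadInitialBytes and payloadMaxByteLength.

```typescript
const MiB = 1024 * 1024;

// Growable SharedArrayBuffer members (ES2024), typed locally in case the
// TypeScript lib in use predates them.
type Growable = { readonly maxByteLength: number; grow(newLength: number): void };

const supportsGrowable = "grow" in SharedArrayBuffer.prototype;

function allocatePayloadBuffer(initialBytes = 4 * MiB, maxBytes = 64 * MiB): SharedArrayBuffer {
  if (!supportsGrowable) return new SharedArrayBuffer(maxBytes); // fixed: max size upfront
  const GrowableSAB = SharedArrayBuffer as unknown as new (
    length: number,
    options: { maxByteLength: number },
  ) => SharedArrayBuffer;
  return new GrowableSAB(initialBytes, { maxByteLength: maxBytes });
}

// Make room for `needed` bytes by doubling up to the max; false means "does not fit".
function ensureCapacity(buf: SharedArrayBuffer, needed: number): boolean {
  if (needed <= buf.byteLength) return true;
  if (!supportsGrowable) return false;
  const g = buf as unknown as Growable;
  if (needed > g.maxByteLength) return false;
  let next = Math.max(buf.byteLength, 1);
  while (next < needed) next *= 2;
  g.grow(Math.min(next, g.maxByteLength));
  return true;
}
```

Doubling keeps the number of grow() calls logarithmic in the final size; the page itself only says buffers grow on demand.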

| Path | When it’s used | Cost |
| --- | --- | --- |
| Header-only | Primitives (boolean, null, undefined, number, small string, small bigint, Date) | Near zero. Value fits in the slot header. |
| Static payload | Small typed arrays, short strings that overflow the header | Low. Copies into a fixed region of the payload buffer. |
| Dynamic payload | Objects, arrays, large strings, Error | Higher. Requires allocation, encoding (similar to JSON serialization), and copying. |

Shared-memory types (SharedArrayBuffer, ProcessSharedBuffer) and the ownership-move BufferReference skip all three: the bytes aren’t copied, only a small descriptor crosses. The Performance guide has exact tier ratings per type.

A lane is an execution target — either a worker thread or the inline lane. The total lane count is threads + (inliner ? 1 : 0).

When you call call.myTask(args), the host handler:

  1. Resolves any promise arguments on the host.
  2. Picks a lane (using the balancer strategy when there’s more than one).
  3. Encodes the arguments and writes them to that lane’s mailbox (or queues them for the inliner).
  4. Returns a promise that resolves when the response arrives.

The handler runs on every call. Which lane it picks is the job of the balancer strategy — and that only matters when there’s more than one lane to choose between. With a single worker and no inliner there is nothing to balance, so the balancer is bypassed entirely.

These are the values of the balancer option on createPool:

| Strategy | Behavior |
| --- | --- |
| roundRobin (default) | Rotates through lanes in order. Simple, fair, predictable. |
| firstIdle | Picks the first lane with no in-flight work, falls back to round-robin. |
| randomLane | Picks a random lane. Good for uneven task durations. |
| firstIdleOrRandom | First idle lane, else random. Balances fairness with load distribution. |
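The four strategies can be expressed as small lane-picking functions over per-lane in-flight counts. This is a hedged sketch: the Picker type, the inFlight array, and the rotating cursor are illustrative assumptions, and the real dispatcher tracks more state.

```typescript
// inFlight[i] = number of calls currently outstanding on lane i.
type Picker = (inFlight: readonly number[]) => number;

function roundRobin(): Picker {
  let next = 0;
  return (inFlight) => {
    const lane = next % inFlight.length;
    next = lane + 1;
    return lane;
  };
}

function randomLane(rand: () => number = Math.random): Picker {
  return (inFlight) => Math.floor(rand() * inFlight.length);
}

// First lane with no in-flight work, otherwise defer to `fallback`.
function firstIdleThen(fallback: Picker): Picker {
  return (inFlight) => {
    const idle = inFlight.indexOf(0);
    return idle >= 0 ? idle : fallback(inFlight);
  };
}

const firstIdle = () => firstIdleThen(roundRobin());
const firstIdleOrRandom = (rand?: () => number) => firstIdleThen(randomLane(rand));

// Total lanes: worker threads plus the optional inline lane.
function laneCount(threads: number, inliner: boolean): number {
  return threads + (inliner ? 1 : 0);
}
```

Passing in-flight counts rather than an idle flag lets one Picker type serve all four strategies.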

The host handler has its own stall-avoidance logic:

  • stallFreeLoops (default 128): how many immediate notify-check loops run before backoff starts.
  • maxBackoffMs (default 10): ceiling for exponential backoff delay.

Under sustained high load, the handler stays in tight loops. Under intermittent load, it backs off to avoid burning CPU while idle.
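One plausible reading of those two knobs is a delay schedule: no delay for the first stallFreeLoops empty checks, then a doubling delay capped at maxBackoffMs. The 1 ms starting delay, the doubling factor, and the function name backoffDelayMs are assumptions; the page only states that backoff is exponential with a ceiling.

```typescript
// Delay (ms) before the next notify-check, given how many consecutive checks
// found nothing to do. Defaults mirror stallFreeLoops and maxBackoffMs.
function backoffDelayMs(idleLoops: number, stallFreeLoops = 128, maxBackoffMs = 10): number {
  if (idleLoops < stallFreeLoops) return 0; // still in the tight loop
  const step = idleLoops - stallFreeLoops; // 0, 1, 2, ... once backoff starts
  return Math.min(2 ** step, maxBackoffMs); // 1, 2, 4, 8, then the ceiling
}
```

A caller would reset idleLoops to 0 whenever work arrives, which is what keeps the handler in its tight loop under sustained load.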

The optional inliner adds the host thread itself as an execution lane. Inline tasks skip the entire transport layer — no encode, no mailbox write, no decode. The task function runs directly on the main thread.

Inline execution is deferred to a macro-task boundary (via MessageChannel) so the handler can service worker sends/receives first, then the host drains inline work.

Key details:

  • position: "first" | "last" — where the inline lane sits in the balancer’s lane order.
  • batchSize — how many inline tasks run per event-loop tick.
  • dispatchThreshold — minimum in-flight calls before the inline lane is eligible.
  • Abort signals on inline tasks use a static toolkit where hasAborted() always returns false (inline tasks can’t be individually cancelled since they share the host thread).
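The macro-task deferral and batchSize can be modelled with a plain MessageChannel. This is illustrative only: InlineLane, enqueue, and drainBatch are hypothetical names, dispatchThreshold and lane position are not modelled, and error handling is omitted.

```typescript
type Job = () => void;

class InlineLane {
  private readonly queue: Job[] = [];
  private channel: MessageChannel | null = null;
  private readonly batchSize: number;

  constructor(batchSize: number) {
    this.batchSize = batchSize;
  }

  enqueue(job: Job): void {
    this.queue.push(job);
    if (this.channel === null) this.scheduleDrain();
  }

  // Post a message to ourselves: its handler runs as a later macro-task, so
  // whatever the host is doing now (servicing worker mailboxes) finishes first.
  private scheduleDrain(): void {
    const channel = new MessageChannel();
    channel.port1.onmessage = () => this.drainBatch();
    this.channel = channel;
    channel.port2.postMessage(null);
  }

  // Run at most batchSize jobs per tick, then yield back to the event loop.
  private drainBatch(): void {
    const channel = this.channel!;
    for (const job of this.queue.splice(0, this.batchSize)) job();
    if (this.queue.length > 0) {
      channel.port2.postMessage(null); // more work: continue on the next macro-task
    } else {
      channel.port1.close(); // idle: stop the port from keeping the process alive
      this.channel = null;
    }
  }
}
```

Nothing runs synchronously inside enqueue; the jobs execute only after the current task returns to the event loop.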

See Inliner guide for when to use it and when to avoid it.

Once the handler has picked a lane, that lane’s tx-queue drives the round trip. Here’s a single call.add([1, 2]):

(Sequence diagram: User code (host), Host tx-queue, Shared memory, and Worker loop; the numbered steps below walk through it.)
  1. User code (host): pool.call.add([1, 2]) returns a Promise immediately — the input isn’t a promise, so nothing is awaited first.
  2. Host tx-queue: encodes the call header + payload into a free request slot. ([1, 2] is a small tuple, so it takes the static-payload path at a known offset.)
  3. Host tx-queue: toggles hostBits to publish the slot.
  4. Worker loop: had been spinning, then parked on the signal word. The signal word changed, so it wakes.
  5. Worker loop: decodes the args and runs the task fn — ([a, b]) => a + b returns 3.
  6. Worker loop: writes the result into a response slot and toggles workerBits to publish.
  7. Host: drains the result from shared memory and toggles its bit to release the slot.
  8. Host: resolves (or rejects) the Promise returned in step 1.

If the task throws, step 6 writes an error result instead, and step 8 rejects the promise.
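Steps 1 and 8 imply the host keeps each call's resolve/reject pair until its result is drained. A minimal model, assuming each result carries an identifier for the call it answers (the PendingCalls class and the ResultHeader shape are hypothetical), looks like this:

```typescript
// Hypothetical decoded result: a value, or the error the task threw.
type ResultHeader =
  | { id: number; ok: true; value: unknown }
  | { id: number; ok: false; error: unknown };

type Settlers = { resolve: (v: unknown) => void; reject: (e: unknown) => void };

class PendingCalls {
  private readonly byId = new Map<number, Settlers>();

  // Step 1: hand the caller a promise immediately and remember how to settle it.
  track(id: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.byId.set(id, { resolve, reject });
    });
  }

  // Steps 7-8: after draining a result, settle the matching promise exactly once.
  settle(result: ResultHeader): boolean {
    const entry = this.byId.get(result.id);
    if (!entry) return false;
    this.byId.delete(result.id);
    if (result.ok) entry.resolve(result.value);
    else entry.reject(result.error); // a task that threw rejects the caller's promise
    return true;
  }

  get size(): number {
    return this.byId.size;
  }
}
```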

Creating a pool with createPool() goes through these steps:

  1. Allocates shared memory regions (mailboxes + payload buffers) for each worker.
  2. Spawns the number of workers set by threads: worker threads by default, or separate processes when configured. Each worker imports the task module to discover exported task() values.
  3. Workers enter their idle spin/park loop, waiting for work.
  4. If permission is set, generates runtime-specific CLI flags and passes them via workerExecArgv.
  5. Returns the pool: a typed { call, shutdown } object that is also disposable, so a using declaration can close it for you.

While the pool is running:

  • Calls flow through the dispatch layer continuously.
  • Workers spin briefly after completing work (in case more arrives), then park.
  • The host handler manages backoff independently.

A pool bound with a using declaration shuts down on its own when the scope ends. Calling shutdown() yourself runs the same teardown, just earlier: the moment you ask for it. Either way:

  1. shutdown() signals all workers to stop.
  2. If resolveAfterFinishingAll is true, workers finish all pending promises before exiting.
  3. All in-flight call.*() promises for abort-aware tasks reject with "Thread closed".
  4. Worker threads terminate.

Tasks defined with abortSignal: true or abortSignal: { hasAborted: true } use a shared-memory bitset to track cancellation state. The pool has a fixed capacity (default 258, tunable via abortSignalCapacity).

When the host calls .reject() on an abort-aware promise, it flips a bit in the shared bitset. The worker can poll toolkit.hasAborted() to check that bit and bail out early.

This is cooperative, not preemptive — the worker must check. If it doesn’t, the host promise still rejects immediately, but the worker task runs to completion in the background.
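The cancellation bitset can be modelled as an Int32Array over a SharedArrayBuffer, one bit per abort-aware call. The helper names (createAbortBits, markAborted, hasAborted, sumUntilAborted) and the mapping from a call to a bit index are assumptions; only the cooperative check-and-bail pattern comes from the text above.

```typescript
// One bit per abort-aware call; capacity mirrors abortSignalCapacity (default 258).
function createAbortBits(capacity = 258): Int32Array {
  const words = Math.ceil(capacity / 32); // 258 bits -> 9 words
  return new Int32Array(new SharedArrayBuffer(words * 4));
}

// Host side: flip the call's bit when its promise is rejected.
function markAborted(bits: Int32Array, index: number): void {
  Atomics.or(bits, index >>> 5, 1 << (index & 31));
}

// Worker side: roughly what toolkit.hasAborted() would read for this call.
function hasAborted(bits: Int32Array, index: number): boolean {
  return (Atomics.load(bits, index >>> 5) & (1 << (index & 31))) !== 0;
}

// Cooperative cancellation: the task polls the bit; nothing interrupts it.
function sumUntilAborted(bits: Int32Array, index: number, values: number[]): number | "aborted" {
  let total = 0;
  for (const v of values) {
    if (hasAborted(bits, index)) return "aborted";
    total += v;
  }
  return total;
}
```

A task that never calls the check simply runs to completion, which matches the background behavior described above.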

| Region | Default size | Purpose |
| --- | --- | --- |
| Request mailbox | Fixed (32 slots) | Call headers, host -> worker |
| Response mailbox | Fixed (32 slots) | Result headers, worker -> host |
| Request payload buffer | 4 MiB initial, 64 MiB max | Argument data |
| Return payload buffer | 4 MiB initial, 64 MiB max | Result data |
| Abort signal bitset | Scales with abortSignalCapacity | Cancellation flags |

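With default settings and 4 workers, the initial shared-memory footprint is roughly 4 workers × (2 mailboxes + 2 × 4 MiB payload buffers) ≈ 32 MiB, since the fixed-size mailboxes are tiny next to the 8 MiB of payload buffers per worker. Without growable SharedArrayBuffer the payload buffers are allocated at their maximum size upfront, so the same pool reserves about 4 × 2 × 64 MiB = 512 MiB.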

Payload buffers grow on demand up to payloadMaxByteLength if the runtime supports growable SharedArrayBuffer.

Workers run code the host may or may not trust, so isolation is a dial, not a switch. Knitting stacks four independent layers — cheapest and softest first, costliest and hardest last. They compose: trusted local tasks can stop at Layer 1, while an untrusted plugin can be pushed all the way to a sandboxed process. This is defence in depth; no single layer is assumed sufficient on its own.

(Diagram: task code wrapped by four layers: Layer 1, in-process guards (always on); Layer 2, bootstrap hook (pre-import setup); Layer 3, runtime permissions (strict by default); Layer 4, process + sandbox (OS boundary).)
  • Layer 1 — in-process guards (always on). Before any task module loads, the worker neutralizes the most dangerous calls: process.exit, process.kill, process.abort, and Deno.exit are redefined to throw, and the raw shared-memory handles are scrubbed from the data object visible to task code. Cheap, but a guardrail against accidental misuse — not a wall against a hostile co-resident.
  • Layer 2 — bootstrap hook. worker.bootstrap runs a privileged module once per worker, before task imports — the right place to strip env vars, install your own guards, or freeze globals.
  • Layer 3 — runtime permissions (strict by default). The policy is translated into each runtime’s native enforcement (Deno’s permission flags, Node’s permission model, Bun’s equivalents), so the boundary is the runtime’s, not a library check task code could bypass.
  • Layer 4 — process + real sandbox. The only OS-enforced boundary: worker.runtime: "process", optionally launched through bwrap, Docker, or systemd-run via processCommandPrefix. Pair it with importTask so the isolated code never loads in the host at all.
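Layer 1's "redefined to throw" can be illustrated with Object.defineProperty. The sketch runs against a stand-in object rather than the real process so it is safe to execute; neutralize is a hypothetical helper, and Knitting's actual guard list and error messages may differ.

```typescript
// Replace each named method with one that throws, and lock the property so
// task code cannot simply assign the original back.
function neutralize(target: object, methods: readonly string[]): void {
  for (const name of methods) {
    Object.defineProperty(target, name, {
      value: () => {
        throw new Error(`${name}() is disabled inside workers`);
      },
      writable: false,
      configurable: false,
      enumerable: false,
    });
  }
}

// Stand-in for the real `process` object (hypothetical) so the sketch is harmless.
const fakeProcess = {
  exit: (code?: number) => console.log("exiting with", code),
  kill: (pid: number) => console.log("killing", pid),
  abort: () => console.log("aborting"),
};

neutralize(fakeProcess, ["exit", "kill", "abort"]);
```

This is exactly the kind of guardrail the Layer 1 description warns about: it blocks accidental calls, but it lives in the same realm as the task code, which is why the stronger layers exist.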

See Permissions and Process workers for the full configuration.

Knowing where Knitting stops is as useful as knowing what it does:

  • No message passing protocol. Knitting is task-call oriented. If you need pub/sub or event-style messaging, use postMessage / MessagePort.
  • No preemption. A long-running task blocks its lane until it finishes. Use abortSignal with hasAborted() polling for cooperative cancellation.
  • Same host only. Workers can be threads or separate processes on one machine, but the shared memory never crosses the network — there is no cross-machine transport. Pair Knitting with one if you need to scale out.
  • No automatic scaling. Worker count is fixed at pool creation. You choose the parallelism level upfront.
  • Browser support is limited. There is a browser build, but shared memory there needs cross-origin isolation (COOP / COEP headers) because of Spectre-era constraints. Most Knitting work still happens server-side.