
Node.js

This page summarizes Node.js benchmark runs for Knitting on node 24.12.0 (arm64-darwin).

This benchmark compares one round-trip between a main thread and workers using different transports. Knitting has the lowest overhead in this setup:

  • 1 message: Knitting is about 6x faster than worker postMessage, 15x faster than websocket, and 57x faster than HTTP.
  • 25 messages: Knitting is about 4x faster than worker postMessage.
  • 50 messages: Knitting is about 3.5x faster than worker postMessage.

Figure: Node IPC benchmark

These charts compare the same payload families sent through Knitting and worker_threads.

With a single value per call, Knitting is consistently faster:

  • For small primitives, Knitting is roughly 10-14x faster than worker postMessage.
  • Worker postMessage stays near 10 µs/iter even for tiny payloads.
  • For larger payloads, Knitting still holds close to a 4x advantage (for example, big object: 4.15 µs vs 15.74 µs, about 3.8x).

Figure: Node knitting vs worker benchmark 1

At 100 messages per iteration, the gap remains strong:

  • Typical primitives stay about 6-8x faster with Knitting.
  • Heavier payloads still keep a clear edge at roughly 2.4-4.2x faster.
  • Batching improves throughput for both, but Knitting remains lower-overhead across payload classes.

Figure: Node knitting vs worker benchmark 100

This benchmark increases payload size from 32 B up to 1,048,576 B (1 MiB) and reports batched cost with batch=64.

Using the avg column of the 1,048,576 B row from the batch=64 run (64 × 1,048,576 B = 67,108,864 B per iteration, with 1 GB = 10^9 B), one-way transfer throughput is:

  • string: 46.54 ms/iter -> 1.44 GB/s
  • Uint8Array: 8.95 ms/iter -> 7.50 GB/s
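
The throughput figures above follow from the batch size and the average time per iteration. A minimal sketch of that arithmetic, assuming each iteration moves `batch` payloads of `payloadBytes` in one direction and that GB means 10^9 bytes. The helper name `throughputGBps` is illustrative and not part of Knitting:

```javascript
// Hypothetical helper: one-way throughput in GB/s (1 GB = 1e9 bytes).
function throughputGBps(payloadBytes, batch, msPerIter) {
  const bytesPerIter = payloadBytes * batch; // 1,048,576 * 64 = 67,108,864
  const secondsPerIter = msPerIter / 1000;
  return bytesPerIter / secondsPerIter / 1e9;
}

// avg values from the 1048576 B rows of the batch=64 tables
console.log(throughputGBps(1_048_576, 64, 46.54).toFixed(2)); // string     -> 1.44
console.log(throughputGBps(1_048_576, 64, 8.95).toFixed(2));  // Uint8Array -> 7.50
```

The same function can be applied to any other row of the batch=64 tables by swapping in that row's payload size and avg time.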

Interpretation:

  • In this batched shape, Node’s Uint8Array path is about 5.2x faster than its string path at 1 MiB (8.95 ms vs 46.54 ms per iteration).
  • For 1 MiB binary payloads, Node lands in the ~7.5 GB/s range in this run.
node_call-growth-batch.md
clk: ~3.70 GHz
cpu: Apple M3 Ultra
runtime: node 24.12.0 (arm64-darwin)
| • call growth batch string (ascii 32..1048576 x4, batch=64) | avg | min | p75 | p99 | max |
| --------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| 32 B | ` 22.96 µs/iter` | ` 14.58 µs` | ` 28.71 µs` | ` 54.04 µs` | `582.58 µs` |
| 128 B | ` 27.04 µs/iter` | ` 24.48 µs` | ` 27.65 µs` | ` 28.63 µs` | ` 28.94 µs` |
| 512 B | ` 44.43 µs/iter` | ` 40.30 µs` | ` 46.17 µs` | ` 46.61 µs` | ` 48.79 µs` |
| 2048 B | `142.37 µs/iter` | `103.71 µs` | `168.75 µs` | `233.46 µs` | `301.04 µs` |
| 8192 B | `399.94 µs/iter` | `339.63 µs` | `418.25 µs` | `513.25 µs` | ` 1.59 ms` |
| 32768 B | ` 1.40 ms/iter` | ` 1.28 ms` | ` 1.43 ms` | ` 2.43 ms` | ` 2.56 ms` |
| 131072 B | ` 7.07 ms/iter` | ` 5.41 ms` | ` 7.39 ms` | ` 11.23 ms` | ` 11.49 ms` |
| 524288 B | ` 23.22 ms/iter` | ` 21.87 ms` | ` 23.57 ms` | ` 24.56 ms` | ` 25.24 ms` |
| 1048576 B | ` 46.54 ms/iter` | ` 44.57 ms` | ` 47.59 ms` | ` 47.69 ms` | ` 48.34 ms` |
| • call growth batch uint8array (32..1048576 x4, batch=64) | avg | min | p75 | p99 | max |
| --------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| 32 B | ` 25.21 µs/iter` | ` 15.46 µs` | ` 29.63 µs` | ` 55.04 µs` | `462.75 µs` |
| 128 B | ` 26.11 µs/iter` | ` 23.47 µs` | ` 25.99 µs` | ` 30.16 µs` | ` 33.14 µs` |
| 512 B | ` 26.26 µs/iter` | ` 24.79 µs` | ` 26.35 µs` | ` 26.92 µs` | ` 32.01 µs` |
| 2048 B | ` 81.17 µs/iter` | ` 41.04 µs` | ` 97.46 µs` | `178.63 µs` | ` 4.42 ms` |
| 8192 B | `181.59 µs/iter` | ` 91.17 µs` | `190.71 µs` | ` 1.10 ms` | ` 3.30 ms` |
| 32768 B | `431.63 µs/iter` | `178.17 µs` | `370.88 µs` | ` 2.35 ms` | ` 3.21 ms` |
| 131072 B | ` 1.40 ms/iter` | `777.21 µs` | ` 1.72 ms` | ` 3.20 ms` | ` 3.36 ms` |
| 524288 B | ` 4.66 ms/iter` | ` 3.53 ms` | ` 5.08 ms` | ` 6.42 ms` | ` 7.07 ms` |
| 1048576 B | ` 8.95 ms/iter` | ` 7.54 ms` | ` 9.32 ms` | ` 11.37 ms` | ` 12.30 ms` |

This stress test computes prime numbers over a large range, then serializes and parses large JSON payloads:

const N = 10_000_000; // search range: [1..N]
const CHUNK_SIZE = 250_000;
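
How the range is split and searched is not shown on this page. As a rough illustration only, the sketch below divides [1..N] into CHUNK_SIZE-wide chunks and counts primes per chunk with plain trial division. `makeChunks` and `countPrimesInChunk` are hypothetical names. The real benchmark may use a different algorithm, and it dispatches work through Knitting rather than running it inline as done here:

```javascript
// Split [1..n] into inclusive [start, end] chunks of at most chunkSize numbers.
function makeChunks(n, chunkSize) {
  const chunks = [];
  for (let start = 1; start <= n; start += chunkSize) {
    chunks.push([start, Math.min(start + chunkSize - 1, n)]);
  }
  return chunks;
}

// Count primes in [start, end] by trial division (simple, not optimized).
function countPrimesInChunk([start, end]) {
  let count = 0;
  for (let n = Math.max(start, 2); n <= end; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  return count;
}

// With the constants above, the range splits into 40 chunks.
console.log(makeChunks(10_000_000, 250_000).length); // 40

// Small demo range so the example runs instantly.
const total = makeChunks(100, 25)
  .map(countPrimesInChunk)
  .reduce((a, b) => a + b, 0);
console.log(total); // 25 primes up to 100
```

Because each chunk is independent, the per-chunk counts can be computed on any thread and summed afterwards, which is what makes this workload a natural fit for parallel workers.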

Even under this heavier workload, parallel workers scale well:

  • main + 1 extra thread: ~1.8x faster than main only.
  • main + 2 extra threads: ~2.4x faster than main only.
  • main + 3 extra threads: ~3.0x faster than main only.
  • main + 4 extra threads: ~3.5x faster than main only.
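
These ratios come from dividing the main-only average by each parallel average in the node_withload table. A small sketch of that calculation follows. The efficiency column assumes the main thread also does a share of the work (as the "main + N extra threads" labels suggest), so the thread count is taken as N + 1. That reading is an assumption, not something the table states:

```javascript
// avg ms/iter from the node_withload table, keyed by number of extra threads.
const avgMs = { 0: 959.14, 1: 538.28, 2: 401.87, 3: 317.52, 4: 274.21 };

function speedup(baselineMs, parallelMs) {
  return baselineMs / parallelMs;
}

for (const [extra, ms] of Object.entries(avgMs)) {
  const threads = Number(extra) + 1; // assumption: main thread participates
  const s = speedup(avgMs[0], ms);
  const efficiency = (s / threads) * 100; // 100% would be perfect linear scaling
  console.log(`main + ${extra} extra: ${s.toFixed(2)}x, efficiency ${efficiency.toFixed(0)}%`);
}
```

Efficiency dropping as threads are added is expected for a workload with a serial portion, such as collecting and merging results on the main thread.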
node_withload.md
clk: ~3.72 GHz
cpu: Apple M3 Ultra
runtime: node 24.12.0 (arm64-darwin)
| • knitting: primes up to 10,000,000 (chunk=250,000) | avg | min | p75 | p99 | max |
| ----------------------------------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| main | `959.14 ms/iter` | `955.16 ms` | `959.93 ms` | `961.89 ms` | `963.68 ms` |
| main + 1 extra threads → full range | `538.28 ms/iter` | `531.37 ms` | `539.87 ms` | `541.65 ms` | `547.02 ms` |
| main + 2 extra threads → full range | `401.87 ms/iter` | `395.43 ms` | `404.32 ms` | `406.77 ms` | `408.40 ms` |
| main + 3 extra threads → full range | `317.52 ms/iter` | `311.27 ms` | `319.14 ms` | `323.94 ms` | `327.36 ms` |
| main + 4 extra threads → full range | `274.21 ms/iter` | `270.63 ms` | `276.88 ms` | `278.30 ms` | `279.69 ms` |

This benchmark covers primitive, structured, collection, typed-array, error/date/symbol, promise-arg, and static-vs-dynamic allocator paths. Results are reported for count 1 and count 100 to show both per-call latency and batched throughput.

Quick takeaways:

  • In count 100, primitive-style payloads are usually in the ~15-30 µs range, while heavier structured/collection payloads can be ~120-300+ µs.
  • The static payload path is usually around 2x-4x faster than dynamic allocator paths (for example: string ~3.4x, json ~3.0x, Uint8Array ~3.6x, symbol ~4.1x at count 100).
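
To compare the count 1 and count 100 results directly, the batched figure can be divided by the call count to get an amortized per-call cost. The sketch below does this for a few rows of the knitting-types tables, assuming "(100)" means 100 calls per measured iteration. `perCallMicros` is an illustrative helper name:

```javascript
// avg µs/iter from the knitting-types tables: [count 1, count 100].
const rows = {
  number: [1.16, 25.45],
  string: [1.44, 35.29],
  "json object": [3.47, 123.26],
  Uint8Array: [2.83, 108.25],
};

// Amortized cost of one call inside a batch of `count` calls.
function perCallMicros(avgMicrosPerIter, count) {
  return avgMicrosPerIter / count;
}

for (const [name, [single, batched]] of Object.entries(rows)) {
  const amortized = perCallMicros(batched, 100);
  console.log(
    `${name}: ${single} µs alone, ${amortized.toFixed(2)} µs/call batched ` +
    `(${(single / amortized).toFixed(1)}x cheaper per call)`
  );
}
```

For the rows shown, batching lowers the per-call cost in every case, with the largest gain on small primitives, where fixed per-call overhead dominates.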

Payload sizes (approximate):

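| Payload | Size |
| -------------- | ------ |
| jsonObj | 206 B |
| jsonArr | 217 B |
| mapPayload | 284 B |
| Uint8Array | 1024 B |
| Int32Array | 1024 B |
| Float64Array | 1024 B |
| BigInt64Array | 1024 B |
| BigUint64Array | 1024 B |
| DataView | 1024 B |
| smallU8 | 480 B |
| largeU8 | 481 B |
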
node_types_knitting.md
payload sizes (approx bytes):
jsonObj: 206 bytes
jsonArr: 217 bytes
stringHuge: 1024 bytes
Uint8Array: 1024 bytes
Int32Array: 1024 bytes
Float64Array: 1024 bytes
BigInt64Array: 1024 bytes
BigUint64Array: 1024 bytes
DataView: 1024 bytes
clk: ~3.73 GHz
cpu: Apple M3 Ultra
runtime: node 24.12.0 (arm64-darwin)
| • knitting-types 1 | avg | min | p75 | p99 | max |
| --------------------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| number -> (1) | ` 1.16 µs/iter` | `458.00 ns` | ` 1.54 µs` | ` 2.75 µs` | ` 1.63 ms` |
| bigint small -> (1) | ` 1.05 µs/iter` | `500.00 ns` | ` 1.50 µs` | ` 2.17 µs` | ` 1.26 ms` |
| bigint large -> (1) | ` 2.48 µs/iter` | ` 1.25 µs` | ` 2.54 µs` | ` 5.58 µs` | ` 4.13 ms` |
| boolean true -> (1) | `901.13 ns/iter` | `526.34 ns` | ` 1.21 µs` | ` 1.58 µs` | ` 1.60 µs` |
| boolean false -> (1) | `824.29 ns/iter` | `520.36 ns` | ` 1.03 µs` | ` 1.57 µs` | ` 1.65 µs` |
| undefined -> (1) | `984.18 ns/iter` | `526.26 ns` | ` 1.32 µs` | ` 1.58 µs` | ` 1.62 µs` |
| null -> (1) | `847.02 ns/iter` | `525.11 ns` | ` 1.12 µs` | ` 1.57 µs` | ` 1.57 µs` |
| string -> (1) | ` 1.44 µs/iter` | `583.00 ns` | ` 2.08 µs` | ` 2.75 µs` | ` 1.24 ms` |
| json object -> (1) | ` 3.47 µs/iter` | ` 2.46 µs` | ` 3.83 µs` | ` 8.04 µs` | ` 1.27 ms` |
| json array -> (1) | ` 4.65 µs/iter` | ` 3.42 µs` | ` 4.96 µs` | ` 9.92 µs` | ` 1.24 ms` |
| Uint8Array -> (1) | ` 2.83 µs/iter` | ` 1.13 µs` | ` 3.42 µs` | ` 7.25 µs` | ` 1.28 ms` |
| ArrayBuffer -> (1) | ` 2.89 µs/iter` | ` 1.42 µs` | ` 3.71 µs` | ` 7.88 µs` | ` 1.31 ms` |
| Buffer -> (1) | ` 2.54 µs/iter` | ` 1.73 µs` | ` 2.95 µs` | ` 3.55 µs` | ` 3.57 µs` |
| string huge -> (1) | ` 3.24 µs/iter` | ` 1.92 µs` | ` 4.00 µs` | ` 7.50 µs` | ` 1.24 ms` |
| Int32Array -> (1) | ` 2.64 µs/iter` | ` 1.13 µs` | ` 3.33 µs` | ` 6.42 µs` | ` 1.27 ms` |
| Float64Array -> (1) | ` 2.76 µs/iter` | ` 1.38 µs` | ` 3.63 µs` | ` 7.42 µs` | ` 1.43 ms` |
| BigInt64Array -> (1) | ` 2.53 µs/iter` | ` 1.13 µs` | ` 3.25 µs` | ` 6.25 µs` | ` 1.27 ms` |
| BigUint64Array -> (1) | ` 2.58 µs/iter` | ` 1.17 µs` | ` 3.29 µs` | ` 6.29 µs` | ` 1.28 ms` |
| DataView -> (1) | ` 2.70 µs/iter` | ` 1.80 µs` | ` 3.08 µs` | ` 3.53 µs` | ` 3.57 µs` |
| Date -> (1) | ` 1.10 µs/iter` | `687.13 ns` | ` 1.37 µs` | ` 1.87 µs` | ` 1.96 µs` |
| Symbol.for -> (1) | ` 1.62 µs/iter` | ` 1.19 µs` | ` 1.84 µs` | ` 2.15 µs` | ` 2.24 µs` |
| • knitting-types 100 | avg | min | p75 | p99 | max |
| ----------------------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| number -> (100) | ` 25.45 µs/iter` | ` 15.33 µs` | ` 33.38 µs` | ` 62.25 µs` | ` 1.25 ms` |
| bigint small -> (100) | ` 27.62 µs/iter` | ` 25.17 µs` | ` 29.19 µs` | ` 30.32 µs` | ` 32.61 µs` |
| bigint large -> (100) | ` 70.42 µs/iter` | ` 53.38 µs` | ` 73.75 µs` | `134.50 µs` | ` 1.44 ms` |
| boolean true -> (100) | ` 27.65 µs/iter` | ` 23.07 µs` | ` 29.43 µs` | ` 31.76 µs` | ` 32.41 µs` |
| boolean false -> (100) | ` 26.79 µs/iter` | ` 24.41 µs` | ` 27.28 µs` | ` 28.78 µs` | ` 29.58 µs` |
| undefined -> (100) | ` 26.81 µs/iter` | ` 24.37 µs` | ` 27.73 µs` | ` 29.37 µs` | ` 34.09 µs` |
| null -> (100) | ` 24.44 µs/iter` | ` 21.21 µs` | ` 25.08 µs` | ` 26.05 µs` | ` 28.83 µs` |
| string -> (100) | ` 35.29 µs/iter` | ` 31.39 µs` | ` 36.22 µs` | ` 38.08 µs` | ` 41.26 µs` |
| json object -> (100) | `123.26 µs/iter` | `105.54 µs` | `123.92 µs` | `221.92 µs` | ` 1.39 ms` |
| json array -> (100) | `178.94 µs/iter` | `154.96 µs` | `185.13 µs` | `285.50 µs` | `491.88 µs` |
| Uint8Array -> (100) | `108.25 µs/iter` | ` 56.88 µs` | `141.25 µs` | `215.38 µs` | ` 3.06 ms` |
| ArrayBuffer -> (100) | `123.32 µs/iter` | ` 66.17 µs` | `153.63 µs` | `230.04 µs` | ` 2.51 ms` |
| Buffer -> (100) | `102.49 µs/iter` | ` 57.42 µs` | `139.83 µs` | `215.08 µs` | ` 2.97 ms` |
| string huge -> (100) | `162.92 µs/iter` | `111.25 µs` | `210.12 µs` | `268.54 µs` | ` 1.44 ms` |
| Int32Array -> (100) | `103.37 µs/iter` | ` 58.38 µs` | `141.13 µs` | `212.63 µs` | ` 3.04 ms` |
| Float64Array -> (100) | `121.34 µs/iter` | ` 65.71 µs` | `147.25 µs` | `230.96 µs` | ` 1.85 ms` |
| BigInt64Array -> (100) | `105.26 µs/iter` | ` 58.75 µs` | `139.79 µs` | `207.92 µs` | ` 3.10 ms` |
| BigUint64Array -> (100) | `102.90 µs/iter` | ` 59.21 µs` | `140.08 µs` | `203.58 µs` | ` 3.23 ms` |
| DataView -> (100) | `110.45 µs/iter` | ` 61.42 µs` | `145.21 µs` | `217.79 µs` | ` 2.74 ms` |
| Date -> (100) | ` 32.51 µs/iter` | ` 29.91 µs` | ` 33.09 µs` | ` 34.62 µs` | ` 38.05 µs` |
| Symbol.for -> (100) | ` 40.59 µs/iter` | ` 38.45 µs` | ` 41.20 µs` | ` 41.93 µs` | ` 42.32 µs` |
| • knitting-promise-args 1 | avg | min | p75 | p99 | max |
| --------------------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| promise number -> (1) | ` 1.65 µs/iter` | ` 1.20 µs` | ` 1.83 µs` | ` 2.12 µs` | ` 2.15 µs` |
| promise object -> (1) | ` 2.69 µs/iter` | ` 2.20 µs` | ` 2.93 µs` | ` 3.15 µs` | ` 3.16 µs` |
| • knitting-promise-args 100 | avg | min | p75 | p99 | max |
| ----------------------- | ---------------- | ----------- | ----------- | ----------- | ----------- |
| promise number -> (100) | ` 33.41 µs/iter` | ` 30.85 µs` | ` 34.19 µs` | ` 37.45 µs` | ` 37.53 µs` |
| promise object -> (100) | ` 71.73 µs/iter` | ` 57.46 µs` | ` 77.17 µs` | `139.38 µs` | `253.00 µs` |