# Tokio
This page compares Tokio against knitting on Bun, Node.js, and Deno using the same batch-oriented echo benchmark.
Benchmark source: mimiMonads/knitting-vs-tokio-bench.
## What this benchmark measures

Whole-batch latency for three payload shapes:
- `f64` / small scalar numbers
- `string` / large UTF-8 text
- `Uint8Array` / raw bytes
- separate `Arc<Vec<u8>>` reference sweeps for tiny byte payloads
The “close to Tokio” claim on the homepage refers to the small end of this comparison: wakeups/signaling plus copying or cloning tiny payloads. The larger-payload sections below are a different cost regime.
The current summary was recorded on Ubuntu 23.10, x86_64, on an AMD Ryzen 7 4700U.
All runtimes use the same reporting setup:

- batch sizes: `1`, `10`, `100`
- warmup: `200` iterations for `n=1`, `50` otherwise
- measured iterations: `500`
- per-batch timing
- sorted samples: `avg`, `min`, `p75`, `p99`, `max`
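The reported stats are simple to compute from the sorted per-batch samples. This is not the benchmark's actual reporting code — a minimal sketch, assuming nearest-rank percentiles over durations in microseconds:

```typescript
// Hypothetical sketch of the per-batch stat reporting: sort the measured
// batch durations, then read avg/min/p75/p99/max off the sorted array.
type BatchStats = { avg: number; min: number; p75: number; p99: number; max: number };

function summarize(samplesUs: number[]): BatchStats {
  const sorted = [...samplesUs].sort((a, b) => a - b);
  // Nearest-rank percentile on the sorted samples.
  const pct = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    avg: sorted.reduce((s, x) => s + x, 0) / sorted.length,
    min: sorted[0],
    p75: pct(75),
    p99: pct(99),
    max: sorted[sorted.length - 1],
  };
}
```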
## Batch avg latency (f64)

For small scalar payloads, the JavaScript runtimes stay ahead on average in this run:
- At `n=1`, Node.js is lowest at `6.63 µs`, followed by Bun at `7.35 µs`, Tokio at `13.01 µs`, and Deno at `21.54 µs`.
- At `n=10`, Deno is lowest at `11.97 µs`, then Bun at `13.41 µs`, Node.js at `17.28 µs`, and Tokio at `27.50 µs`.
- At `n=100`, Node.js (`62.33 µs`) and Deno (`63.28 µs`) stay ahead of Tokio (`89.55 µs`), with Bun in between at `80.61 µs`.
- Tokio keeps the best `p99` at `n=1` (`16.85 µs`), but Bun has the best `p99` at `n=10` (`36.58 µs`) and `n=100` (`92.37 µs`).
# Benchmark Summary
## Sources
- tokio: `results/tokio-1773827721825.csv`
- bun: `results/knitting-bun-1773827812276.csv`
- node: `results/knitting-node-1773828074699.csv`
- deno: `results/knitting-deno-1773827925019.csv`
## Machine Specs
- OS: Ubuntu 23.10
- Kernel: 6.5.0-44-generic
- Architecture: x86_64
- CPU: AMD Ryzen 7 4700U with Radeon Graphics
- Topology: 8 logical CPUs, 1 socket(s), 8 core(s)/socket, 1 thread(s)/core
- Memory: 15.1 GiB
- Swap: 4.0 GiB
## Methodology Notes
- The main string and byte benchmarks are intended to compare the same logical round trip on both sides: send payload, receive it in the worker, echo it back, receive it again on the caller, then wait for the whole batch.
- In `src/main.ts`, the `string` and `Uint8Array` paths go through knitting transport in both directions. That transport materializes a fresh payload on receive, so the round trip includes payload work on both the request side and the reply side.
- To keep the Tokio baseline fair, `src/main.rs` clones `String` and `Vec<u8>` on send and also clones again on the worker reply. The reply clone is intentional. Without it, Tokio would be measuring a cheaper return-path move while the JS runtimes were still paying for fresh payload materialization on the way back.
- The `Arc<Vec<u8>>` sweep is intentionally separate and is not the default apples-to-apples byte benchmark. It exists as an upper-bound shared-bytes reference for small payloads. `Arc::clone` only bumps a refcount, so it is expected to be cheaper than copying bytes.
- This means the default `string` and `Uint8Array` tables should be read as the fairer comparison, while the Arc section should be read as "how close does the normal transport get to shared ownership for small values?"
## Batch Avg Latency (less is better)
```text
benchmark            | batch | tokio     | bun      | node     | deno
---------------------+-------+-----------+----------+----------+---------
number f64 (8 bytes) | n=1   | 13.01 us  | 7.35 us  | 6.63 us  | 21.54 us
number f64 (8 bytes) | n=10  | 27.50 us  | 13.41 us | 17.28 us | 11.97 us
number f64 (8 bytes) | n=100 | 89.55 us  | 80.61 us | 62.33 us | 63.28 us
large string 1 MiB   | n=1   | 221.35 us | 1.19 ms  | 2.85 ms  | 1.38 ms
large string 1 MiB   | n=10  | 6.20 ms   | 6.01 ms  | 10.79 ms | 10.38 ms
large string 1 MiB   | n=100 | 37.93 ms  | 50.16 ms | 84.66 ms | 83.90 ms
Uint8Array 1 MiB     | n=1   | 272.81 us | 1.30 ms  | 2.35 ms  | 1.14 ms
Uint8Array 1 MiB     | n=10  | 4.64 ms   | 5.27 ms  | 5.22 ms  | 6.36 ms
Uint8Array 1 MiB     | n=100 | 37.83 ms  | 47.76 ms | 54.95 ms | 60.04 ms
```
## Batch P99 Latency (less is better)
```text
benchmark            | batch | tokio     | bun      | node      | deno
---------------------+-------+-----------+----------+-----------+----------
number f64 (8 bytes) | n=1   | 16.85 us  | 18.70 us | 26.25 us  | 160.57 us
number f64 (8 bytes) | n=10  | 40.83 us  | 36.58 us | 81.26 us  | 111.89 us
number f64 (8 bytes) | n=100 | 203.57 us | 92.37 us | 314.11 us | 263.05 us
large string 1 MiB   | n=1   | 371.10 us | 3.74 ms  | 3.73 ms   | 2.97 ms
large string 1 MiB   | n=10  | 8.44 ms   | 8.41 ms  | 15.75 ms  | 14.45 ms
large string 1 MiB   | n=100 | 40.81 ms  | 61.62 ms | 105.51 ms | 106.67 ms
Uint8Array 1 MiB     | n=1   | 400.45 us | 3.03 ms  | 5.58 ms   | 5.81 ms
Uint8Array 1 MiB     | n=10  | 8.41 ms   | 7.80 ms  | 9.52 ms   | 14.37 ms
Uint8Array 1 MiB     | n=100 | 43.71 ms  | 59.77 ms | 72.39 ms  | 81.71 ms
```
## Avg Ratio Vs Tokio
```text
benchmark            | batch | bun/tokio | node/tokio | deno/tokio
---------------------+-------+-----------+------------+-----------
number f64 (8 bytes) | n=1   | 0.56x     | 0.51x      | 1.66x
number f64 (8 bytes) | n=10  | 0.49x     | 0.63x      | 0.44x
number f64 (8 bytes) | n=100 | 0.90x     | 0.70x      | 0.71x
large string 1 MiB   | n=1   | 5.36x     | 12.86x     | 6.22x
large string 1 MiB   | n=10  | 0.97x     | 1.74x      | 1.68x
large string 1 MiB   | n=100 | 1.32x     | 2.23x      | 2.21x
Uint8Array 1 MiB     | n=1   | 4.75x     | 8.62x      | 4.18x
Uint8Array 1 MiB     | n=10  | 1.13x     | 1.12x      | 1.37x
Uint8Array 1 MiB     | n=100 | 1.26x     | 1.45x      | 1.59x
```
## Uint8Array Size Sweep Avg Latency (less is better)
```text
size    | tokio     | bun       | node      | deno
--------+-----------+-----------+-----------+----------
8 B     | 82.99 us  | 62.80 us  | 88.20 us  | 107.01 us
16 B    | 81.91 us  | 56.24 us  | 65.37 us  | 95.78 us
32 B    | 85.70 us  | 49.48 us  | 65.76 us  | 85.05 us
64 B    | 76.98 us  | 42.68 us  | 66.88 us  | 78.27 us
128 B   | 92.53 us  | 53.53 us  | 79.28 us  | 84.39 us
256 B   | 99.70 us  | 63.42 us  | 83.89 us  | 100.44 us
512 B   | 86.67 us  | 68.55 us  | 97.07 us  | 118.03 us
1 KiB   | 101.42 us | 171.09 us | 157.61 us | 169.50 us
2 KiB   | 191.25 us | 194.62 us | 220.68 us | 233.39 us
4 KiB   | 195.56 us | 260.16 us | 324.39 us | 391.43 us
8 KiB   | 208.84 us | 397.05 us | 465.89 us | 539.98 us
16 KiB  | 279.25 us | 649.18 us | 741.47 us | 899.81 us
32 KiB  | 1.48 ms   | 1.14 ms   | 1.27 ms   | 1.41 ms
64 KiB  | 2.71 ms   | 2.38 ms   | 2.49 ms   | 2.89 ms
128 KiB | 5.66 ms   | 5.02 ms   | 5.11 ms   | 6.08 ms
256 KiB | 11.92 ms  | 10.56 ms  | 10.11 ms  | 11.96 ms
512 KiB | 23.06 ms  | 22.33 ms  | 22.66 ms  | 25.26 ms
1 MiB   | 29.48 ms  | 46.97 ms  | 52.53 ms  | 55.77 ms
```
## Arc Comparison Size Sweep Avg Latency (less is better)
Tokio uses `Arc<Vec<u8>>` here as a separate shared-bytes reference point, not the default apples-to-apples byte path.
```text
size  | tokio    | bun      | node     | deno
------+----------+----------+----------+----------
8 B   | 80.76 us | 70.31 us | 86.19 us | 97.79 us
16 B  | 79.35 us | 60.94 us | 73.73 us | 77.46 us
32 B  | 81.48 us | 57.04 us | 70.26 us | 77.03 us
64 B  | 80.14 us | 54.44 us | 75.94 us | 78.81 us
128 B | 79.89 us | 68.50 us | 82.51 us | 85.95 us
256 B | 79.48 us | 50.59 us | 85.94 us | 100.10 us
512 B | 79.51 us | 74.78 us | 97.23 us | 123.11 us
```
## Arc Comparison Avg Ratio Vs Tokio
```text
size  | bun/tokio | node/tokio | deno/tokio
------+-----------+------------+-----------
8 B   | 0.87x     | 1.07x      | 1.21x
16 B  | 0.77x     | 0.93x      | 0.98x
32 B  | 0.70x     | 0.86x      | 0.95x
64 B  | 0.68x     | 0.95x      | 0.98x
128 B | 0.86x     | 1.03x      | 1.08x
256 B | 0.64x     | 1.08x      | 1.26x
512 B | 0.94x     | 1.22x      | 1.55x
```

## 1 MiB payloads

Large payloads show a very different profile from scalar messages.
- For `large string 1 MiB`, Tokio is clearly fastest at `n=1` (`221.35 µs`) and `n=100` (`37.93 ms`). At `n=10`, Bun edges it slightly on average (`6.01 ms` vs `6.20 ms`) and also has a near-identical `p99` (`8.41 ms` vs `8.44 ms`).
- For `Uint8Array 1 MiB`, Tokio leads at every batch size on average: `272.81 µs`, `4.64 ms`, and `37.83 ms`.
- Node.js and Deno fall further behind once payload materialization dominates the round trip, especially in the `1 MiB` cases.
## Uint8Array size sweep

This sweep fixes batch=100 and scales binary payload size from 8 B to 1 MiB:
- Bun is fastest from `8 B` through `512 B`, so the default copy-based byte path is already competitive at the tiny end.
- Tokio retakes the lead from `1 KiB` through `16 KiB` in this run.
- Bun and Node.js pull slightly ahead again from `32 KiB` through `512 KiB`.
- At `1 MiB`, Tokio is fastest again at `29.48 ms`, ahead of Bun (`46.97 ms`), Node.js (`52.53 ms`), and Deno (`55.77 ms`).
## Arc&lt;Vec&lt;u8&gt;&gt; reference sweep

This separate sweep also fixes batch=100, but only covers 8 B through 512 B.
On the Tokio side it uses `Arc<Vec<u8>>`, which is the closest thing to “magic teleportation” in this setup: `Arc::clone` mostly just bumps a refcount instead of copying the bytes.
- Bun still beats the Tokio `Arc` path from `8 B` through `256 B`, and is still close at `512 B` (`74.78 µs` vs `79.51 µs`).
- Node.js is faster than the `Arc` baseline from `16 B` through `64 B`, and stays near parity at `128 B` (`82.51 µs` vs `79.89 µs`).
- Deno is close in the `16–64 B` band, but falls back more clearly by `256–512 B`.
Treat this as an upper-bound shared-ownership reference, not the default apples-to-apples byte benchmark.
The fair default comparison is still the normal Uint8Array copy path above.
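The gap this sweep probes is, at bottom, “copy the bytes” versus “hand out another reference to the same bytes”. A toy JS-side illustration of that distinction — not knitting's API, just two typed-array views over one buffer:

```typescript
// Contrast "copy the bytes" with "share the backing memory".
// Arc::clone on the Rust side is the sharing case: no byte copy, just
// another handle to the same allocation.
const backing = new SharedArrayBuffer(8);
const original = new Uint8Array(backing);
original.set([1, 2, 3, 4, 5, 6, 7, 8]);

// Copy: a fresh allocation plus a memcpy; later writes don't propagate.
const copied = new Uint8Array(original);
// Share: another view over the same backing store; no byte copy at all.
const shared = new Uint8Array(backing);

original[0] = 42;
// copied[0] is still 1; shared[0] observes the write.
```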
## Fairness and the one intentional asymmetry

Two major sources of skew are already handled:
- Dispatch shape is aligned. Rust fans out via spawned tasks and waits with `join_all(...)`, matching knitting creating all `pool.call.*(...)` promises and awaiting `Promise.all(...)`.
- Runtime width is aligned. Knitting uses `threads: 1`, and Rust uses `#[tokio::main(worker_threads = 1)]`, so sender fan-out can’t spread across a bigger worker pool.
- Round-trip work is aligned. The default `string` and `Uint8Array` paths pay payload work in both directions on both sides; Tokio explicitly clones on send and clones again on the worker reply, so the return path is not a cheaper move-only shortcut.
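The aligned dispatch shape can be sketched on the JS side. This is a simplified stand-in, not the benchmark's source: `echo` here is a plain async function in place of a real `pool.call.*` task:

```typescript
// Shape of the batched dispatch: create every call's promise up front,
// then await the whole batch. This mirrors tokio's spawn + join_all(...)
// on the Rust side. `echo` stands in for a knitting pool.call.* task.
async function echo(payload: number): Promise<number> {
  return payload;
}

async function runBatch(n: number): Promise<{ elapsedMs: number; results: number[] }> {
  const start = Date.now();
  // Fan out: all n calls are in flight before we start waiting.
  const pending = Array.from({ length: n }, (_, i) => echo(i));
  const results = await Promise.all(pending);
  // One whole-batch timing per iteration, as in the benchmark.
  return { elapsedMs: Date.now() - start, results };
}
```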
One asymmetry is kept on purpose: memory management.
## Allocation model

This benchmark measures “total cost of the system as designed”, not “transport cost after normalizing allocation away”. Large payloads have to be copied or shared somehow, and that choice is part of the cost.
For large string and byte payloads:
- Rust `String`/`Vec<u8>` pays `clone()` (heap allocation + memcpy) in the timed section.
- Knitting copies into a preallocated shared-memory region managed by its own allocator-like bookkeeping.
Avoiding general-purpose allocation in the hot path is part of what makes knitting interesting, so the benchmark keeps that cost in-bounds rather than hiding it.
The Arc<Vec<u8>> sweep is included separately for exactly that reason: it shows the shared-ownership upper bound for tiny payloads without pretending that it is the default fair byte path.
## A rough cost model

For the payload-heavy echo cases, treat the benchmark as measuring three different “systems”:
- knitting: shared-buffer copies + allocator-style region management (JS values still get materialized when a worker reads/returns them)
- tokio default: clone-driven allocation + payload copies on the channel path
- tokio Arc reference: `Arc::clone` shared ownership for the byte buffer handle
The exact low-level behavior depends on payload type and runtime, but the high-level point is stable: knitting is buying speed by replacing repeated general-purpose allocation with preallocated shared-memory management.
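That trade can be sketched directly: both strategies still copy the payload, but only one of them allocates per call. A toy contrast, with made-up sizes and names, purely for illustration:

```typescript
// Two ways to "send" a payload by copying it somewhere the receiver reads.
// The tokio-default analogue allocates fresh memory per call; the knitting
// analogue reuses one preallocated region and only pays the memcpy.
function sendAllocating(payload: Uint8Array): Uint8Array {
  // clone()-style: fresh heap allocation + memcpy on every call
  return new Uint8Array(payload);
}

const region = new Uint8Array(1 << 20); // preallocated once, up front
function sendPreallocated(payload: Uint8Array): Uint8Array {
  // the copy still happens, but into memory that already exists
  region.set(payload, 0);
  return region.subarray(0, payload.length);
}
```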
## Why knitting can be fast

A few concrete things knitting does that matter for this benchmark:
- Fixed pool topology → simpler queues. The pool knows its workers up front, and each host↔worker lane is effectively single‑producer/single‑consumer. That’s cheaper than a fully general multi‑producer channel.
- Low-garbage hot path. Most transport work happens inside typed-array-backed buffers and reused task objects, reducing allocation churn and GC pressure (and references get cleared quickly after each call settles).
- Two-tier payload path. Small payloads encode inline in the per-call header slot (roughly 0.5 KiB per in-flight call, with ~544 bytes usable for inline data); larger payloads spill into the shared payload buffer (SAB/GSAB).
- Shared payload buffer + mini allocator. Large payloads are copied into a preallocated `SharedArrayBuffer` and carved into 64‑byte‑aligned regions tracked by a small slot table/bitset (more complexity, less `malloc` in the hot path).
- Primitives are “header-only”. Numbers, booleans, `null`, etc. encode directly in header words (no payload buffer at all), keeping contention and copying low.
- Optional “gc at idle boundaries”. When workers have `gc()` available, knitting may trigger a GC before going into longer spin/park waits, nudging collections away from the hot loop.
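The “mini allocator” idea can be sketched as a toy: fixed 64-byte slots over a `SharedArrayBuffer`, with a 32-bit mask as the slot table. This is an illustration of the technique, not knitting's actual bookkeeping — real payloads larger than one slot would need multi-slot regions:

```typescript
// Toy fixed-size slot allocator over a SharedArrayBuffer: 64-byte-aligned
// slots tracked by a bitmask. Bit i set = slot i free.
const SLOT = 64;
const sab = new SharedArrayBuffer(SLOT * 32);
const bytes = new Uint8Array(sab);
let freeMask = 0xffffffff >>> 0;

function alloc(): number {
  if (freeMask === 0) throw new Error("out of slots");
  // Isolate the lowest set bit to find the first free slot.
  const i = 31 - Math.clz32(freeMask & -freeMask);
  freeMask = (freeMask & ~(1 << i)) >>> 0;
  return i * SLOT; // byte offset, 64-byte aligned by construction
}

function free(offset: number): void {
  freeMask = (freeMask | (1 << (offset / SLOT))) >>> 0;
}

function store(payload: Uint8Array): number {
  // Payloads up to SLOT bytes: grab a slot and copy in.
  const off = alloc();
  bytes.set(payload, off);
  return off;
}
```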
None of this is free: it trades simplicity for careful memory layout, extra bookkeeping, and more “allocator-like” engineering. That trade is exactly what this repo is trying to make visible.