Parallel Monte Carlo pi
The classic Monte Carlo dartboard: throw random points into a square, count how many land inside the unit circle, and estimate pi. Every point is independent, making this embarrassingly parallel and a clean demonstration of Knitting’s map-reduce pattern.
How it works
Throw lots of random points into the square [-1, 1] × [-1, 1]. A point is “inside” the unit circle if x² + y² ≤ 1. Because areas scale nicely, the fraction of points that land inside approaches the ratio of the circle’s area (π) to the square’s area (4):

π ≈ 4 · inside / total
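The estimator above can be sketched as a single-threaded baseline (a minimal sketch using plain Math.random; nothing here comes from the library):

```typescript
// Single-threaded baseline for the dartboard estimator.
// Plain Math.random for clarity; the parallel version swaps in xorshift32.
function estimatePi(samples: number): number {
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = Math.random() * 2 - 1; // uniform in [-1, 1]
    const y = Math.random() * 2 - 1;
    if (x * x + y * y <= 1) inside++;
  }
  return (4 * inside) / samples;
}
```

With a million samples the estimate typically lands within ~0.01 of π, consistent with the ~1/√N error scaling noted later in this page.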
The host divides the total sample count into chunks and dispatches each chunk to a worker via piChunk(seed, samples). Each worker runs a tight inner loop with a fast deterministic RNG (xorshift32) and returns a tiny summary: { inside, samples }. The host aggregates the summaries and prints the final estimate.
Install
```sh
# deno
deno add jsr:@vixeny/knitting
# npm
npx jsr add @vixeny/knitting
# pnpm 10.9+
pnpm add jsr:@vixeny/knitting
# fallback (older pnpm)
pnpm dlx jsr add @vixeny/knitting
# yarn 4.9+
yarn add jsr:@vixeny/knitting
# fallback (older yarn)
yarn dlx jsr add @vixeny/knitting
# bun
bunx jsr add @vixeny/knitting
```

Run it:

```sh
bun src/pi.ts --threads 6 --samples 50000000 --chunk 1000000
deno run -A src/pi.ts --threads 6 --samples 50000000 --chunk 1000000
npx tsx src/pi.ts --threads 6 --samples 50000000 --chunk 1000000
```

Expected output:
```
threads: 6  samples: 50,000,000  chunk: 1,000,000
dispatching 50 chunks...
pi ~ 3.14159842 (error: +0.00000577)
elapsed: 0.87s
```

src/pi.ts:

```ts
import { createPool, isMain } from "@vixeny/knitting";
import { piChunk } from "./montecarlo_pi.ts";

// Parse a positive integer CLI flag like `--samples 50000000`.
function intArg(name: string, fallback: number) {
  const idx = process.argv.indexOf(`--${name}`);
  if (idx !== -1 && idx + 1 < process.argv.length) {
    const v = Number(process.argv[idx + 1]);
    if (Number.isFinite(v) && v > 0) return Math.floor(v);
  }
  return fallback;
}

// Tunables (pick any numbers you like)
const TOTAL_SAMPLES = intArg("samples", 50_000_000);
const CHUNK_SAMPLES = intArg("chunk", 10_000_000);
const THREADS = intArg("threads", 6);

const { call, shutdown } = createPool({
  threads: THREADS,
  inliner: {
    position: "last",
    batchSize: 8,
  },
  balancer: "firstIdle",
})({ piChunk });

async function main() {
  const jobCount = Math.ceil(TOTAL_SAMPLES / CHUNK_SAMPLES);
  const jobs = new Array<Promise<{ inside: number; samples: number }>>(
    jobCount,
  );

  // Seed base: stable-ish, different each run
  const seedBase = ((Date.now() | 0) ^ 0x9e3779b9) | 0;

  // Queue one worker task per chunk.
  for (let i = 0; i < jobCount; i++) {
    const remaining = TOTAL_SAMPLES - i * CHUNK_SAMPLES;
    const samples = remaining >= CHUNK_SAMPLES ? CHUNK_SAMPLES : remaining;

    // Spread seeds so chunks don't reuse the same random stream
    const seed = (seedBase + i * 0x6d2b79f5) | 0;

    jobs[i] = call.piChunk([seed, samples]);
  }

  const time = performance.now();
  const results = await Promise.all(jobs);
  const finished = performance.now();

  let inside = 0;
  let total = 0;
  for (const r of results) {
    inside += r.inside;
    total += r.samples;
  }

  const pi = (4 * inside) / total;

  // Quick sanity: expected sampling error scales like ~1/sqrt(N)
  const approxStdErr = 1 / Math.sqrt(total);

  console.log("Monte Carlo π estimate");
  console.log("threads      :", THREADS + 1); // +1: the inliner also runs tasks on the main thread
  console.log("total samples:", total.toLocaleString());
  console.log("chunk size   :", CHUNK_SAMPLES.toLocaleString());
  console.log("pi           :", pi);
  console.log("took         :", (finished - time).toFixed(3), "ms");
  console.log("rough ±err   :", `~${(approxStdErr * 4).toExponential(2)}`);
}

if (isMain) {
  main().finally(shutdown);
}
```

src/montecarlo_pi.ts:

```ts
import { task } from "@vixeny/knitting";

type ChunkArgs = readonly [seed: number, samples: number];
type ChunkResult = { inside: number; samples: number };

/**
 * Fast, deterministic RNG: xorshift32.
 * (Good enough for Monte Carlo demos, and much faster than Math.random
 * in tight loops.)
 */
function xorshift32(state: number): number {
  state |= 0;
  state ^= state << 13;
  state ^= state >>> 17;
  state ^= state << 5;
  return state | 0;
}

const INV_2_POW_32 = 2.3283064365386963e-10; // 1 / 2^32

export const piChunk = task<ChunkArgs, ChunkResult>({
  f: ([seed, samples]) => {
    let s = seed | 0;
    let inside = 0;

    for (let i = 0; i < samples; i++) {
      s = xorshift32(s);
      const x = (s >>> 0) * INV_2_POW_32 * 2 - 1;

      s = xorshift32(s);
      const y = (s >>> 0) * INV_2_POW_32 * 2 - 1;

      const r2 = x * x + y * y;
      if (r2 <= 1) inside++;
    }

    return { inside, samples };
  },
});
```

Why this is a good parallel pattern
This example captures the core scientific-computing pattern: map (simulate many independent trials), then reduce (combine partial statistics). Each chunk returns a small summary, not raw data; that's critical for keeping transfer overhead low.
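The reduce step amounts to a pure fold over chunk summaries. A minimal sketch (the names are illustrative, not part of the library API):

```typescript
// Partial statistics returned by each chunk.
type Summary = { inside: number; samples: number };

// Fold chunk summaries into one total; transfer cost is O(chunks), not O(samples).
function combine(parts: Summary[]): Summary {
  return parts.reduce(
    (acc, p) => ({
      inside: acc.inside + p.inside,
      samples: acc.samples + p.samples,
    }),
    { inside: 0, samples: 0 },
  );
}
```

Because addition is associative and commutative, chunks can finish in any order and the result is identical.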
The same structure works for numerical integration, uncertainty propagation, risk estimation, statistical physics, and any problem you can phrase as “run many trials and combine results.”
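As one sketch of that generality, the same chunk-summary shape can do deterministic numerical integration: ∫₀¹ 4/(1+x²) dx = π via the midpoint rule. The helper below is hypothetical, not part of the example files:

```typescript
// Hypothetical variant: same map-reduce shape, different payload.
// Each "chunk" midpoint-integrates a slice of 4/(1+x^2) on [0, 1];
// summing the partial results recovers pi.
function integralChunk(start: number, count: number, totalSteps: number): number {
  const h = 1 / totalSteps; // step width
  let sum = 0;
  for (let i = start; i < start + count; i++) {
    const x = (i + 0.5) * h; // midpoint of step i
    sum += 4 / (1 + x * x);
  }
  return sum * h; // partial integral for this slice
}
```

Dispatch one `integralChunk` per slice exactly as `piChunk` is dispatched, and add the partial sums on the host.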
Practical notes
Chunk size matters. Each task should do enough work to justify dispatch overhead. Start with 100k-5M iterations per chunk, depending on your loop cost. Too small and overhead dominates; too big and you lose load balancing.
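A rough back-of-the-envelope model of that trade-off (an assumption for intuition, not a measurement; the numbers below are made up):

```typescript
// Rough model: if dispatching a chunk costs dispatchMs and the inner loop
// runs itersPerMs iterations per millisecond, the fraction of wall time
// lost to dispatch is dispatch / (dispatch + work).
function overheadFraction(
  dispatchMs: number,
  chunkIters: number,
  itersPerMs: number,
): number {
  const workMs = chunkIters / itersPerMs;
  return dispatchMs / (dispatchMs + workMs);
}
```

For example, at 0.05 ms per dispatch and 100k iterations/ms, a 1M-iteration chunk loses roughly half a percent to dispatch, while a 10k-iteration chunk loses about a third of its wall time.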
Seed carefully. Derive a different seed per chunk so chunks don't share the same random stream; this example spreads per-chunk seeds with a Weyl-style increment. Note that the base seed comes from Date.now(), so runs differ; to make runs reproducible, replace the base with a fixed constant.
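A minimal sketch of the derivation the example uses (the 0x6d2b79f5 increment is from the code above; the helper name is illustrative):

```typescript
// One base seed plus a Weyl-style increment per chunk index, truncated to
// int32, so each chunk starts its xorshift32 stream in a different place.
function chunkSeed(base: number, i: number): number {
  return (base + i * 0x6d2b79f5) | 0;
}
```

Distinct indices give distinct, well-spread seeds, and fixing `base` to a constant makes the whole run deterministic.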
Keep the inner loop tight. Avoid allocations per iteration. This example uses xorshift32 instead of Math.random() for speed.
Things to try
- Increase `--samples` to 500M and watch the estimate stabilize.
- Try different `--chunk` sizes and measure throughput; there's a sweet spot.
- Fix the seed (replace the Date.now() base with a constant) and confirm identical output across runs.
- Modify the worker to also return timing per chunk and plot a histogram.
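For the last exercise, a crude bucketing sketch for the per-chunk timings (the shape is assumed; collecting the timings in the worker is left to you):

```typescript
// Bucket chunk durations into fixed-width bins for a quick text histogram.
// Returns a map from bucket start (ms) to count.
function histogram(timesMs: number[], bucketMs = 5): Map<number, number> {
  const buckets = new Map<number, number>();
  for (const t of timesMs) {
    const b = Math.floor(t / bucketMs) * bucketMs;
    buckets.set(b, (buckets.get(b) ?? 0) + 1);
  }
  return buckets;
}
```

A wide spread of buckets suggests uneven chunks or a busy machine; with `balancer: "firstIdle"` the pool will still keep workers fed, but smaller chunks smooth the tail.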