
Parallel Monte Carlo pi

The classic Monte Carlo dartboard: throw random points into a square, count how many land inside the unit circle, and estimate pi. Every point is independent, making this embarrassingly parallel and a clean demonstration of Knitting’s map-reduce pattern.

Throw lots of random points into $[-1,1]\times[-1,1]$. A point is "inside" if $x^2 + y^2 \le 1$. Because the unit circle's area is $\pi$ and the enclosing square's is $4$, the fraction of points landing inside approximates $\pi/4$:

$$\pi \approx 4 \cdot \frac{\text{inside}}{\text{total}}$$
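Before any parallelism is involved, the estimator can be sanity-checked with a hypothetical serial helper (`piSerial` is illustrative, not part of the example's files):

```typescript
// Hypothetical serial estimator: validates the area-ratio formula above.
function piSerial(samples: number): number {
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    // Uniform point in [-1, 1] x [-1, 1]
    const x = Math.random() * 2 - 1;
    const y = Math.random() * 2 - 1;
    if (x * x + y * y <= 1) inside++;
  }
  return (4 * inside) / samples;
}
```

With 200,000 samples the standard error is roughly $4/\sqrt{N} \approx 0.009$, so estimates land within a few hundredths of $\pi$.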

The host divides the total sample count into chunks, dispatches each chunk to a worker via piChunk(seed, samples), and each worker runs a tight inner loop with a fast deterministic RNG (xorshift32). Each chunk returns a tiny summary: { inside, samples }. The host aggregates and prints the final estimate.
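The worker module itself is not shown in these docs; the following is a minimal sketch of what `./montecarlo_pi.ts` could look like, matching the host's `call.piChunk([seed, samples])` call. The xorshift32 details are a common textbook implementation, assumed here rather than taken from Knitting's code:

```typescript
// Hypothetical sketch of ./montecarlo_pi.ts (the real module is not shown).
export function piChunk([seed, samples]: [number, number]) {
  // xorshift32 state must never be zero
  let state = (seed | 0) || 1;
  // xorshift32: three shift/xor steps; period 2^32 - 1
  const next = (): number => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    return (state >>> 0) / 4294967296; // uniform in [0, 1)
  };
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = next() * 2 - 1; // map [0, 1) to [-1, 1)
    const y = next() * 2 - 1;
    if (x * x + y * y <= 1) inside++;
  }
  // Tiny summary object: cheap to transfer back to the host
  return { inside, samples };
}
```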

Install (Deno):

deno add jsr:@vixeny/knitting

Run (Bun):

bun src/pi.ts --threads 6 --samples 50000000 --chunk 1000000

Expected output:

Monte Carlo π estimate
threads : 6
total samples: 50,000,000
chunk size : 1,000,000
pi : 3.14159842
took : 871.204 ms
rough ±err : ~5.66e-4
pi.ts
import { createPool, isMain } from "@vixeny/knitting";
import { piChunk } from "./montecarlo_pi.ts";

function intArg(name: string, fallback: number) {
  const idx = process.argv.indexOf(`--${name}`);
  if (idx !== -1 && idx + 1 < process.argv.length) {
    const v = Number(process.argv[idx + 1]);
    if (Number.isFinite(v) && v > 0) return Math.floor(v);
  }
  return fallback;
}

// Tunables (pick any numbers you like)
const TOTAL_SAMPLES = intArg("samples", 50_000_000);
const CHUNK_SAMPLES = intArg("chunk", 10_000_000);
const THREADS = intArg("threads", 6);

const { call, shutdown } = createPool({
  threads: THREADS,
  inliner: {
    position: "last",
    batchSize: 8,
  },
  balancer: "firstIdle",
})({ piChunk });

async function main() {
  const jobCount = Math.ceil(TOTAL_SAMPLES / CHUNK_SAMPLES);
  const jobs = new Array<Promise<{ inside: number; samples: number }>>(
    jobCount,
  );

  // Seed base: stable-ish, different each run
  const seedBase = ((Date.now() | 0) ^ 0x9e3779b9) | 0;

  // Start timing before dispatch so queueing overhead is included.
  const time = performance.now();

  // Queue one worker task per chunk; the last chunk may be smaller.
  for (let i = 0; i < jobCount; i++) {
    const remaining = TOTAL_SAMPLES - i * CHUNK_SAMPLES;
    const samples = remaining >= CHUNK_SAMPLES ? CHUNK_SAMPLES : remaining;
    // Spread seeds so chunks don’t reuse the same random stream
    const seed = (seedBase + i * 0x6d2b79f5) | 0;
    jobs[i] = call.piChunk([seed, samples]);
  }

  const results = await Promise.all(jobs);
  const finished = performance.now();

  let inside = 0;
  let total = 0;
  for (const r of results) {
    inside += r.inside;
    total += r.samples;
  }

  const pi = (4 * inside) / total;
  // Quick sanity check: expected sampling error scales like ~1/sqrt(N)
  const approxStdErr = 1 / Math.sqrt(total);

  console.log("Monte Carlo π estimate");
  console.log("threads :", THREADS);
  console.log("total samples:", total.toLocaleString());
  console.log("chunk size :", CHUNK_SAMPLES.toLocaleString());
  console.log("pi :", pi);
  console.log("took :", (finished - time).toFixed(3), "ms");
  console.log("rough ±err :", `~${(approxStdErr * 4).toExponential(2)}`);
}

if (isMain) {
  main().finally(shutdown);
}

This example captures the core scientific computing pattern: map (simulate many independent trials) then reduce (combine partial statistics). Each chunk returns a small summary, not raw data — that’s critical for keeping transfer overhead low.

The same structure works for numerical integration, uncertainty propagation, risk estimation, statistical physics, and any problem you can phrase as “run many trials and combine results.”
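As an illustration of "combine partial statistics" beyond counting hits, here is a sketch (all names hypothetical, not part of Knitting) of a mergeable per-chunk summary that recovers mean and variance without shipping raw samples to the host:

```typescript
// Hypothetical mergeable summary: enough state to recover mean and variance.
interface Summary { n: number; sum: number; sumSq: number }

// Each worker would return one Summary per chunk instead of raw data.
function summarize(values: number[]): Summary {
  let sum = 0, sumSq = 0;
  for (const v of values) { sum += v; sumSq += v * v; }
  return { n: values.length, sum, sumSq };
}

// Host-side reduce: merging is associative, so chunk order doesn't matter.
function merge(a: Summary, b: Summary): Summary {
  return { n: a.n + b.n, sum: a.sum + b.sum, sumSq: a.sumSq + b.sumSq };
}

function stats(s: Summary) {
  const mean = s.sum / s.n;
  const variance = s.sumSq / s.n - mean * mean; // population variance
  return { mean, variance };
}
```

Because `merge` is associative, the host can fold summaries in any order, exactly as the pi example sums `inside` and `samples`.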

Chunk size matters. Each task should do enough work to justify dispatch overhead. Start with 100k-5M iterations per chunk depending on your loop cost. Too small and overhead dominates; too big and you lose load balancing.

Seed carefully. Use one base seed and derive a different per-chunk seed so chunks don’t share the same random stream; the example’s seed = (seedBase + i * 0x6d2b79f5) | 0 does exactly that. Note that seedBase here comes from Date.now(), so replace it with a constant when you need reproducible runs.

Keep the inner loop tight. Avoid allocations per iteration. This example uses xorshift32 instead of Math.random() for speed.

  1. Increase --samples to 500M and watch the estimate stabilize.
  2. Try different --chunk sizes and measure throughput — there’s a sweet spot.
  3. Fix the seed and confirm identical output across runs.
  4. Modify the worker to also return timing per chunk and plot a histogram.