Parallel Monte Carlo pi

The classic Monte Carlo dartboard: throw random points into a square, count how many land inside the unit circle, and estimate pi. Every point is independent, making this embarrassingly parallel and a clean demonstration of Knitting’s map-reduce pattern.

How it works

Throw lots of random points into $[-1,1]\times[-1,1]$ . A point is “inside” if $x^2 + y^2 \le 1$ . Because areas scale nicely:

\pi \approx 4 \cdot \frac{\text{inside}}{\text{total}}

The host divides the total sample count into chunks, dispatches each chunk to a worker via piChunk(seed, samples), and each worker runs a tight inner loop with a fast deterministic RNG (xorshift32). Each chunk returns a tiny summary: { inside, samples }. The host aggregates and prints the final estimate.

Install

deno add --npm jsr:@vixeny/knitting

npx jsr add @vixeny/knitting

# pnpm 10.9+
pnpm add jsr:@vixeny/knitting

# fallback (older pnpm)
pnpm dlx jsr add @vixeny/knitting

# yarn 4.9+
yarn add jsr:@vixeny/knitting

# fallback (older yarn)
yarn dlx jsr add @vixeny/knitting

bunx jsr add @vixeny/knitting

Run

bun src/pi.ts --threads 6 --samples 50000000 --chunk 1000000

deno run -A src/pi.ts --threads 6 --samples 50000000 --chunk 1000000

npx tsx src/pi.ts --threads 6 --samples 50000000 --chunk 1000000

Expected output:

threads: 6  samples: 50,000,000  chunk: 1,000,000
dispatching 50 chunks...

pi ~ 3.14159842  (error: +0.00000577)
elapsed: 0.87s

import { createPool, isMain } from "@vixeny/knitting";
import { piChunk } from "./montecarlo_pi.ts";

function intArg(name: string, fallback: number) {
  const idx = process.argv.indexOf(`--${name}`);
  if (idx !== -1 && idx + 1 < process.argv.length) {
    const v = Number(process.argv[idx + 1]);
    if (Number.isFinite(v) && v > 0) return Math.floor(v);
  }
  return fallback;
}

// Tunables (pick any numbers you like)
const TOTAL_SAMPLES = intArg("samples", 50_000_000_000);
const CHUNK_SAMPLES = intArg("chunk", 10_000_000);
const THREADS = intArg("threads", 6);

const { call, shutdown } = createPool({
  threads: THREADS,
  inliner: {
    position: "last",
    batchSize: 8,
  },
  balancer: "firstIdle",
})({ piChunk });

async function main() {
  const jobCount = Math.ceil(TOTAL_SAMPLES / CHUNK_SAMPLES);
  const jobs = new Array<Promise<{ inside: number; samples: number }>>(
    jobCount,
  );

  // Seed base: stable-ish, different each run
  const seedBase = ((Date.now() | 0) ^ 0x9e3779b9) | 0;

  // Queue one worker task per chunk.
  for (let i = 0; i < jobCount; i++) {
    const remaining = TOTAL_SAMPLES - i * CHUNK_SAMPLES;
    const samples = remaining >= CHUNK_SAMPLES ? CHUNK_SAMPLES : remaining;

    // Spread seeds so chunks don’t reuse the same random stream
    const seed = (seedBase + (i * 0x6d2b79f5)) | 0;

    jobs[i] = call.piChunk([seed, samples]);
  }

  const time = performance.now();
  const results = await Promise.all(jobs);
  const finished = performance.now();

  let inside = 0;
  let total = 0;
  for (const r of results) {
    inside += r.inside;
    total += r.samples;
  }

  const pi = (4 * inside) / total;

  // Quick sanity: expected sampling error scales like ~1/sqrt(N)
  const approxStdErr = 1 / Math.sqrt(total);

  console.log("Monte Carlo π estimate");
  console.log("threads      :", THREADS + 1);
  console.log("total samples:", total.toLocaleString());
  console.log("chunk size   :", CHUNK_SAMPLES.toLocaleString());
  console.log("pi           :", pi);
  console.log("took         :", (finished - time).toFixed(3), " ms");
  console.log("rough ±err   :", `~${(approxStdErr * 4).toExponential(2)}`);
}

if (isMain) {
  main().finally(shutdown);
}

import { task } from "@vixeny/knitting";

type ChunkArgs = readonly [seed: number, samples: number];
type ChunkResult = { inside: number; samples: number };

/**
 * Fast, deterministic RNG: xorshift32.
 * (Good enough for Monte Carlo demos, and much faster than Math.random in tight loops.)
 */
function xorshift32(state: number): number {
  state |= 0;
  state ^= state << 13;
  state ^= state >>> 17;
  state ^= state << 5;
  return state | 0;
}

const INV_2_POW_32 = 2.3283064365386963e-10; // 1 / 2^32

export const piChunk = task<ChunkArgs, ChunkResult>({
  f: ([seed, samples]) => {
    let s = seed | 0;
    let inside = 0;

    for (let i = 0; i < samples; i++) {
      s = xorshift32(s);
      const x = ((s >>> 0) * INV_2_POW_32) * 2 - 1;

      s = xorshift32(s);
      const y = ((s >>> 0) * INV_2_POW_32) * 2 - 1;

      const r2 = x * x + y * y;
      if (r2 <= 1) inside++;
    }

    return { inside, samples };
  },
});

Why this is a good parallel pattern

This example captures the core scientific computing pattern: map (simulate many independent trials) then reduce (combine partial statistics). Each chunk returns a small summary, not raw data — that’s critical for keeping transfer overhead low.

The same structure works for numerical integration, uncertainty propagation, risk estimation, statistical physics, and any problem you can phrase as “run many trials and combine results.”

Practical notes

Chunk size matters. Each task should do enough work to justify dispatch overhead. Start with 100k-5M iterations per chunk depending on your loop cost. Too small and overhead dominates; too big and you lose load balancing.

Seed carefully. Use one base seed (so runs are reproducible) and derive a different per-chunk seed (so chunks don’t share the same random stream). This example does this correctly.

Keep the inner loop tight. Avoid allocations per iteration. This example uses xorshift32 instead of Math.random() for speed.

Things to try

Increase --samples to 500M and watch the estimate stabilize.
Try different --chunk sizes and measure throughput — there’s a sweet spot.
Fix the seed and confirm identical output across runs.
Modify the worker to also return timing per chunk and plot a histogram.