Queuing Theory Simulator
Interactive visualisation of Theory of Constraints and Queuing Theory dynamics.
Optimised for larger screens
Some simulations are best viewed on larger screens in landscape orientation, but they might work on your phone. I just don't optimise for them.
Overview
This tool visualises the hidden dynamics of multi-team software delivery systems, and shows why traditional resource-utilisation metrics (“keeping people busy”) often cause waste.
It models a Push System, where Stage 1 (Development) pushes work to Stage 2 (QA/Deploy) regardless of downstream capacity, demonstrating Theory of Constraints and Lean Flow principles.
What’s missing?
- Cost. I should probably add cost.
- Sprints to show the cadence benefits of short, focused cycles.
- Dual-track agile vs agile.
- A product team… but that would be a whole new v2 with a different set of problems.
tl;dr Quick Explanation
The Bottleneck Problem
The default scenario shows a classic bottleneck. Development (Stage 1) has more capacity than QA (Stage 2). Watch how the QA queue grows unbounded while developers sit idle waiting for work to clear downstream. This is the Theory of Constraints in action: the system’s throughput is limited by its slowest stage, not its fastest.
What to Look For
- Queue bars growing: When the bar under a stage fills up, work is waiting. The longer items wait, the higher your Lead Time.
- Colour changes: Work items turn from green (flowing) → yellow (waiting) → red (blocked). Blocked items are stuck behind a server that’s also blocked.
- Utilisation vs Flow: High server utilisation (everyone looks busy) can coexist with terrible flow efficiency (work spends most of its time waiting).
The “Aha” Moment
Try increasing Development capacity (more devs) without increasing QA capacity. Watch the QA queue explode. Adding people upstream makes the problem worse, not better. This is why “throwing bodies at the problem” fails.
Scenarios
Scenarios demonstrating Queueing Theory and Theory of Constraints principles.
The Bottleneck (Push System)
Lesson: Demonstrates the Theory of Constraints. Increasing speed upstream (local optimisation) without addressing the bottleneck downstream only generates “inventory” (WIP), which is a form of waste. The QA queue will grow indefinitely, increasing lead time and defect risk.
Integration Hell (High Friction)
Lesson: Shows that flow is destroyed not just by volume, but by friction. Servers appear “busy” (high utilisation), but they are working on “failure demand” or waiting, resulting in extremely low Flow Efficiency (<15%).
The Utilisation Trap
Lesson: Demonstrates Kingman’s Formula approximation. As system utilisation approaches 100%, queue times rise exponentially towards infinity. It proves that a system running at “maximum efficiency” effectively ceases to flow.
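The blow-up this scenario demonstrates can be sketched with Kingman's (VUT) approximation for mean wait in a single queue. This is an illustrative sketch with made-up parameter names, not the simulator's actual code:

```typescript
// Kingman's (VUT) approximation for mean queue wait in a G/G/1 queue:
// Wq ≈ (rho / (1 - rho)) * ((ca2 + cs2) / 2) * meanService
const kingmanWait = (
  rho: number,         // utilisation (lambda / mu), must be < 1
  ca2: number,         // squared coefficient of variation of arrival times
  cs2: number,         // squared coefficient of variation of service times
  meanService: number  // average service time
): number => (rho / (1 - rho)) * ((ca2 + cs2) / 2) * meanService;

// Queue time rises exponentially as utilisation approaches 100%:
console.log(kingmanWait(0.80, 1, 1, 1)); // ≈ 4
console.log(kingmanWait(0.95, 1, 1, 1)); // ≈ 19
console.log(kingmanWait(0.99, 1, 1, 1)); // ≈ 99
```

Going from 80% to 99% utilisation multiplies the expected wait roughly 25-fold, with everything else held constant.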
Whale Variability
Lesson: Illustrates the damage caused by large batch sizes or inconsistent ticket sizing. One large task blocks a server for a long duration, causing smaller tasks (“Guppies”) to pile up behind it, drastically increasing the average wait time for the entire system.
The Push vs Pull Metaphor
Push vs. Pull is key to flow efficiency.
Push System (This Simulation)
Work is pushed downstream the moment it’s complete, regardless of whether the next stage is ready. This is how most organisations operate by default.
- Creates inventory (queues) between stages
- Hides bottlenecks until they become crises
- Optimises for local efficiency over system flow
Pull System (Lean/Kanban)
Work is pulled by downstream stages only when they have capacity. This is the Toyota Production System model.
- WIP limits prevent queue buildup
- Bottlenecks become immediately visible
- Optimises for end-to-end flow
This simulation models a Push system to demonstrate failure modes. The growing queues are “invisible inventory” often missed by traditional metrics.
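For contrast, the core of a pull hand-off fits in a few lines. This is a hypothetical illustration only — the simulator is push-only, so nothing like this exists in its code:

```typescript
// Hypothetical pull hand-off with a WIP limit (illustrative sketch)
type Stage = { wip: number; wipLimit: number };

const tryPull = (upstream: string[], downstream: Stage): string | undefined => {
  // Downstream refuses work at its WIP limit — the bottleneck surfaces
  // immediately instead of hiding in a growing queue
  if (downstream.wip >= downstream.wipLimit) return undefined;
  const item = upstream.shift();
  if (item !== undefined) downstream.wip++;
  return item;
};
```

The key design difference: the decision to transfer work lives downstream, so back-pressure propagates to the upstream stage the moment capacity runs out.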
Visual Guide
Guide to visual elements.
Work Item Colours
Queue Bars
The horizontal bar under each stage shows queue depth. When it fills up, work is piling up faster than it can be processed. A full bar indicates a bottleneck.
Servers (Circles)
Each circle represents a “server” (developer, tester, or machine). When a server is processing work, it’s filled. When idle, it’s empty. Multiple servers process in parallel.
Metrics Dashboard
- Throughput: Items completed per time unit
- Lead Time: Total time from arrival to completion
- Wait Time: Time spent in queues (not being worked on)
- Utilisation: Percentage of time servers are busy
- Flow Efficiency: Work time ÷ Lead time (how much time is “value-add”)
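The metrics above reduce to simple formulas over a completed item's timestamps. A sketch — the field names here are assumptions for illustration, not the simulator's actual state shape:

```typescript
// Dashboard metrics as formulas over one completed item's timestamps
interface ItemRecord {
  arrivedAt: number;   // when the item entered the system
  completedAt: number; // when it left the system
  workMs: number;      // time actually being serviced (value-add)
}

const leadTime = (i: ItemRecord) => i.completedAt - i.arrivedAt;
const waitTime = (i: ItemRecord) => leadTime(i) - i.workMs;
const flowEfficiency = (i: ItemRecord) => (i.workMs / leadTime(i)) * 100;

// An item that spent 10 time units in the system for 1 unit of actual work:
const item: ItemRecord = { arrivedAt: 0, completedAt: 10, workMs: 1 };
console.log(flowEfficiency(item)); // 10 (%) — typical knowledge-work territory
```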
Controls Reference
Arrival Rate (λ)
How frequently new work items enter the system. Higher values simulate high-demand periods or sprints with aggressive commitments.
Dev Capacity (Servers)
Number of parallel workers in Stage 1 (Development). More servers = more throughput capacity, but only if downstream can absorb it.
QA Capacity (Servers)
Number of parallel workers in Stage 2 (QA/Deploy). Often the bottleneck in real organisations due to specialisation and handoff friction.
Service Rate (μ)
How quickly each server processes work. Represents team velocity, tooling efficiency, or automation level.
Blocking %
Probability that a completed item blocks downstream (e.g., integration failures, environment issues, merge conflicts). Creates cascading delays.
Rework %
Probability that a completed item is rejected and sent back to the start. Represents bugs found in QA, failed code reviews, or requirement misunderstandings.
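Blocking % and Rework % can be thought of as a routing decision applied to each completed item. A sketch of one plausible implementation — the simulator's actual routing code may differ:

```typescript
// Route a completed item based on Blocking % and Rework % (illustrative)
type Outcome = 'blocked' | 'rework' | 'done';

const routeCompleted = (
  blockingPct: number, // e.g. 0.2 = 20% chance of blocking downstream
  reworkPct: number,   // e.g. 0.3 = 30% chance of returning to Stage 1
  rand: () => number = Math.random
): Outcome => {
  const r = rand();
  if (r < blockingPct) return 'blocked';            // stalls the next stage
  if (r < blockingPct + reworkPct) return 'rework'; // re-enters Stage 1's queue
  return 'done';
};
```

Injecting `rand` keeps the routing deterministic under test while defaulting to real randomness in the simulation loop.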
Whale Mode
When enabled, a percentage of work items arrive as “whales”: tasks that take 5x longer than normal. This simulates:
- Large, poorly-scoped tickets
- Unexpected complexity (“iceberg” stories)
- Batch processing or release trains
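Whale sizing reduces to a probabilistic multiplier on service time. The 5x factor comes from the description above; the function and parameter names are illustrative assumptions:

```typescript
// A fraction of arrivals are "whales" that take 5x the normal service time
const WHALE_MULTIPLIER = 5;

const serviceTime = (
  baseTime: number,
  whalePct: number,               // fraction of arrivals that are whales
  rand: () => number = Math.random
): number => (rand() < whalePct ? baseTime * WHALE_MULTIPLIER : baseTime);
```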
Human Factors
The simulation abstracts human behaviour, mapping its controls directly to real organisational patterns.
Blocking as Blame Culture
High blocking percentages often correlate with poor psychological safety. When teams fear blame for integration failures, they add more review gates and approval steps, which increases blocking probability. The simulation shows why “more process” often makes things worse.
Rework as Technical Debt
Rework represents the “hidden factory”: capacity consumed by fixing defects that should have been caught earlier. High rework rates indicate poor requirements, insufficient testing, or rush-to-deploy pressure.
The Utilisation Trap
Management often optimises for utilisation (keeping people busy), but the simulation shows this destroys flow. At 90%+ utilisation, queue times explode exponentially. The counterintuitive truth: slack capacity is essential for flow.
Handoff Friction
Each stage boundary represents a handoff: context lost, waiting for availability, re-explanation. The simulation’s two-stage model is a simplification; real organisations often have 5-10+ handoffs, each multiplying delay and error probability.
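The multiplying effect of handoffs can be made concrete: if each handoff independently passes cleanly with probability p, the chance of a clean end-to-end pass across n handoffs decays as p^n. A quick illustration:

```typescript
// Probability that work crosses n handoffs without a handoff-induced error,
// assuming each handoff independently succeeds with probability p
const cleanPassProbability = (p: number, handoffs: number) => Math.pow(p, handoffs);

console.log(cleanPassProbability(0.9, 2)); // ≈ 0.81 — this simulation's two stages
console.log(cleanPassProbability(0.9, 8)); // ≈ 0.43 — a realistic 8-handoff organisation
```

Even with a generous 90% success rate per handoff, an 8-handoff pipeline delivers cleanly less than half the time.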
Mathematical Foundations
Based on Queueing Theory and Lean management principles. These mathematical laws govern flow systems, whether physical manufacturing lines or digital software development. They explain why “common sense” management (like 100% utilisation) often fails mathematically.
M/M/c Queue Model
The simulation relies on Kendall’s notation for queuing nodes:
- M (Markovian Arrivals): Arrival times follow a Poisson process (Exponential distribution).
- M (Markovian Service): Service times follow an Exponential distribution.
- c (Servers): The number of active servers (developers/testers) processing the queue.
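An M/M/c queue is only stable when traffic intensity ρ = λ/(cμ) is below 1; otherwise the queue grows without bound, which is exactly the bottleneck scenario above. A minimal check:

```typescript
// M/M/c stability: the queue stays bounded only when rho = lambda / (c * mu) < 1
const trafficIntensity = (lambda: number, mu: number, c: number) => lambda / (c * mu);
const isStable = (lambda: number, mu: number, c: number) => trafficIntensity(lambda, mu, c) < 1;

console.log(isStable(5, 2, 2)); // false — 5 arrivals/unit vs capacity 4: unbounded queue
console.log(isStable(5, 2, 3)); // true  — capacity 6 absorbs the arrivals
```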
Exponential Distribution
- Origin: Probability theory; fundamental to Poisson point processes.
- Concept: Describes the time between events that occur continuously and independently at a constant average rate. It models real-world variance where most tasks take an average time, but some are very quick and others take much longer.
- In Simulation: Used to generate randomised arrival times for new work items and the duration of service (work) for each server, ensuring the simulation reflects realistic variability rather than static averages.
```typescript
// Generate random values following an Exponential distribution
// via inverse transform sampling; used for arrival times (Poisson
// process) and service durations
const getExponential = (rate: number) => {
  return -Math.log(1 - Math.random()) / rate;
};

// Generate next arrival time (converted to milliseconds)
data.nextArrivalTime = getExponential(params.arrivalRate) * 1000;
```

Little’s Law
- Origin: John Little (1961), Operations Research.
- Concept: The long-term average number of items in a stationary system (L) is equal to the long-term average effective arrival rate (λ) multiplied by the average time an item spends in the system (W): L = λW.
- In Simulation: Used to validate the statistical metrics displayed in the dashboard. It proves that if you cannot change the arrival rate (λ), the only way to reduce Lead Time (W) is to reduce the Work in Progress (L).
```typescript
// Verify simulator metrics against Little's Law (L = λ * W)
// avgSystemLength (L) ≈ arrivalRate (λ) * avgSystemTime (W)
const stats = {
  avgSystemTime: s1.W + s2.W,   // Total time in system (W)
  avgSystemLength: s1.L + s2.L  // Total items in system (L)
};
```

Erlang-C Formula
- Origin: A.K. Erlang (1917), Telecommunications traffic engineering.
- Concept: Calculates the probability that a randomly arriving item must wait in the queue rather than being served immediately, given the traffic intensity and the number of servers.
- In Simulation: Provides the “Theoretical” baseline for wait times. Comparing this baseline with the “Actual” simulation results highlights the extra friction caused by factors Erlang-C doesn’t account for, such as blocking and rework.
```typescript
// Erlang-C: theoretical average queue length (Lq) for an M/M/c queue,
// derived from the probability that an arriving item must wait
const factorial = (n: number): number => (n <= 1 ? 1 : n * factorial(n - 1));

const calcErlangC = (lambda: number, mu: number, c: number) => {
  const rho = lambda / (c * mu); // Traffic intensity
  // Sum the series for the inverse of P(0), the empty-system probability
  let p0_inv = 0;
  for (let i = 0; i < c; i++) {
    p0_inv += Math.pow(lambda / mu, i) / factorial(i);
  }
  p0_inv += (Math.pow(lambda / mu, c) / factorial(c)) * (1 / (1 - rho));
  // Theoretical average queue length (Lq)
  return (Math.pow(lambda / mu, c) * rho * (1 / p0_inv)) /
         (factorial(c) * Math.pow(1 - rho, 2));
};
```

Flow Efficiency
- Origin: Lean Manufacturing / Toyota Production System.
- Concept: A metric measuring the percentage of time work is actually being progressed versus sitting idle. In knowledge work, this is often shockingly low (10-15%).
- In Simulation: Calculated dynamically by tracking the exact milliseconds an entity spends being “serviced” (Active Work) versus “waiting” or “blocked”. It visualises how high server utilisation can coexist with terrible flow efficiency.
```typescript
// Periodically report flow efficiency (value-added time ÷ total lead time)
// from the running totals accumulated per item
if (Math.random() < 0.1) { // 10% sampling rate for UI updates
  const flowEff = data.metrics.totalSystem > 0
    ? (data.metrics.totalValueAdded / data.metrics.totalSystem) * 100
    : 0;
  setSimState(prev => ({
    ...prev,
    flowEfficiency: flowEff
  }));
}
```