Ever heard someone say, _“If you generate enough UUIDs, you’ll eventually get a collision”_?
Technically, they’re right. Practically? Not even close.
This article dives into the math and probability theory behind UUID collisions — especially UUIDv4 — and explains why your distributed system is (almost certainly) safe.
🧬 A Quick Refresher: What Is a UUIDv4?
A UUIDv4 is a 128-bit identifier where 122 bits are random (the remaining 6 bits encode the version and variant).
That means there are:
2^122 ≈ 5.3 × 10^36
possible UUIDv4s.
For comparison:
- The number of grains of sand on Earth: ~7.5 × 10^18
- The number of stars in the observable universe: ~1 × 10^24
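The 122-bit figure can be checked with Python's standard `uuid` module. This is a minimal sketch; the constant names `RANDOM_BITS` and `TOTAL_UUID4` are chosen here for illustration only.

```python
import uuid

# 128 bits total, minus 4 version bits and 2 variant bits.
RANDOM_BITS = 128 - 4 - 2
TOTAL_UUID4 = 2 ** RANDOM_BITS

u = uuid.uuid4()
print(u)                              # a fresh random UUID
print(u.version)                      # the version field is fixed at 4
print(u.variant == uuid.RFC_4122)     # the variant field is fixed too
print(f"{TOTAL_UUID4:.2e}")           # size of the UUIDv4 space
```

Only the bits outside the version and variant fields vary from one UUIDv4 to the next, which is where the `2^122` count comes from.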
🎂 The Birthday Problem: A Collision Analogy
To understand UUID collisions, we use a classic example from probability theory: the birthday paradox.
The Problem:
> How many people need to be in a room before two of them are likely to share a birthday?
Answer: Just 23 people for a 50% chance.
This unintuitive result comes from comparing every pair in the group (23 people form 253 distinct pairs), not from matching everyone against one fixed date.
The same logic applies to UUID collisions — every newly generated UUID is compared against all previously generated ones.
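The 23-person figure is easy to verify by multiplying out the chance that everyone's birthday is distinct. The sketch below assumes 365 equally likely birthdays; the function names are illustrative.

```python
def shared_birthday_probability(people, days=365):
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for already_present in range(people):
        # Each newcomer must avoid every birthday already taken.
        p_all_distinct *= (days - already_present) / days
    return 1.0 - p_all_distinct


def pair_count(people):
    """Number of distinct pairs that can be formed in the group."""
    return people * (people - 1) // 2


print(pair_count(23))
print(round(shared_birthday_probability(22), 4))
print(round(shared_birthday_probability(23), 4))
```

The probability crosses 50% between 22 and 23 people, even though 23 is a small fraction of 365.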
📐 The Formula: Collision Probability
Let’s apply the birthday formula to UUIDs.
Approximate Probability:
p ≈ 1 - e^(-n² / (2 × N))
Where:
- `n` = number of UUIDs generated
- `N` = total possible UUIDs (`2^122`)
- `p` = probability of at least one collision
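Here is one way the approximation might be written in Python. The function name is illustrative. The use of `math.expm1` is a choice made for this sketch: for UUID-sized spaces the exponent is so close to zero that a literal `1 - exp(-x)` rounds to exactly 0.0 in floating point.

```python
import math

UUID4_SPACE = 2 ** 122


def collision_probability(n, total=UUID4_SPACE):
    """Birthday approximation: p ≈ 1 - e^(-n² / (2 × N))."""
    x = n * n / (2 * total)
    # expm1(-x) computes e^(-x) - 1 without losing precision for tiny x.
    return -math.expm1(-x)


# Sanity check against the birthday problem (n = 23, N = 365).
print(round(collision_probability(23, 365), 4))

# The literal formula underflows at UUID scale; expm1 does not.
x = (10 ** 6) ** 2 / (2 * UUID4_SPACE)
print(1 - math.exp(-x))
print(collision_probability(10 ** 6))
```

For the birthday case the approximation lands within about one percentage point of the exact answer, which is accurate enough for order-of-magnitude reasoning about UUIDs.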
Let’s reverse the formula to ask:
How many UUIDs can we generate before the chance of a collision hits 50%?
🔢 The Threshold: When Collisions Become "Likely"
Plugging into the math:
n ≈ sqrt(2 × N × ln(1 / (1 - p)))
For p = 0.5 and N = 2^122, we get:
n ≈ 2.71 × 10^18 UUIDs
That’s 2.7 quintillion UUIDs before hitting just a 50% collision chance.
To generate that many:
- At 1 million UUIDs/sec → would take ~86,000 years
- At 1 billion UUIDs/sec → still takes ~86 years
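The threshold and the time estimates can be reproduced with a few lines of Python. This is a sketch under the same approximation; the helper name and the 365.25-day year are assumptions made for the example.

```python
import math

UUID4_SPACE = 2 ** 122
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def uuids_for_probability(p, total=UUID4_SPACE):
    """Inverse birthday approximation: n ≈ sqrt(2 × N × ln(1 / (1 - p)))."""
    return math.sqrt(2 * total * math.log(1 / (1 - p)))


n_half = uuids_for_probability(0.5)
print(f"{n_half:.3e} UUIDs for a 50% collision chance")

for rate in (1e6, 1e9):
    years = n_half / rate / SECONDS_PER_YEAR
    print(f"{rate:.0e} UUIDs/sec -> {years:,.0f} years")
```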
> Bottom line: UUIDv4 is extremely safe at global scale.
🧪 Real-World Collision Math
Let’s break down smaller scale scenarios:
Case 1: Generate 1 million UUIDs
p ≈ 1 - e^(-(1e6)^2 / (2 × 2^122)) ≈ 9.4 × 10^-26
That’s roughly a 1 in 10 septillion chance. You’re more likely to be struck by lightning while winning the lottery.
Case 2: Generate 1 billion UUIDs
p ≈ 1 - e^(-(1e9)^2 / (2 × 2^122)) ≈ 9.4 × 10^-20
Still absurdly low. Nothing to worry about in production.
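Both cases can be recomputed directly from the formula. This sketch prints the approximate collision probability at a few scales; `collision_probability` is an illustrative helper, and it relies on `math.expm1` because these probabilities are far too small for a literal `1 - exp(-x)` to represent.

```python
import math

UUID4_SPACE = 2 ** 122


def collision_probability(n, total=UUID4_SPACE):
    """Approximate probability of at least one collision among n random IDs."""
    return -math.expm1(-(n * n) / (2 * total))


scenarios = {
    "1 million": 10 ** 6,
    "1 billion": 10 ** 9,
    "1 trillion": 10 ** 12,
}

for label, n in scenarios.items():
    p = collision_probability(n)
    print(f"{label:>10} UUIDs: p ≈ {p:.2e} (about 1 in {1 / p:.1e})")
```

Because the probability grows with `n²`, every 1,000-fold increase in volume raises the collision chance by a factor of a million, yet it stays tiny at any volume a real system is likely to reach.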
🔍 Where the Myth Comes From
So why do developers still fear collisions?
1. Misunderstood Math
The birthday paradox is counterintuitive. People often underestimate how large `2^122` really is.
2. Truncated UUIDs
Some teams shorten UUIDs for display (e.g. first 8 characters). This reduces entropy to ~32 bits, where the same birthday math puts the 50% collision point at only about 77,000 IDs.
3. Bad PRNGs or Bugs
Poorly implemented UUID generators (especially non-cryptographic ones) can increase risk.
Always use standard libraries and CSPRNGs (`os.urandom`, `secrets`, etc.).
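To see how much truncation costs, the same threshold formula can be applied to shorter prefixes. The sketch below assumes the prefix covers only random hex digits, which holds for the first 8 or 12 characters of a UUIDv4; the function name is illustrative.

```python
import math


def ids_for_half_chance(bits):
    """Approximate number of random IDs at which a collision reaches 50%."""
    return math.sqrt(2 * 2 ** bits * math.log(2))


# Each hex character carries 4 bits of entropy.
for label, bits in (("first 8 hex chars", 32),
                    ("first 12 hex chars", 48),
                    ("full UUIDv4", 122)):
    print(f"{label:>18} ({bits:>3} bits): ~{ids_for_half_chance(bits):,.0f} IDs")
```

A table with tens of thousands of rows is routine, so an 8-character prefix used as a key will collide in ordinary workloads even though the full UUID never would.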
🧠 The Bigger Picture: Collision Isn’t the Real Risk
When choosing ID schemes, the real trade-offs are often:
- Storage size (UUIDs = 16 bytes vs INT = 4 bytes)
- Index performance (UUIDv4 = random inserts vs UUIDv7 = ordered)
- Traceability (UUIDv1 = timestamp + MAC; UUIDv4 = anonymous)
- Debuggability (ULID, KSUID, and UUIDv7 are more human-friendly)
Collision risk? Not even on the list.
💡 Bonus: UUIDv7 Improves Sortability
One reason some developers avoid UUIDv4 is the random ordering in indexes and logs.
UUIDv7, introduced in RFC 9562 (2024), addresses this:
- Includes a timestamp in the first 48 bits
- Still has up to 74 bits of randomness
- Lexicographically sortable
It’s great for databases, logs, and systems that benefit from time-ordered IDs — without sacrificing global uniqueness.
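To make the layout concrete, here is a hand-rolled sketch of the UUIDv7 bit fields: a 48-bit millisecond timestamp, then version, variant, and 74 random bits. `uuid7_sketch` is a hypothetical helper written for illustration, not a standard-library API. In production, prefer a maintained UUIDv7 generator from your runtime or a library.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Illustrative UUIDv7 layout: 48-bit timestamp, version, variant, 74 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits; 74 are used
    rand_a = (rand >> 62) & 0xFFF                  # 12 bits
    rand_b = rand & ((1 << 62) - 1)                # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(unix_ms=1_000)
later = uuid7_sketch(unix_ms=2_000)
print(earlier, later, sep="\n")
print(str(earlier) < str(later))   # string order follows timestamp order
```

In this sketch, IDs created within the same millisecond are ordered only by their random bits. Generators that need strict ordering inside a millisecond typically dedicate some of those bits to a counter.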
✅ Takeaways for Developers
- UUIDv4 collisions are mathematically possible but practically irrelevant
- You can safely generate billions of UUIDv4s with zero collision concern
- If you're seeing collisions, it's likely due to:
  - Shortened/truncated UUIDs
  - Faulty generation (e.g., `random()` instead of `secrets`)
  - Logic bugs (e.g., duplicate writes)
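The faulty-generation case is easy to reproduce. In the sketch below, `weak_uuid4` is a hypothetical hand-rolled generator built on a seedable, non-cryptographic PRNG; two processes that start from the same seed emit the same "unique" ID. The standard `uuid.uuid4()`, which in CPython draws from `os.urandom`, does not have this failure mode.

```python
import random
import uuid


def weak_uuid4(rng):
    """Hypothetical DIY generator: well-formed UUIDv4, predictable bits."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two "workers" that happen to seed their PRNG identically.
worker_a = weak_uuid4(random.Random(42))
worker_b = weak_uuid4(random.Random(42))
print(worker_a == worker_b)   # True: a guaranteed collision

# The standard library generator draws from the OS entropy source.
safe_a = uuid.uuid4()
safe_b = uuid.uuid4()
print(safe_a == safe_b)
```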
> Bottom line: It’s not the math — it’s usually the implementation.
Final Thoughts
UUIDs are a brilliant engineering solution — elegant in theory, rock-solid in practice.
Understanding the math demystifies them and lets us focus on real engineering trade-offs, not hypothetical ghosts.
So next time someone warns about UUID collisions, send them this article… and get back to shipping.
🧠 Probability theory: 1
Myth: 0