🎯 The Premise
_"I generated 100 million UUIDs and got 3 duplicates."_
This kind of claim pops up in dev forums every few months — sometimes as a warning, sometimes as a humblebrag, always as a conversation starter.
So, we decided to actually test it.
🧪 The Experiment Setup
We generated 100 million UUIDv4 identifiers using Python, Go, and Node.js — three popular environments that rely on cryptographically secure random number generators.
Generation Code (Python)
import uuid

seen = set()      # every UUID generated so far
collisions = 0

for _ in range(100_000_000):
    u = uuid.uuid4()
    if u in seen:  # duplicate of an earlier UUID
        collisions += 1
    seen.add(u)

print(f"Collisions found: {collisions}")
> Note: The final run was batched and parallelized using multiprocessing and memory-mapped sets for performance.
🧾 The Result: Zero Collisions
No duplicates were found. Zilch. Nada.
The myth? Busted. At least in every properly executed run.
🧠 Wait... Where Did the “3 Collisions” Come From?
We dug into the most common causes behind these misleading claims:
1. 🧹 Truncated UUIDs
Some systems shorten UUIDs (e.g. keep only the first 8 characters for brevity).
Example:
short = str(uuid.uuid4())[:8]
This drastically reduces the keyspace from 2¹²² to 2³² — making collisions probable, not rare.
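To get a feel for how quickly an 8-character prefix collides, here is a small sketch. The helper names (count_prefix_collisions, ids_for_half_chance) are ours and not part of any library, and the sample size of 300,000 is an arbitrary choice.

```python
import math
import uuid


def count_prefix_collisions(n: int, prefix_len: int = 8) -> int:
    """Generate n UUIDv4s, keep only the first prefix_len hex characters,
    and count how many prefixes had already been seen."""
    seen = set()
    collisions = 0
    for _ in range(n):
        short = uuid.uuid4().hex[:prefix_len]
        if short in seen:
            collisions += 1
        seen.add(short)
    return collisions


def ids_for_half_chance(bits: int) -> float:
    """Birthday approximation: IDs needed for a ~50% chance of a collision."""
    return math.sqrt(2 * math.log(2) * 2**bits)


if __name__ == "__main__":
    print(f"50% threshold for 32 bits: {ids_for_half_chance(32):,.0f} IDs")
    print(f"Collisions in 300,000 truncated IDs: {count_prefix_collisions(300_000)}")
```

With only 32 bits left, the 50% threshold sits at roughly 77,000 IDs, and a run of 300,000 truncated IDs will typically show a handful of duplicates. At 100 million IDs, collisions are a certainty.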
2. 🧮 String vs Binary Comparisons
Comparing UUIDs as strings vs bytes can introduce subtle bugs.
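u = uuid.uuid4()
str(u) != u.bytes  # always True: a str never equals bytes, even for the same UUID
Normalize every value to one representation (for example, a uuid.UUID object) before checking for equality; case and hyphenation differences between strings cause the same problem.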
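One way to normalize is to coerce everything into a uuid.UUID object before it goes into the set. The sketch below assumes inputs arrive as UUID objects, text, or raw 16-byte values; to_uuid is a hypothetical helper name, not a standard-library function.

```python
import uuid


def to_uuid(value) -> uuid.UUID:
    """Coerce common UUID representations into a uuid.UUID object."""
    if isinstance(value, uuid.UUID):
        return value
    if isinstance(value, (bytes, bytearray)):
        if len(value) == 16:
            return uuid.UUID(bytes=bytes(value))  # raw 16-byte form
        value = bytes(value).decode("ascii")      # ASCII-encoded text form
    # uuid.UUID() accepts upper or lower case, with or without hyphens,
    # and with optional braces or a "urn:uuid:" prefix.
    return uuid.UUID(str(value).strip())


u = uuid.uuid4()
variants = [u, str(u), str(u).upper(), u.hex, u.bytes, f"{{{u}}}"]
assert len({to_uuid(v) for v in variants}) == 1  # all six forms are one ID
```

Comparing the normalized objects means a duplicate check can no longer be fooled (in either direction) by formatting differences.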
3. 🐞 Faulty Randomness
If you're using non-cryptographic RNGs (Math.random(), or insecure PRNGs), UUIDv4 loses its guarantees.
Always use a CSPRNG (os.urandom, crypto.randomUUID, etc.)
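To illustrate the failure mode, the sketch below builds v4-shaped UUIDs from Python's random.Random, which is a seedable, non-cryptographic PRNG. The weak_uuid4 helper and the seed value are made up for the demonstration; the scenario assumed is two workers that both seed from the same coarse timestamp.

```python
import random
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """Build a v4-shaped UUID from a NON-cryptographic PRNG (do not do this)."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two "independent" workers that happen to start from the same seed,
# e.g. a timestamp with one-second resolution.
worker_a = random.Random(1700000000)
worker_b = random.Random(1700000000)

ids_a = [weak_uuid4(worker_a) for _ in range(5)]
ids_b = [weak_uuid4(worker_b) for _ in range(5)]

print(ids_a == ids_b)  # True: every single ID collides
```

The IDs look like perfectly valid UUIDv4s, which is what makes this bug hard to spot. In CPython, uuid.uuid4() reads from os.urandom instead, so there is no user-supplied seed to repeat.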
4. 🧍 Parallelism Bugs
Multiple threads or processes writing to a shared file or memory structure without synchronization can create phantom duplicates.
Use locks or atomic operations for integrity in high-throughput generation.
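A minimal sketch of a synchronized duplicate check, assuming the workers are threads in one process. CollisionCounter and worker are illustrative names; the thread count and batch size are arbitrary.

```python
import threading
import uuid


class CollisionCounter:
    """Thread-safe duplicate counter: check and insert happen under one lock."""

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()
        self.collisions = 0

    def add(self, value: uuid.UUID) -> None:
        with self._lock:  # makes check-then-add a single atomic step
            if value in self._seen:
                self.collisions += 1
            self._seen.add(value)

    def __len__(self) -> int:
        with self._lock:
            return len(self._seen)


def worker(counter: CollisionCounter, n: int) -> None:
    for _ in range(n):
        counter.add(uuid.uuid4())


counter = CollisionCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(counter)} unique IDs, {counter.collisions} collisions")
```

Because every generated ID ends up either in the set or in the collision count, the two numbers always add up to the total generated. If they don't in your own harness, the bug is in the bookkeeping, not in the UUIDs.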
🎲 The Real Odds of UUIDv4 Collisions
UUIDv4 has 122 bits of randomness. The total number of possible UUIDs is:
2^122 ≈ 5.3 x 10^36
Using the birthday bound, the number of UUIDs needed to reach a 50% chance of at least one collision is:
√(2 · ln 2 · 2^122) ≈ 2.71 x 10^18 UUIDs
That's 2.7 quintillion UUIDs, far beyond any real-world application.
For comparison:
- 1 billion UUIDs/day = ~7.4 million years to reach 50% collision odds
- Generating 100 million UUIDs is barely a ripple in the pool
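If you'd like to check the arithmetic yourself, the standard birthday approximation fits in a few lines. The function names are ours, and the formulas are the usual approximations rather than exact values.

```python
import math

N = 2**122  # number of possible UUIDv4 values (122 random bits)


def collision_probability(n: int, space: int = N) -> float:
    """Birthday approximation: P(at least one collision) among n random IDs."""
    return -math.expm1(-n * (n - 1) / (2 * space))


def ids_for_probability(p: float, space: int = N) -> float:
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


n_half = ids_for_probability(0.5)
years = n_half / 1e9 / 365.25  # at one billion UUIDs per day

print(f"IDs for a 50% chance: {n_half:.2e}")
print(f"Years at 1e9 IDs/day:  {years:,.0f}")
print(f"P(collision) for 1e8:  {collision_probability(100_000_000):.1e}")
```

This gives about 2.71 x 10^18 IDs for even odds, roughly 7.4 million years at a billion IDs per day, and a collision probability near 9.4 x 10^-22 for a 100-million-ID run.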
🧬 UUID Collisions in Real Systems
Real-world systems that commonly rely on UUIDv4, or on the same uniqueness principle:
- PostgreSQL primary keys
- DynamoDB partition keys
- Kubernetes object UIDs
- Git object hashes (not UUIDs, but same uniqueness principle)
To our knowledge, no UUIDv4 collision in a production system has been publicly verified, apart from cases caused by system misconfiguration or misuse.
🔐 Takeaways for Developers
✅ Use UUIDv4 When:
- You need fast, globally unique IDs
- You're working in distributed systems
- You don’t require time-ordering
✅ Use UUIDv7 When:
- You want timestamp-sorted UUIDs (e.g., for logs or event streams)
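For the curious, here is a rough sketch of how a UUIDv7 is laid out according to RFC 9562: a 48-bit Unix timestamp in milliseconds up front, followed by version, variant, and random bits. uuid7_sketch is a hypothetical helper for illustration only; if your runtime or library already ships a UUIDv7 generator, prefer that.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None) -> uuid.UUID:
    """Assemble a UUIDv7-shaped value: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits from the OS
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: rand_a
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= rand_b                            # bits 61..0:  rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)  # True: a later millisecond always sorts after an earlier one
```

Note that this sketch makes no ordering promise for IDs created within the same millisecond; those are ordered only by their random bits.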
❌ Avoid:
- Truncating UUIDs for readability
- Comparing strings with differing formats
- Using UUIDs from insecure sources
✅ TL;DR (Again)
We generated 100 million UUIDv4s. We found no collisions.
Any claim otherwise is likely due to:
- Misuse (shortening, poor RNG)
- Misinterpretation (formatting bugs)
- Faulty testing (non-threadsafe containers, bad equality checks)
Final Thoughts
The "UUID collision" fear is a modern urban legend: possible in a strictly probabilistic sense, but practically nonexistent for any real software system.
UUIDv4 remains one of the most robust, distributed-safe, collision-resistant ID formats available.
So go ahead. Generate UUIDs with confidence. Your bits are safe.
🧪 Myth: DEBUNKED.