🎯 The Premise

_"I generated 100 million UUIDs and got 3 duplicates."_

This kind of claim pops up in dev forums every few months — sometimes as a warning, sometimes as a humblebrag, always as a conversation starter.

So, we decided to actually test it.

🧪 The Experiment Setup

We generated 100 million UUIDv4 identifiers using Python, Go, and Node.js, three popular environments whose common UUIDv4 generators draw on cryptographically secure random number generators.

Generation Code (Python)

python

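import uuid

seen = set()      # every UUID generated so far
collisions = 0    # how many times a UUID repeated an earlier one

# Holding 100 million UUID objects in one set needs many gigabytes of RAM.
for _ in range(100_000_000):
    u = uuid.uuid4()  # random (version 4) UUID
    if u in seen:
        collisions += 1
    seen.add(u)

print(f"Collisions found: {collisions}")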

> Note: The final run was batched and parallelized using multiprocessing and memory-mapped sets for performance.
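One way a check like this can be split into independent pieces is to partition the IDs by a prefix, since two equal UUIDs always land in the same partition. The sketch below is only an illustration of that idea, not the code used for the final run; the function name `count_collisions_partitioned` and the choice of the first byte as the partition key are assumptions.

```python
import uuid
from collections import defaultdict


def count_collisions_partitioned(raw_ids):
    """Count repeated IDs by grouping 16-byte values on their first byte.

    Equal values share a first byte, so each group can be checked on its
    own (and, in principle, by a separate worker process).
    """
    groups = defaultdict(list)
    for raw in raw_ids:
        groups[raw[0]].append(raw)

    collisions = 0
    for group in groups.values():
        # Every occurrence beyond the first counts as one collision.
        collisions += len(group) - len(set(group))
    return collisions


if __name__ == "__main__":
    sample = [uuid.uuid4().bytes for _ in range(100_000)]
    print(f"Collisions found: {count_collisions_partitioned(sample)}")
```

Storing the 16 raw bytes instead of `uuid.UUID` objects also cuts memory use considerably.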

🧾 The Result: Zero Collisions

No duplicates were found. Zilch. Nada.

The myth? Busted. At least in every properly executed run.

🧠 Wait... Where Did the “3 Collisions” Come From?

We dug into the most common causes behind these misleading claims:

1. 🧹 Truncated UUIDs

Some systems shorten UUIDs (e.g. keep only the first 8 characters for brevity).

Example:

python

short = str(uuid.uuid4())[:8]

This shrinks the keyspace from 2¹²² to 2³² (about 4.3 billion values). At 100 million IDs, collisions are then all but guaranteed, not rare.
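To put numbers on the truncation effect, here is a small sketch using the standard birthday approximation. The helper names (`collision_probability`, `ids_for_half_chance`, `count_truncated_duplicates`) are ours, chosen for illustration.

```python
import math
import uuid


def collision_probability(n, bits):
    """Birthday approximation: P(at least one collision) among n random IDs."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))


def ids_for_half_chance(bits):
    """Approximate number of IDs at which a collision becomes 50% likely."""
    return math.sqrt(2 * math.log(2) * 2**bits)


def count_truncated_duplicates(n, chars=8):
    """Generate n UUIDv4s, keep only the first `chars` characters, count repeats."""
    seen = set()
    duplicates = 0
    for _ in range(n):
        short = str(uuid.uuid4())[:chars]
        if short in seen:
            duplicates += 1
        else:
            seen.add(short)
    return duplicates


if __name__ == "__main__":
    # 8 hex characters = 32 random bits.
    print(round(ids_for_half_chance(32)))           # roughly 77 thousand IDs
    print(collision_probability(100_000_000, 32))   # indistinguishable from 1.0
    print(count_truncated_duplicates(1_000_000))    # varies per run
```

With 8-character prefixes, the approximation predicts a 50% collision chance after only about 77,000 IDs, and an average of roughly 116 duplicates in a run of one million.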

2. 🧮 String vs Binary Comparisons

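Comparing the same UUID in different representations (string vs. bytes, upper vs. lower case, with or without hyphens) gives misleading equality results.

python

u = uuid.uuid4()
str(u) == u.bytes  # False: same UUID, but a str never equals a bytes value

Normalize to a single format before checking for equality.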
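A minimal normalization sketch, assuming IDs may arrive as `uuid.UUID` objects, 16 raw bytes, or text in any of the spellings that Python's `uuid.UUID` constructor accepts. The helper name `normalize` is hypothetical.

```python
import uuid


def normalize(value):
    """Return a uuid.UUID for an ID given as a UUID object, 16 raw bytes, or text."""
    if isinstance(value, uuid.UUID):
        return value
    if isinstance(value, (bytes, bytearray)):
        return uuid.UUID(bytes=bytes(value))
    # uuid.UUID() accepts upper or lower case, with or without hyphens,
    # optional surrounding braces, and an optional "urn:uuid:" prefix.
    return uuid.UUID(str(value).strip())


if __name__ == "__main__":
    u = uuid.uuid4()
    variants = [u, u.bytes, str(u), str(u).upper(), u.hex, "{" + str(u) + "}"]
    print(len({normalize(v) for v in variants}))  # 1
```

Comparing the normalized `uuid.UUID` objects (or their `.bytes`) removes formatting from the equality check entirely.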

3. 🐞 Faulty Randomness

If you generate UUIDs from a non-cryptographic RNG (Math.random() or another insecure or poorly seeded PRNG), UUIDv4 loses its uniqueness guarantees.

Always use a CSPRNG (os.urandom, crypto.randomUUID, etc.).
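The sketch below shows how a weak source can go wrong. It builds v4-shaped UUIDs from Python's non-cryptographic `random.Random`, and two workers that happen to start from the same seed emit identical "random" IDs. The `weak_uuid4` helper and the seed value are made up for the demonstration.

```python
import random
import uuid


def weak_uuid4(rng):
    """Build a v4-shaped UUID from a non-cryptographic PRNG (do not do this)."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two workers that were accidentally seeded with the same value.
worker_a = random.Random(42)
worker_b = random.Random(42)

ids_a = [weak_uuid4(worker_a) for _ in range(3)]
ids_b = [weak_uuid4(worker_b) for _ in range(3)]

print(ids_a == ids_b)  # True: every ID collides across the two workers

# The safe alternative: uuid.uuid4() draws its bits from os.urandom.
safe_id = uuid.uuid4()
```

The IDs still look like valid UUIDv4 values, which is why this kind of bug tends to be blamed on UUIDs rather than on the generator.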

4. 🧍 Parallelism Bugs

Multiple threads or processes writing to a shared file or memory structure without synchronization can create phantom duplicates.

Use locks or atomic operations for integrity in high-throughput generation.
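As a rough illustration of the locking advice, here is a thread-safe collector in which the membership check and the insert happen under one lock, so no two threads can interleave between them. The `CollisionTracker` class and `run` helper are hypothetical names, and a real high-throughput pipeline would likely batch its updates.

```python
import threading
import uuid


class CollisionTracker:
    """Shared set of seen IDs whose check-and-add step is guarded by a lock."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()
        self.collisions = 0

    def record(self, value):
        # Check and insert atomically with respect to other threads.
        with self._lock:
            if value in self._seen:
                self.collisions += 1
            else:
                self._seen.add(value)

    def __len__(self):
        with self._lock:
            return len(self._seen)


def worker(tracker, count):
    for _ in range(count):
        tracker.record(uuid.uuid4().bytes)


def run(threads=4, per_thread=10_000):
    tracker = CollisionTracker()
    pool = [
        threading.Thread(target=worker, args=(tracker, per_thread))
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return tracker


if __name__ == "__main__":
    result = run()
    print(f"Unique IDs: {len(result)}, collisions: {result.collisions}")
```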

🎲 The Real Odds of UUIDv4 Collisions

UUIDv4 has 122 bits of randomness. The total number of possible UUIDs is:

code

2^122 ≈ 5.3 x 10^36

Using the birthday paradox, the number of UUIDs needed to reach a 50% chance of at least one collision is:

code

√(2 ln 2 * 2^122) ≈ 2.71 x 10^18 UUIDs

That's 2.7 quintillion UUIDs — far beyond any real-world application.

For comparison:

1 billion UUIDs/day = ~7.4 million years to reach 50% collision odds
Generating 100 million UUIDs is barely a ripple in the pool
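These figures can be reproduced with a few lines of arithmetic. The sketch below uses the standard birthday approximation with 122 random bits; the variable names are ours.

```python
import math

SPACE = 2**122  # number of possible UUIDv4 values

# Number of UUIDs at which a collision becomes 50% likely.
half_chance = math.sqrt(2 * math.log(2) * SPACE)

# Time to generate that many at 1 billion UUIDs per day.
years_at_1b_per_day = half_chance / 1e9 / 365.25

# Probability of any collision among 100 million UUIDs.
n = 100_000_000
p_100m = -math.expm1(-n * (n - 1) / (2 * SPACE))

print(f"50% threshold:      {half_chance:.3g} UUIDs")          # about 2.71e18
print(f"At 1 billion/day:   {years_at_1b_per_day:.3g} years")  # about 7.4 million
print(f"P(collision) @100M: {p_100m:.3g}")                     # about 9.4e-22
```

At 100 million UUIDs the collision probability is on the order of 10^-21, which is why an honest run of that size should report zero duplicates.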

🧬 UUID Collisions in Real Systems

Real-world systems that commonly rely on UUIDv4, or on the same large-ID-space idea:

PostgreSQL primary keys
DynamoDB partition keys
Kubernetes object UIDs
Git object hashes (not UUIDs, but same uniqueness principle)

To our knowledge, no UUIDv4 collision in a production system has been publicly verified, apart from cases traced to misconfiguration or misuse.

🔐 Takeaways for Developers

✅ Use UUIDv4 When:

You need fast, globally unique IDs
You're working in distributed systems
You don’t require time-ordering

✅ Use UUIDv7 When:

You want timestamp-sorted UUIDs (e.g., for logs or event streams)

❌ Avoid:

Truncating UUIDs for readability
Comparing strings with differing formats
Using UUIDs from insecure sources

✅ TL;DR (Again)

We generated 100 million UUIDv4s. We found no collisions.

Any claim otherwise is likely due to:

Misuse (shortening, poor RNG)
Misinterpretation (formatting bugs)
Faulty testing (non-threadsafe containers, bad equality checks)

Final Thoughts

The "UUID collision" fear is a modern urban legend: technically possible in a strictly probabilistic sense, but practically nonexistent for any real software system.

UUIDv4 remains one of the most robust, distributed-safe, collision-resistant ID formats available.

So go ahead. Generate UUIDs with confidence. Your bits are safe.

🧪 Myth: DEBUNKED.
