The UUID Glossary

Turns out, I couldn’t find a somewhat comprehensive cheatsheet on modern UUIDs, so I went ahead and compiled a quick list for my own reference.

Overview

A Universally Unique IDentifier (UUID) is composed of 32 hexadecimal digits (128 bits) in the format 8-4-4-4-12: 00000000-0000-0000-0000-000000000000.
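
As a quick sanity check of that layout, here is a small sketch using Python's standard `uuid` module (the choice of Python is an assumption; nothing in this glossary depends on a particular language):

```python
import uuid

# Parse the nil UUID shown above.
nil = uuid.UUID("00000000-0000-0000-0000-000000000000")

# Split the canonical text form into its dash-separated groups.
group_lengths = [len(group) for group in str(nil).split("-")]

print(group_lengths)        # [8, 4, 4, 4, 12]
print(len(nil.hex))         # 32 hexadecimal digits
print(len(nil.bytes) * 8)   # 128 bits
```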

Versions

UUIDv1 (MAC Address)

UUIDv1 combines a node identifier (typically your MAC address) with a timestamp to generate unique identifiers.

Generally, UUIDv1 is not recommended due to its predictability and risk of collisions.

You may be looking for UUIDv4 or UUIDv7 instead.

    💡

    Notice if you keep the MAC address consistent, a large part of the UUID will remain constant.
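
A rough illustration of that note, assuming Python's standard `uuid` module. `FIXED_NODE` and `FIXED_CLOCK_SEQ` are made-up values standing in for a real MAC address and clock sequence:

```python
import uuid

# Hypothetical values, standing in for a real MAC address and clock sequence.
FIXED_NODE = 0x001122334455
FIXED_CLOCK_SEQ = 0

a = uuid.uuid1(node=FIXED_NODE, clock_seq=FIXED_CLOCK_SEQ)
b = uuid.uuid1(node=FIXED_NODE, clock_seq=FIXED_CLOCK_SEQ)

print(a)
print(b)

# The last two groups (clock sequence and node) are identical;
# only the timestamp fields at the front differ.
suffix_a = str(a)[19:]
suffix_b = str(b)[19:]
print(suffix_a == suffix_b)  # True
```

Anyone who sees one such UUID learns the node value, which is part of why UUIDv1 is considered predictable.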

    Reference: UUIDv1 Specification

    UUIDv2 (DCE Security)

    Similar to UUIDv1 but includes additional fields for a POSIX UID/GID and a domain identifier. It is primarily used in DCE (Distributed Computing Environment) security and is generally left out of most UUID implementations because RFC 4122 reserves the version but does not specify its layout.

    Due to the additional fields, the timestamp is truncated, meaning the clock section of the UUID only advances every 429.50 seconds (~7 minutes). Within each ~7 minute period, there are only 64 different UUIDs available per node, domain, and identifier, making it largely obsolete.
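
The arithmetic behind those two numbers can be checked with a few lines of Python. This assumes the usual description of the layout: the low 32 bits of the 100-nanosecond timestamp are replaced by the local identifier, and 6 bits of clock sequence remain.

```python
# UUIDv1 timestamps count 100-nanosecond intervals.
TICK_NANOSECONDS = 100

# Assumption: UUIDv2 overwrites the low 32 bits of the timestamp,
# so the remaining clock only changes once every 2**32 ticks.
OVERWRITTEN_TIMESTAMP_BITS = 32
seconds_per_tick = (2 ** OVERWRITTEN_TIMESTAMP_BITS) * TICK_NANOSECONDS / 1e9

# Assumption: only 6 bits of clock sequence are left to tell UUIDs apart.
CLOCK_SEQUENCE_BITS = 6
uuids_per_tick = 2 ** CLOCK_SEQUENCE_BITS

print(seconds_per_tick)        # 429.4967296 (a little over 7 minutes)
print(seconds_per_tick / 60)   # roughly 7.16 minutes
print(uuids_per_tick)          # 64
```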

    UUIDv3 (MD5 Hash)

    UUIDv3 generates UUIDs based on a namespace and a name using the MD5 hashing algorithm. This ensures that the same namespace and name will always produce the same UUID, making it deterministic.
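
A minimal sketch of that determinism, assuming Python's standard `uuid` module and using `example.com` / `example.org` purely as placeholder names:

```python
import uuid

# Same namespace + same name -> same UUID, every time.
first = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
second = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")

# A different name in the same namespace gives a different UUID.
other = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")

print(first == second)  # True
print(first == other)   # False
print(first.version)    # 3
```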

    Note that MD5 is considered cryptographically broken and thus should not be used for “secure contexts”. For a slightly more secure (but still cryptographically broken) option, consider using UUIDv5.

      Reference: UUIDv3 Specification

      UUIDv4 (Random)

      UUIDv4 generates identifiers from truly random or pseudo-random numbers, making it the most general-purpose, and therefore the most widely used, UUID version.

      It has 2^122 (around 5.3 × 10^36) possible identifiers, since 6 of the 128 bits are fixed by the version and variant fields. This makes the likelihood of collisions low enough for most developers to not worry about it.
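
To put a number on "low enough", here is a rough sketch using the standard birthday approximation over 122 random bits. The helper name `collision_probability` is made up for this example, and the estimate assumes a well-behaved random source:

```python
import math
import uuid

# A UUIDv4 has 128 bits, of which 4 (version) + 2 (variant) are fixed.
RANDOM_BITS = 122

def collision_probability(count: int) -> float:
    """Birthday approximation: chance of at least one collision among `count` UUIDv4s."""
    exponent = count * (count - 1) / (2 * 2 ** RANDOM_BITS)
    return -math.expm1(-exponent)  # 1 - e^(-exponent), accurate for tiny values

example = uuid.uuid4()
print(example.version)                                   # 4
print(collision_probability(1_000_000_000))              # on the order of 1e-19
print(collision_probability(2_715_000_000_000_000_000))  # roughly 0.5
```

In other words, under this approximation you would need on the order of 10^18 identifiers before a collision becomes as likely as a coin flip.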

      Usage

      Ideal for general-purpose unique identifiers where ordering is not a concern.

      • Database Keys: Suitable for primary keys in databases where the order of insertion is not important (see UUIDv7).
      • Session Tokens: Can be used to internally map random session tokens for web applications.

        Reference: UUIDv4 Specification

        UUIDv5 (SHA-1 Hash)

        UUIDv5 generates UUIDs based on a namespace and a name using the SHA-1 hashing algorithm. This ensures that the same namespace and name will always produce the same UUID, making it deterministic.

        UUIDv5 is very similar to UUIDv3, and while SHA-1 is more secure than MD5, it is still considered cryptographically broken and should not be used for “secure contexts”.

          Reference: UUIDv5 Specification

          UUIDv6 (Ordered)

          Designed as a drop-in replacement for UUIDv1, UUIDv6 retains the timestamp and node (MAC address) components of UUIDv1 but reorders the timestamp fields, most significant bits first, so that the UUIDs are K-Sortable by time.
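
The reordering can be sketched as a conversion from an existing UUIDv1. This is an illustration built on Python's standard `uuid` module, not a library API; `v1_to_v6` is a made-up helper name, and the sample value is the UUIDv1/UUIDv6 test-vector pair from the RFC 9562 appendix:

```python
import uuid

def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Rearrange a UUIDv1's timestamp so the most significant bits come first."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    timestamp = u.time                         # 60-bit count of 100 ns intervals
    time_high = (timestamp >> 28) & 0xFFFFFFFF  # top 32 bits
    time_mid = (timestamp >> 12) & 0xFFFF       # next 16 bits
    time_low = timestamp & 0x0FFF               # lowest 12 bits
    low_64 = u.int & ((1 << 64) - 1)            # variant, clock sequence, node: unchanged
    value = (
        (time_high << 96)
        | (time_mid << 80)
        | (0x6 << 76)                           # version nibble
        | (time_low << 64)
        | low_64
    )
    return uuid.UUID(int=value)

v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
v6 = v1_to_v6(v1)
print(v6)          # 1ec9414c-232a-6b00-b3c8-9f6bdeced846
print(v6.version)  # 6
```

Because the timestamp now reads from most to least significant, comparing two such UUIDs as plain strings or bytes compares their creation times.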

          If you have no reason to keep backwards compatibility with UUIDv1, then it is recommended to use UUIDv7 instead.

            Reference: UUIDv6 Specification

            UUIDv7 (Temporal)

            UUIDv7 combines a 48-bit Unix timestamp in milliseconds with random bits to generate K-sortable, chronologically ordered IDs.

            Many implementations of UUIDv7 also implement a “counter” component to handle cases where multiple UUIDs are generated within the same millisecond, reducing the risk of collision in high concurrency environments.
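
A minimal sketch of the basic layout, assuming the common arrangement of a 48-bit millisecond timestamp followed by version, variant, and random bits. `uuid7_sketch` is a made-up name, and this version deliberately has no counter, so ordering within the same millisecond is random:

```python
import os
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 bits, 74 are used
    rand_a = (random_bits >> 62) & 0xFFF                 # 12 random bits
    rand_b = random_bits & ((1 << 62) - 1)               # 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80            # 48-bit timestamp
    value |= 0x7 << 76                                   # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                                  # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)            # True: the timestamp dominates the comparison
print(str(earlier) < str(later))  # True: also holds for the text form
```

A production generator would typically add the counter described above, or fall back to a maintained library.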

            Usage

            Ideal for time-sensitive systems where records need to be stored in the order they were created.

            • Databases: Primary keys for tables.
            • Logs: Unique identifiers for log entries.

              Reference: UUIDv7 Specification

              UUIDv8 (Custom)

              UUIDv8 allows developers complete flexibility to customize the UUID bits to suit their application and requirements. This is a powerful standard but can be very difficult to design, implement, and maintain. It is recommended to use an existing standard unless you have a specific requirement that cannot be met by the other UUID versions.

              Usage

              Ideal for applications that need to embed metadata in the UUID.

              • Sharding and Routing: Skip additional querying by embedding shard and region information in the first n bits of the UUID, allowing for direct routing to the correct node.
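
As a purely hypothetical illustration of the routing idea, the sketch below reserves the first 16 bits for a shard ID and fills the rest with random bits. The field width and the names `uuid8_with_shard` and `shard_of` are assumptions made for this example, not part of any standard layout:

```python
import os
import uuid

SHARD_BITS = 16  # hypothetical width of the embedded shard ID

def uuid8_with_shard(shard_id: int) -> uuid.UUID:
    """Build a UUIDv8-style value whose first 16 bits carry a shard ID."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id does not fit in 16 bits")
    value = int.from_bytes(os.urandom(16), "big")
    value &= ~(0xFFFF << 112)                     # clear the top 16 bits
    value |= shard_id << 112                      # embed the shard ID
    value = (value & ~(0xF << 76)) | (0x8 << 76)  # version nibble
    value = (value & ~(0x3 << 62)) | (0b10 << 62)  # RFC variant bits
    return uuid.UUID(int=value)

def shard_of(u: uuid.UUID) -> int:
    """Read the shard ID back without any lookup."""
    return u.int >> 112

record_id = uuid8_with_shard(513)
print(record_id.version)    # 8
print(shard_of(record_id))  # 513
```

The trade-off is that the layout becomes a contract: every service that reads these IDs has to agree on which bits mean what.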

              Reference: UUIDv8 Specification

              Others

              There are many other UUID-like identifier formats such as ULID, KSUID, and more. Each has its own use cases and advantages, but it isn’t necessary to use them unless you have a specific requirement.

              Personally, I prefer using TypeID prefixed UUIDs (based on Stripe IDs), as they are a type-safe extension of UUIDv7 but are more human-readable.

               user_2x4y6z8a0b1c2d3e4f5g6h7j8k
               └──┘ └────────────────────────┘
               type    uuid suffix (base32)
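
A rough sketch of how such an ID could be produced, assuming the suffix is the 128-bit UUID value written as 26 lowercase Crockford base32 characters (which is how the TypeID format is commonly described; check the official spec before relying on this). The helper names are made up:

```python
import uuid

# Assumed alphabet: Crockford base32, lowercase (no i, l, o, u).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

def to_prefixed_id(prefix: str, u: uuid.UUID) -> str:
    """Encode a UUID as '<prefix>_<26 base32 characters>'."""
    remaining = u.int
    chars = []
    for _ in range(26):              # 26 * 5 = 130 bits, top 2 bits are zero
        chars.append(ALPHABET[remaining & 31])
        remaining >>= 5
    return f"{prefix}_{''.join(reversed(chars))}"

def from_prefixed_id(text: str) -> tuple:
    """Split a prefixed ID back into (prefix, UUID)."""
    prefix, _, suffix = text.rpartition("_")
    value = 0
    for ch in suffix:
        value = (value << 5) | ALPHABET.index(ch)
    return prefix, uuid.UUID(int=value)

original = uuid.uuid4()
encoded = to_prefixed_id("user", original)
print(encoded)
print(from_prefixed_id(encoded) == ("user", original))  # True
```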

              However, I also don’t mind using UUIDv7 when using databases like PostgreSQL which support the UUID data type natively, as we get built-in validation and compact binary storage for it.

              Concepts

              K-Sortable

              💡

              K-Sortable UUIDs are identifiers that sort in roughly chronological order, which suits workloads where records need to be stored in the order they were created. Sorting the UUIDs therefore yields a meaningful order, typically by time.

              Data locality reduces fragmentation of data and has real-life performance gains. In the context of databases, K-Sortable UUIDs can significantly improve query performance for sequential indexes by clustering related data together. This reduces the need for extensive reorganization and allows for faster data retrieval.

              Namespaces

              💡

              A namespace allows UUIDs to be deterministically generated from a given input, very similar to a hash function. The namespace can be any UUID you want; the name is appended to the namespace's bytes, the result is hashed, and the truncated digest is stuffed back into a UUID with the version and variant bits set.

              RFC 4122 suggested some useful namespaces, such as:

              • DNS = 6ba7b810-9dad-11d1-80b4-00c04fd430c8
              • URL = 6ba7b811-9dad-11d1-80b4-00c04fd430c8
              • ISO OID = 6ba7b812-9dad-11d1-80b4-00c04fd430c8
              • X.500 DN = 6ba7b814-9dad-11d1-80b4-00c04fd430c8
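
The append, hash, and truncate steps can be reproduced by hand and compared against Python's standard `uuid.uuid5`. The function name `uuid5_by_hand` is made up for this sketch, and `example.com` is just a placeholder name:

```python
import hashlib
import uuid

def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Recreate a name-based (SHA-1) UUID step by step."""
    # 1. Append the name to the namespace's 16 raw bytes and hash the result.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # 2. Keep only the first 16 of the 20 digest bytes.
    raw = bytearray(digest[:16])
    # 3. Overwrite the version nibble (5) and the two variant bits.
    raw[6] = (raw[6] & 0x0F) | 0x50
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))

by_hand = uuid5_by_hand(uuid.NAMESPACE_DNS, "example.com")
built_in = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(by_hand == built_in)  # True
```

Swapping SHA-1 for MD5 and the version nibble for 3 gives the UUIDv3 equivalent.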

              I still am not convinced it’s worth all the extra abstraction to get a possibly more readable hash, but someone out there must have a use case for it.

              Resources

              https://adileo.github.io/awesome-identifiers/