Wednesday, July 22, 2009

Uniqueidentifier Column as Primary Key, a worst choice

 GUID or int as Primary key ???


Though it is not necessary that your Primary Key column is always a cluster index too. By default, Sql Server creates cluster index on column or group of columns which you have declared as your table PK, and most of DBAs don't like to go against this default behavior of Sql Server.

But problem arises when for uniqueness a column with uniqueidentifier data type is added, for surrogate key to make it Primary Key, finally for your table.

The GUID is a wide column (16 bytes to be specific) and contains a unique combination of 33 uppercase and numeric characters. This column because it is the primary key is going to be stored in, of course, the clustered index .

Also, if a GUID is used instead of an integer identity column then the 33 characters need to be matched for each row that is returned using that column in the where clause.


If a high volume of inserts are done on these tables then GUID's being large will contribute to page splits, as will the fact that NEWID() generates a random value, which could place a new record on any of the data pages will cause performance problems.


Recommendations

  1. INT must be used as Primary Key instead of GUID because:
  2. INT takes only 4 bytes, saving your physical and memoray storage.
  3. INT as primary key (identity) creates incremental values resulting less then 1% of indexes fregmention during heavy insert.
  4. There are T-SQL operators available for INT like >,= and <

8 comments:

  1. 2. You can use pretty good UUID with only 8 bytes (using Base64). It's more than 4, but ok to me.

    3 and 4. If you use a UUID based on time and offset, they are incremental so their is no fragmentation and you can use >= and <.

    ReplyDelete
  2. The controversy continues lol.
    Try high volume replication and tell me if you think using an Integer as a primary key is a good idea.

    ReplyDelete
  3. guid aren`t match character by character,they can also be match in 4 asm integer comparsion instructions or even in 2 if using 64-bit machines.
    the main problem with guid is it randomess and big size

    ReplyDelete
  4. Why would using <=> on a primary key ever be a good idea? And yeah--replication using a sequence is teh badz.

    ReplyDelete
  5. MySql provides UUID() function, But Microsoft Sql Server only supports GUID, which never generates sequential values.
    For replication: You need GUID column for Merge Replication. Yes you can have separate column (other then PK column)for this purpose, But most important is to avoid cluster index fragmentation.

    ReplyDelete
  6. Sql Server doesn't have UUID() but does have NEWSEQUENTIALID() (SQL 2008??).

    ReplyDelete
  7. SELECT NEWID() --- to get new guid in sql server

    ReplyDelete
  8. So... use a guid and make the PK nonclustered. Simples!

    ReplyDelete

All suggestions are welcome