Primary Keys and Foreign Keys Differences
- They serve very different purposes and each is governed by its own set of rules.
- That said, they can often be related to each other in very important ways, but they still remain separate entities.
- A primary key provides a mechanism for ensuring that the rows in a table are unique. The values contained in the column or columns that make up the primary key serve to identify each of those rows. For example, a primary key is often made up of a single IDENTITY column. Regardless of how many values are repeated in other columns, the primary key values are always unique ensure that each row can be identified separately from all other rows. Other columns might also contain unique values, but a table can contain only one primary key. Because the primary key must be able to identify each row, no columns that participate in a primary key can contain NULL values. In addition, if you create a composite primary key (more than one column), the individual columns can contain duplicate values, but the columns collectively cannot and together must provide the unique identifiers for each row.
- When you create a primary key, SQL Server automatically creates an index based on the key columns. If no clustered index is defined on the table, SQL Server creates a clustered index; otherwise, a nonclustered index is created.
- Like a primary key, a foreign key is also a type of constraint placed on one or more columns in a table. The foreign key establishes a link between the key columns and related columns in another table. (You can also link the foreign key columns to columns within the same table.) The table that contains the foreign key is considered the child table, and the table that the foreign key references is the parent table. The foreign key restricts what data can be stored in the foreign key columns in the child table, based on the data in the referenced columns in the parent table.
- The foreign key enforces referential integrity between the two tables. That means you can add only permitted data to the foreign key columns in the child table. At the same time, you cannot modify the parent table in such a way that the modification would negatively impact the foreign key values in the child table. By default, SQL Server prevents these types of actions. However, SQL Server also provides several methods for working with foreign keys and referential integrity when modifying a table referenced by a foreign key.
- Other ways that foreign keys differ from primary keys are that you can create more than one foreign key on a table and you can define foreign keys on columns that permit NULL values. In addition, SQL Server does not automatically index the foreign key columns like it does for primary keys. If you want to index the foreign key columns, you must do so as a separate step.
Primary Selection
- The most common advice you’ll likely find on the subject goes something like this: “Use the column or columns that make the most sense in a given situation.” That’s a lot of help. Many database developers default to using a simple surrogate key—a single column, usually an INT, that identifies each row in the table but is not related to the rest of the table’s data in any meaningful way. You’ll often see an IDENTITY column used for this purpose, in which a unique integer is automatically assigned to each new row added to the table.
- There’s good reason for going this route. SQL Server’s query processor often uses the index associated with the primary key to perform lookups and comparisons. Part of the reason for this is that queries frequently join tables based on primary key columns, usually in conjunction with foreign keys. More often than not, then simpler the primary key, the better your queries will perform. An integer takes only four bytes, is fixed length, includes no special characters or spaces, and is easily sorted and searched.
- GUIDs are better, according to some developers, and they also have some good reason for feeling this way. For example, GUIDs can be generated at the application layer, rather than the data layer, and can provide uniqueness across the entire database. On the other hand, GUIDs are much larger than integers and do not perform as well. They can also be annoying to work with. Yet business needs might override performance considerations, especially if you’re dealing with small quantities of data, in which GUIDs will work fine.
GUID Pros
- Unique across every table, every database, every server
- Allows easy merging of records from different databases
- Allows easy distribution of databases across multiple servers
- You can generate IDs anywhere, instead of having to roundtrip to the database
- Most replication scenarios require GUID columns anyway
GUID Cons
- It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful.