- An index is usually and preferably a copy of a very small vertical section of a table, such as a single field, and preferably a short length field.
- The act of creating an index physically copies one or more fields to be indexed into a separate area of disk other than that of the table.
- In some databases, indexes can be stored in a file completely separated from the original table.
- The important factor is the underlying physical separation.
- When a table is accessed, a process (Optimizer) decides whether scan all the records in the table, or if it is faster to read the much smaller index in conjunction with a very small section of the table.
- Indexes are not really part of the relational database model itself; however, indexes are so important to performance and overall database usability that they simply have to be introduced.
- Index characteristics include the following:
- Clustered versus non-clustered.
- Unique versus non-unique.
- Single column versus multi-column.
- Ascending or descending order on the columns in the index.
- Full-table versus filtered for non-clustered indexes.
- A partition is a user-defined unit of data organization.
- A table is contained in one or more partitions.
- When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column.
- The partitions can be put on one or more filegroups in the database.
- SQL Server tables use one of two methods to organize their data pages within a partition:
- Clustered tables have a clustered index.
- The data rows are stored in order based on the clustered index key.
- The clustered index is implemented as a B-tree index structure.
- B-tree index supports fast retrieval of the rows, based on their clustered index key values.
- Heaps are tables that have no clustered index.
- The data rows are not stored in any particular order.
- There is no particular order to the sequence of the data pages.
- The pages of the heap or clustered index are managed in one or more allocation units, depending on the column data types.
- When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition.
- For example, if a clustered table has four partitions, there are four B-trees; one in each partition.
- There can only be one clustered index per table.
- A clustered index can be rebuilt or reorganized on demand to control table fragmentation.
- A clustered index can also be created on a view.
- Clustered indexes are implemented in the following ways:
PRIMARY KEY
and UNIQUE
constraints:
- When you create a
PRIMARY KEY
constraint, a unique clustered index on the column(s) is automatically created if a clustered index on the table does not already exist.
- In that case, you do not specify the unique non-clustered index.
- When you create a
UNIQUE
constraint, a unique non-clustered index is created to enforce a UNIQUE constraint by default.
- You can specify a unique clustered index if a clustered index on the table does not already exist.
- Index independent of a constraint: You can create a clustered index on a column other than primary key column if a nonclustered primary key constraint was specified.
Clustered Index Structures
- In SQL Server, indexes are organized as B-trees.
- Each page in an index B-tree is called an index node.
- The top node of the B-tree is called the root node.
- The bottom level of nodes in the index is called the leaf nodes.
- Any index levels between the root and the leaf nodes are collectively known as intermediate levels.
- In a clustered index, the leaf nodes contain the data pages of the underlying table.
- The root and intermediate level nodes contain index pages holding index rows.
- Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index.
- The pages in each level of the index are linked in a doubly-linked list.