- The query optimizer uses statistics to create query plans that improve query performance.
- For most queries, the query optimizer already generates the necessary statistics for a high quality query plan; in a few cases, you need to create additional statistics or modify the query design for best results.
- Statistics for query optimization are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view.
- The query optimizer uses these statistics to estimate the cardinality, or the number of rows, in the query result.
- These cardinality estimates enable the query optimizer to create a high-quality query plan.
- For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the more resource-intensive index scan operator, and in doing so improve query performance.
- Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column.
- Statistics objects on multiple columns also store statistical information about the correlation of values among the columns.
- These correlation statistics, or densities, are derived from the number of distinct rows of column values.
- You can find the list of statistics on a table using
sys.stats
.
SELECT *
FROM sys.stats
WHERE OBJECT_ID = OBJECT_ID(N'Production.Product', N'U');
DBCC SHOW_STATISTICS (N'Production.Product', N'PK_Product_ProductID');
- After finding the statistics name, you can use the
DBCC SHOW_STATISTICS
.
- This command is going to return three result sets.
- The first result set is the header information with some basic information about the statistic such as name when it was updated, the number of rows sampled, etc.
- The second data set is the vector information this gives the density, which is 1 / distinct values, along with the average length in bytes, and finally the column name.
- The third data set, the histogram, is the most interesting and useful part of the statistic and returns the following columns:
- RANGE_HI_KEY: this is the upper limit of the range of values that fall within this step.
- RANGE_ROWS: this is the estimated number of rows that fall within this step but excludes the upper limit.
- EQ_ROWS: this is the estimated number of rows that are equal to the RANGE_HI_KEY value.
- DISTINCT_RANGE_ROWS: this is the estimated number of distinct values excluding the upper bounds.
- AVG_RANGE_ROWS: this is calculated by RANGE_ROWS / DISTINCT_RANGE_ROWS for DISTINCT_RANGE_ROWS > 0
Get All Tables Statistics Health Report
SELECT DISTINCT
OBJECT_NAME(s.[object_id]) AS TableName,
c.name AS ColumnName,
s.name AS StatName,
STATS_DATE(s.[object_id], s.stats_id) AS LastUpdated,
DATEDIFF(d,STATS_DATE(s.[object_id], s.stats_id),getdate()) DaysOld,
dsp.modification_counter,
s.auto_created,
s.user_created,
s.no_recompute,
s.[object_id],
s.stats_id,
sc.stats_column_id,
sc.column_id
FROM sys.stats s
JOIN sys.stats_columns sc ON sc.[object_id] = s.[object_id] AND sc.stats_id = s.stats_id
JOIN sys.columns c ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
JOIN sys.partitions par ON par.[object_id] = s.[object_id]
JOIN sys.objects obj ON par.[object_id] = obj.[object_id]
CROSS APPLY sys.dm_db_stats_properties(sc.[object_id], s.stats_id) AS dsp
WHERE OBJECTPROPERTY(s.OBJECT_ID,'IsUserTable') = 1
AND (s.auto_created = 1 OR s.user_created = 1)
ORDER BY DaysOld;
- You can update a table statistics using the following command:
UPDATE STATISTICS Production.Product PK_Product_ProductID WITH FULLSCAN;