GROUP BY
groups a selected set of rows into a set of summary rows by the values of one or more columns or expressions.
- One row is returned for each group.
- Rows that do not meet the conditions in the
WHERE
clause are removed before any grouping operation is performed.
- Aggregate functions provide information about each group instead of individual rows.
- The
GROUP BY
clause does not order the result set.
- If a grouping column contains null values, all null values are considered equal, and they are put into a single group.
- A
GROUP BY
clause can be described as general or simple:
- A general GROUP BY clause includes
GROUPING SETS
, CUBE
, ROLLUP
.
SELECT
DATEPART(yyyy,OrderDate) AS N'Year',
SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GROUP BY
vs. DISTINCT
Performance
- While
DISTINCT
better explains the intent, and GROUP BY
is only required when aggregations are present, they are interchangeable in many cases.
- In complex queries,
DISTINCT
can end up doing more work.
- Essentially,
DISTINCT
collects all of the rows, including any expressions that need to be evaluated, and then tosses out duplicates.
GROUP BY
can filter out the duplicate rows before performing any of that work.
HAVING
Clause
- The
WHERE
clause filters out rows prior to them being aggregated.
- The
HAVING
clause allows filters out the aggregated rows based on a criteria.
SELECT DATEPART(yyyy,OrderDate) AS N'Year',
SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= 2003
ORDER BY DATEPART(yyyy,OrderDate);
- It generates the simple GROUP BY aggregate rows, plus subtotal or super-aggregate rows, and also a grand total row.
- The number of groupings that is returned equals the number of expressions in the <composite element list> plus one.
SELECT
[Region] = [T].[Group],
[Territory] = [T].[Name],
[Year] = DATEPART(yyyy, [S].[OrderDate]),
[Month] = DATEPART(mm, [S].[OrderDate]),
[TotalRevenue] = SUM([S].[TotalDue])
FROM Sales.SalesOrderHeader AS [S]
INNER JOIN Sales.SalesTerritory AS [T]
ON [S].TerritoryID = [T].TerritoryID
GROUP BY ROLLUP
(
[T].[Group],
[T].[Name],
DATEPART(yyyy, [S].[OrderDate]),
DATEPART(mm, [S].[OrderDate])
)
- The
CUBE
does the same function but, instead of providing a hierarchy of totals in ordered super-aggregate rows, it provides all the ‘super-aggregate’ permutations (‘symmetric super-aggregate’ rows), also known as cross-tabulation rows.