GROUP BY groups a selected set of rows into a set of summary rows by the values of one or more columns or expressions.
- One row is returned for each group.
- Rows that do not meet the conditions in the
WHERE clause are removed before any grouping operation is performed.
- Aggregate functions provide information about each group instead of individual rows.
- The
GROUP BY clause does not order the result set.
- If a grouping column contains null values, all null values are considered equal, and they are put into a single group.
- A
GROUP BY clause can be described as general or simple:
- A general GROUP BY clause includes
GROUPING SETS, CUBE, ROLLUP.
SELECT
DATEPART(yyyy,OrderDate) AS N'Year',
SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GROUP BY vs. DISTINCT Performance
- While
DISTINCT better explains the intent, and GROUP BY is only required when aggregations are present, they are interchangeable in many cases.
- In complex queries,
DISTINCT can end up doing more work.
- Essentially,
DISTINCT collects all of the rows, including any expressions that need to be evaluated, and then tosses out duplicates.
GROUP BY can filter out the duplicate rows before performing any of that work.
HAVING Clause
- The
WHERE clause filters out rows prior to them being aggregated.
- The
HAVING clause allows filters out the aggregated rows based on a criteria.
SELECT DATEPART(yyyy,OrderDate) AS N'Year',
SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= 2003
ORDER BY DATEPART(yyyy,OrderDate);
- It generates the simple GROUP BY aggregate rows, plus subtotal or super-aggregate rows, and also a grand total row.
- The number of groupings that is returned equals the number of expressions in the <composite element list> plus one.
SELECT
[Region] = [T].[Group],
[Territory] = [T].[Name],
[Year] = DATEPART(yyyy, [S].[OrderDate]),
[Month] = DATEPART(mm, [S].[OrderDate]),
[TotalRevenue] = SUM([S].[TotalDue])
FROM Sales.SalesOrderHeader AS [S]
INNER JOIN Sales.SalesTerritory AS [T]
ON [S].TerritoryID = [T].TerritoryID
GROUP BY ROLLUP
(
[T].[Group],
[T].[Name],
DATEPART(yyyy, [S].[OrderDate]),
DATEPART(mm, [S].[OrderDate])
)
- The
CUBE does the same function but, instead of providing a hierarchy of totals in ordered super-aggregate rows, it provides all the ‘super-aggregate’ permutations (‘symmetric super-aggregate’ rows), also known as cross-tabulation rows.