- A collation is a configuration setting that determines how the database engine should treat character data at the server, database, or column level.
- SQL Server includes a large set of collations for handling the language and regional differences that come with supporting users and applications in different parts of the world.
COLLATE
is a clause that can be applied to a database definition or a column definition to define the collation, or to a character string expression to apply a collation cast.
- The COLLATE clause can be specified at several levels:
- Creating or altering a database.
- Creating or altering a table column (If you do not specify a collation, the column is assigned the default collation of the database.)
- Casting the collation of an expression.
- You can use the COLLATE clause to apply a character expression to a certain collation.
- Character literals and variables are assigned the default collation of the current database.
- Column references are assigned the definition collation of the column.
- The collation of an identifier depends on the level at which it is defined.
- Identifiers of instance-level objects, such as logins and database names, are assigned the default collation of the instance.
- Identifiers of objects within a database, such as tables, views, and column names, are assigned the default collation of the database.
- For example, two tables with names different only in case may be created in a database with case-sensitive collation, but may not be created in a database with case-insensitive collation.
- Variables, GOTO labels, temporary stored procedures, and temporary tables can be created when the connection context is associated with one database, and then referenced when the context has been switched to another database.
- The identifiers for variables, GOTO labels, temporary stored procedures, and temporary tables are in the default collation of the server instance.
- The COLLATE clause can be applied only for the char, varchar, text, nchar, nvarchar, and ntext data types.
- COLLATE uses collate_name to refer to the name of either the SQL Server collation or the Windows collation to be applied to the expression, column definition, or database definition.
collation_name
can be only a specified Windows_collation_name or a SQL_collation_name and the parameter must contain a literal value.
collation_name
cannot be represented by a variable or expression.
- The COLLATE clause is potentially going to cause a conversion of the column.
- The effect will depend on two things:
- Whether the target collation is the same as the column collation or not. If the target is the same then there is no scalar conversion required.
- Whether this COLLATE clause is present in the predicate (such as WHERE clause or JOIN predicate). If it is in the predicate then it can cause a scan (if condition a) above has also been satisfied.)
- If the COLLATE is present in the output list (not in the predicate) and if condition a) has been satisfied then it will cause a scalar conversion operator to be introduced into the plan.
- Depending on how many such conversions (rows * columns) it may cause additional CPU overhead.
- Collation mismatches can be dangerous – worst case, they can cause data loss or errors to occur (if coercion is disallowed) or in many cases can (at the very least) cause performance problems.
- The usage of the COLLATE clause can introduce a potential performance penalty.
- Lastly, if you are using cross-database references in a batch, beware of the collation assigned to each local variable – they inherit the same database collation of the last USE database prior to the first DECLARE.