Collation in Database - What you need to know

What is Collation? Collation refers to a set of rules that determine how data is sorted and compared. Collation settings specify the correct character sequence, case sensitivity, accent marks, character width and the characters used in comparison operations, such as equality and inequality. How does this affect the data? 1. Character data sort order This specifies how string data is sequenced. For example, whether 'a' comes before 'b'. 2. Case sensitivity This determines whether 'A' and 'a' are considered the same. 3. Accent sensitivity This determines whether 'a' and 'á' are considered the same. 4. Kana Sensitivity This determines whether two characters that look the same but are represented differently in Kana (a Japanese script) are treated as equal. 5. Width sensitivity This determines whether a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated as equal. Where do you specify the collation? Collation can be specified at different levels depending on the specific database system. For example, in SQL Server, it can be specified at: Server level Database level Column level Expression (SQL Query) level The above levels are mentioned in the descending order of hierarchy i.e. when you create a database, you specify the default collation for the database. Every column in the database that uses character data types has a collation. If not specified, the column uses the default collation of the database. How collation affects the query results? For example, the default collation in SQL Server is SQL_Latin1_General_CP1_CI_AS The elements of this collation name represent the following. SQL - This prefix means that the collation is a SQL Server collation. Latin1_General - This is the base collation, which defines the rules that are used for sorting and comparing characters within the Latin1 character set. CP1 - This stands for Code Page 1252, which is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages. CI - This stands for Case Insensitive. This means that the SQL Server does not consider case during comparison. For example, 'A' and 'a' are considered to be the same. AS - This stands for Accent Sensitive. This means that accented characters and unaccented characters are considered to be different. For example, 'a' and 'á' are considered to be different. Overall, SQL_Latin1_General_CP1_CI_AS is a SQL Server collation that is case insensitive and accent sensitive, using the Latin1 General rules and the 1252 Code Page. This is a commonly used collation in the English language for SQL Server. How to check your collation settings in SQL Server You can determine the server's character set in SQL Server by using the following SQL query. SELECT SERVERPROPERTY('Collation') If you need more information about the collation, you can use fn_helpcollations function which returns all the collations supported by your server. SELECT * FROM fn_helpcollations() Refer Collation in SQL Server to know more on various collation rules and levels.

Mar 27, 2025 - 12:39
 0
Collation in Database - What you need to know

What is Collation?

Collation refers to a set of rules that determine how data is sorted and compared. Collation settings specify the correct character sequence, case sensitivity, accent marks, character width and the characters used in comparison operations, such as equality and inequality.

How does this affect the data?

1. Character data sort order

This specifies how string data is sequenced. For example, whether 'a' comes before 'b'.

2. Case sensitivity

This determines whether 'A' and 'a' are considered the same.

3. Accent sensitivity

This determines whether 'a' and 'á' are considered the same.

4. Kana Sensitivity

This determines whether two characters that look the same but are represented differently in Kana (a Japanese script) are treated as equal.

5. Width sensitivity

This determines whether a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated as equal.

Where do you specify the collation?

Collation can be specified at different levels depending on the specific database system. For example, in SQL Server, it can be specified at:

  1. Server level
  2. Database level
  3. Column level
  4. Expression (SQL Query) level

The above levels are mentioned in the descending order of hierarchy i.e. when you create a database, you specify the default collation for the database. Every column in the database that uses character data types has a collation. If not specified, the column uses the default collation of the database.

How collation affects the query results?

For example, the default collation in SQL Server is

SQL_Latin1_General_CP1_CI_AS

The elements of this collation name represent the following.

SQL - This prefix means that the collation is a SQL Server collation.

Latin1_General - This is the base collation, which defines the rules that are used for sorting and comparing characters within the Latin1 character set.

CP1 - This stands for Code Page 1252, which is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages.

CI - This stands for Case Insensitive. This means that the SQL Server does not consider case during comparison. For example, 'A' and 'a' are considered to be the same.

AS - This stands for Accent Sensitive. This means that accented characters and unaccented characters are considered to be different. For example, 'a' and 'á' are considered to be different.

Overall, SQL_Latin1_General_CP1_CI_AS is a SQL Server collation that is case insensitive and accent sensitive, using the Latin1 General rules and the 1252 Code Page. This is a commonly used collation in the English language for SQL Server.

How to check your collation settings in SQL Server

You can determine the server's character set in SQL Server by using the following SQL query.

SELECT SERVERPROPERTY('Collation')

If you need more information about the collation, you can use fn_helpcollations function which returns all the collations supported by your server.

SELECT * FROM fn_helpcollations()

Refer Collation in SQL Server to know more on various collation rules and levels.