WordPress Database Charset and Collation Configuration

Pootzko over at this years new Q&A site for WordPress Administrators and Integrators was wondering why creating database tables via wpdb->query() / SQL didn’t reflect his worpdress collation settings. As it was not obvious for him, I thought about writing this up to learn more. My post describes how wordpress deals with charset and collation settings while making use of the MySQL database and gives some useful suggestions before installing WordPress. The information is based on WordPress 3.0 / 3.1.

Contents


Database Charset and Collation prior Installation

WordPress does not create the database on installation. This is important to know because all tables within a database will inherit their charset and collation from the database setting. So you should ensure that the database’s charset and collation is properly set before wordpress is installed.

MySQL’s default database charset is normally latin1 and collation is latin1_swedish_ci but this can vary based on server configuration. Most probably you would like to have UTF-8 here instead.

So prior installing, it’s a good idea to set the charset of the (empty) database to utf8 and collation to utf8_general_ci. The according commands are either CREATE DATABASE or ALTER DATABASE.

An exemplary query for an existing database named wpdb could look like this:

ALTER DATABASE wpdb CHARACTER SET utf8 COLLATE utf8_general_ci;

hakre’s tip: For interoperability reasons, choose the utf8 charset and the case-insensitive utf8_general_ci collation. Set those as default for the database before you install wordpress.

Database Charset and Collation in Tables

On installation, wordpress creates some database tables. To understand with which charset and collation these tables are created and used later on, you need to check your configuration and compare multiple settings with each other. The following play some role:

  • MySQL Server and Database Settings
  • MySQL Server and PHP Version
  • WordPress Database Charset and Collation Settings

I’ll go though each of them now.

MySQL Server and Database Settings

As already described, the most global options are the MySQL server and database charset and collation configuration. All tables will inherit from those in the end. If wordpress finds own settings for charset and collation, it will use them for the default tables it creates during install – but not for the tables you might create in a plugin (see wp-admin/includes/schema.php and $charset_collate if you like to learn more).

So check your MySQL server configuration first. Tools like PHPMyAdmin display charset and collation information. You can do with SQL statements as well:

USE your_database_of_interest;
SHOW VARIABLES LIKE "character_set_database";
SHOW VARIABLES LIKE "collation_database";

MySQL Server and PHP Version

WordPress groups database related functions in a class called wpdb. This class does best guesses while connecting to the database server to set charset and collation for that connection. Because there is not only a charset setting for the data in the database but as well for the connection. This is some kind of automatic configuration. While doing this, wordpress checks against the installed MySQL and PHP version if a feature exists. The interesting versions are:

  • MySQL >= 4.1 – collation support
  • MySQL >= 5.0.7 + PHP >= 5.2.3 – connection character set support

These configuration versions are mostly important for connecting to the database.

From MySQL version 4.1 on to 5.0.6, wordpress will use SET NAMES $charset COLLATE $collate; to set charset and collation for the connection.

Starting with MySQL version 5.0.7 and PHP 5.2.3, the character set of the connection will be set by making use of mysql_set_charset. This is the preferred way to set the connection charset. Using mysql_query() to execute SET NAMES ... as above is not recommended.

hakre’s tip: I highly recommend to use MySQL server >= 5.0.7 and PHP >= 5.2.3. This will not only deal with character sets in connections properly but as well enable real escape instead of having wordpress using addslashes for database queries.

WordPress Database Charset and Collation Settings

What’s left is to configure the charset and collation settings for wordpress itself. WordPress in its default configuration behaves differently based on the type of installation.

  • Standard – Default charset is utf8 and collation is none and therefore defined by standard system settings (MySQL Server).
  • Multisite – Default charset is utf8 and collation is utf8_general_ci.

You can override any of these by using the DB_CHARSET and DB_COLLATE constants.

You can specify both settings manually in wp-config.php:

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', 'utf8_general_ci');

This will override any guessing and resetting that is going on behind the scenes in wpdb for both types, standard and multisite.

hakre’s tip: Before installing edit wp-config-example.php and pre-configure your installation by setting DB_CHARSET and DB_COLLATE.

Summary

Charsets can be complicated to deal with if you forget to do a setting. Over the years wordpress tried to make most decisions for you which can not always be fitting. Knowing where, when and how to make charset and collation settings can save you some hassles in the long run.

WordPress MySQL Charset and Collation Checklist

Before installing WordPress, you can check and do the following:

  • Ensure you have got PHP >= 5.2.3 and MySQL >= 5.0.7
    $ mysql --version
    $ php --version
  • Set the default charset and collation for the database
    ALTER DATABASE wpdb CHARACTER SET utf8 COLLATE utf8_general_ci;
  • Set DB_CHARSET and DB_COLLATE in wp-config-example.php
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', 'utf8_general_ci');

See also: Technical WordPress Installation Checklist

This entry was posted in Hacking The Core, Hakre's Tips, Pressed, Reports and tagged , , , , , , , . Bookmark the permalink.

5 Responses to WordPress Database Charset and Collation Configuration

  1. Pingback: WordPress Plugin Table Character Sets and Collation | Kevin's Space Blog | Kevin's Space Blog

  2. Pingback: How To: Streamline Wordpress by removing Database calls - Hamell.net

  3. Pingback: How To Run WordPress On Your Local Windows Computer? – Maria Daniel Deepak

  4. Bahi says:

    Hello. Thanks for this – found it useful, more than six years after you published it. Do you know of any issues using a case-sensitive collation? I’m writing a query that matches contents of a custom field that stores comma separated (non-serialised) values, e.g. ‘PR, Programming, IT, Iterations’. Using wp_query with a meta_query, I can get PR to match ‘PR’ using a ‘LIKE’ comparator but it also matches ‘Programming’. Changing the collation to case-insensitive would fix the problem but would it introduce others that you know of?

    • hakre says:

      Well, this is at leas two-folded, Even you don’t use (PHP) serialized values, you have a serialization of a list (in comma and space separated form. This is not easy to address with LIKE in Mysql, but take a look at the set functions Mysql has to offer. These parse comma separated values pretty well and might be what you’re looking for regardless of case-sensitivity.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.