What Is the Best Character Set and Collation for a WordPress Database?
Like most content management systems, WordPress depends on a database to manage the content of a blog or site. The character set and collation define how data is stored in the database and how it is compared and sorted when it is read. Character encoding is the way characters are stored as bytes. When I started using WordPress in 2007, there was no default character set or collation for the WordPress database. Today, the default character set is already set in the wp-config-sample.php file that comes with the WordPress download. The default character set has been ‘utf8’ ever since WordPress version 3.0 (to my knowledge).
So, is it the best character set? No.
According to Gary Pendergast’s post, WordPress started adopting the ‘utf8mb4’ character set in version 4.2. So if you installed WordPress after the 4.2 release, you don’t have to worry about this, because the tables created by WordPress will use ‘utf8mb4’ (provided the MySQL server supports it). Unfortunately, sites installed before that version may have a different character set, which may not play well with the WordPress code.
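To see why ‘utf8’ falls short, it helps to look at how many bytes each character needs in UTF-8. MySQL’s ‘utf8’ stores at most three bytes per character, while ‘utf8mb4’ stores up to four. The Python sketch below is only an illustration; the helper name fits_in_mysql_utf8 is made up for this post and is not part of WordPress or MySQL.

```python
def utf8_byte_length(char: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 bytes,
    which is the per-character limit of MySQL's 'utf8' character set."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # a, e-acute, euro sign, grinning-face emoji (written as escapes).
    for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
        print(repr(ch), utf8_byte_length(ch), "byte(s)")
    print(fits_in_mysql_utf8("Hello, w\u00f6rld \u20ac"))
    print(fits_in_mysql_utf8("Hello \U0001F600"))
```

The emoji is the only character here that needs four bytes, which is why it fits in ‘utf8mb4’ but not in ‘utf8’.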
Tables that use ‘utf8mb4’ should have either ‘utf8mb4_unicode_ci’ or ‘utf8mb4_general_ci’ as their collation. Stack Overflow user Thomas Rutter has given a clear explanation of the difference between these two collations. According to him, ‘utf8mb4_unicode_ci’ is slightly slower than ‘utf8mb4_general_ci’, but that is the price we have to pay for accuracy. The ‘unicode’ collation sorts and compares text more accurately across many language scripts. Support for modern smileys (emoji) comes from the ‘utf8mb4’ character set itself rather than from the collation.
When I started blogging with WordPress, the PHP and MySQL versions of the time had no support for utf8mb4. Since then, my WordPress database has gone through many upgrades, but sadly none of them brought utf8mb4. To date, many of our sites use ‘utf8’ as the default character set. The default collation was ‘latin1_swedish_ci’ for all the tables and columns. Due to a technical issue, the tables’ collation couldn’t be changed from a ‘latin1..’ collation to a ‘utf8mb4..’ collation. The utf8mb4 character set also needs up to four bytes per character, compared with up to three for utf8 and one for latin1. So I left it for later.
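The storage worry is easier to judge with numbers. The maximum bytes per character matter when MySQL sizes things like index keys, but VARCHAR and TEXT data is stored with variable length, so plain English text takes the same space in utf8 and utf8mb4. A rough sketch, with the function name worst_case_bytes invented for illustration:

```python
# Maximum bytes per character for the three character sets discussed here.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def worst_case_bytes(char_length: int, charset: str) -> int:
    """Hypothetical helper: upper bound on the bytes needed for a column
    of `char_length` characters (ignores MySQL's length-prefix bytes)."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for charset in MAX_BYTES_PER_CHAR:
        print(charset, worst_case_bytes(255, charset))
    # Actual storage depends on the text: plain ASCII is 1 byte per
    # character in UTF-8, whichever of the two utf8 charsets is used.
    print(len("WordPress".encode("utf-8")))
```

So the worst case for utf8mb4 is a third larger than for utf8, and four times that of latin1, but only text that actually contains multi-byte characters pays for it.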
For now, we have changed the collation of every table, and of every column in each table, to ‘utf8_unicode_ci’.
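A conversion like this can be scripted rather than done table by table. The sketch below only builds the SQL text and does not connect to any database. The table names are the usual WordPress defaults with the wp_ prefix and are an assumption, as is the helper name build_convert_statements. ALTER TABLE ... CONVERT TO CHARACTER SET rewrites the stored data as well as the column definitions, so take a backup and try it on a copy first.

```python
def build_convert_statements(tables, charset="utf8", collation="utf8_unicode_ci"):
    """Hypothetical helper: return one ALTER TABLE statement per table.

    CONVERT TO CHARACTER SET changes the table default and every
    character column (CHAR, VARCHAR, TEXT) in that table.
    """
    statements = []
    for table in tables:
        # Backticks guard against table names that clash with keywords.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # Assumed table names; check your own prefix in wp-config.php.
    for sql in build_convert_statements(["wp_posts", "wp_comments", "wp_options"]):
        print(sql)
```

Passing charset="utf8mb4" and collation="utf8mb4_unicode_ci" would produce the statements for the later move to utf8mb4.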