29 Jan 2013 12:19
Tags databases mysql
Recently came across a requirement where I had to change the character set of all the databases that were hosted in our MySQL instance(s). Its very common for people to start using the defaults of MySQL till the point they realize they are unable to insert any UTF8 chars and that they need to convert the existing tables from latin1 to UTF8. Because of my newness to databases I really didn't know what to do and how to do it, so started to search the web for it. After a few other useful posts I found this from Ryan Lowe on Percona Blog. That blog post explains different methods for converting the character set & why some might not be the preferred ones. It also has a script that Ryan wrote to convert all tables to a given character set for a given database using the "alter modify" (you can read about the benefits of this approach in the same blog post) approach, which I found pretty useful. So, I sourced and started to use it. I got hit by two problems:
- Foreign Keys restrict referred tables from being renamed, hence they are not converted.
- From the original script's todo list it seems Ryan had plans to handle this.
- As we use InnoDB, Primary and Unique key constraints can't have a column which is more than 255 chars long.
- It seems in MySQL 5.1 (not sure about the other versions) one can't have a key more than 767 bytes long when using InnoDB storage engine. So, 256 * 3 (no. of bytes MySQL uses to store UTF8 chars) = 768 won't work.
- One would see an error like "Specified key was too long; max key length is 767 bytes"
To solve the first problem I modified the script (attached) to do the following:
- Discover Foreign Key constraints at runtime.
- Remove discovered Foreign Key constraints before converting table.
- Add all discovered Foreign Key constraints back after converting all the tables.
- Converting all the tables before adding the FK constraints back is a must because otherwise referred and referring columns' data types might not match and FK addition may fail.
Note: The modified script removed forking children because the parent process was anyways 'wait'ing on children inside the loop, hence not parallelising things. It shouldn't be hard to add forking though.
And AFAIK there are quite a few solutions to the second problem:
- Simplest & probably the cleanest one is to reduce the column size if you can.
- Other one is to use partial content of the column for constraints in question. The validity and effectiveness of this solution depends on several factors including:
- Column content
- Use cases for the column/key in question
- It turns out MySQL version 5.5.14 has introduced a new configuration param called "innodb_large_prefix" which increases this limit to 3072 bytes but one has to use either COMPRESSED or DYNAMIC row format for that to work. Read here for details on both the row formats.
We chose the first one as we found that it was simply the best solution in our case because reducing the column size in our case wouldn't require too many changes on the application side.