1) change text fields (char/varchar/text) format to BLOB to loose coding info
2) change BLOB to varchar/text setting proper coding.
ALTER TABLE `news` CHANGE `body` `body` BLOB NULL DEFAULT NULL; ALTER TABLE `news` CHANGE `body` `body` TEXT CHARACTER SET utf8 NULL DEFAULT NULL;
Description: if you have 2/3-bytes sequences as a one non-ascii character in latin1 field, it means you have unicode (eg. utf8) text there - but for mysql it is still a latin1 string. It can work for some time if you do all your data operations outside database in a programming language and connect to MySQL as a latin1 client. But probably this is not what you would like to have as a long-term solution, e.g. you would like to use SQL for sort/comparisons, any reporting tools so you need to have data properly encoded.
If you do convert directly unicode-in-latin1 to unicode (by single alter table) it would become a 4-byte-mess, as MySQL will take existing utf8 multi-byte sequences as proper latin1 characters. The workaround is to hide coding info for MySQL by going through temporary BLOB step to keep all data untouched.
Be aware: After few several latin1-unicode conversions, I would check in the first step if you do not have a mix of latin1/utf8 coding and perhaps deal with it by conversion in external scripts.
2 comments:
Thanks a lot for this post, this is going to help us a lot
This was such a great tip! Thank you!!
Post a Comment