Does your dataset consist of InnoDB tables with large BLOB data such that the data is stored in external BLOB pages? Was the dataset created in MySQL version 5.1 and below and without using the InnoDB plugin, or with InnoDB plugin but with MySQL version earlier than 5.1.55? If the answer to both the questions are “YES” then it could very well be that you have a hidden corruption lying around in your dataset. The only way you would be able to find out about the corruption is when you have a crash with InnoDB assertion messages similar to the following:
InnoDB: Serious error! InnoDB is trying to free page 4
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
In this post I will summarize what the bug is and how it corrupts the dataset. If you want more details of why and how the corruption manifests itself then you can additionally read the following bug reports:
http://bugs.mysql.com/bug.php?id=55543
http://bugs.mysql.com/bug.php?id=55284
http://bugs.mysql.com/bug.php?id=55981
MySQL BLOB Corruption
The bug involves records that contain BLOBs that are stored off page on external pages. When are BLOBs stored off-page depends on the size of the blob and the row format and sometimes also depends on the size of the rows. Since we are mainly talking about older MySQL versions, so we would only be dealing with COMPACT and REDUNDANT row formats. Basically, InnoDB tries to store the entire BLOB on the same InnoDB page, but if the row size is large such that at least two rows cannot be stored on the page, then InnoDB will store the first 768 bytes from the BLOB on the page where the row is stored and the rest of the BLOB data is stored on external page. The page that contains the row then contains the pointer to the external page so that when InnoDB is reading the row it can lookup the BLOB by following the pointer.
Now coming back to the bug. The bug is caused by certain parts of the InnoDB code that cause the external page holding the BLOB data to get orphaned.
One way the page gets orphaned is when the PRIMARY key column is being updated, but the update is rolled back for some reason. Now as you know that updating the PRIMARY key column would require a delete of the old row and a creation of a new row with the new data. It would also require moving the pointer data pointing to the external BLOB page from the old row to the new row. This moving of the association between the row and the external page from old row to new row was not being done in a consistent and transaction safe way which would cause the association to get lost in case of a transaction rollback.
What was really happening can be seen as a sequence of the below events:
- a. A transaction modifies the PRIMARY key column
- b. Modification of PRIMARY key column causes the creation of a new row with the new data and old row being delete-marked
- c. The association between the old row and the external BLOB page is changed, such that the external BLOB page is now associated with the new row.
- d. The transaction is rolled back. The rollback undoes the changes, but does not change the association of the external BLOB page back to the old row.
Hence, a transaction rollback would cause the BLOB page to get orphaned and freed. Once the page is freed it can be reused by InnoDB for storing BLOB data for other rows. The important thing to note here is that the original row that we tried to modify (but rolled back the changes) still contains a pointer to the external BLOB page. The thing that is lost is the association from InnoDB’s perspective and that is why it considers the page free to be reused.
Now, this issue does not immediately cause any crashes. So you would never really know. A crash would only happen when the row above that we tried to modify (but rolled back the change) is deleted. Marko summarizes when this bug would cause a crash as follows:
“Furthermore, if your tablespace has been created and modified with an old (buggy) version of InnoDB, a BLOB page could be freed and reused for something else. InnoDB would read and deliver the overwritten contents of the BLOB page to queries. The problem would not be detected until you actually delete the row and both the original BLOB page and the reused BLOB page have been freed.”
Is there any other way to test for this kind of corruption? No, there is no way possible. Neither “CHECK TABLE” nor innochecksum validate pointers to BLOB pages in any way.
Below is an excerpt from the MySQL manual:
“CHECK TABLE surveys the index page structure, then surveys each key entry. It does not validate the key pointer to a clustered record or follow the path for BLOB pointers.”
Mitigation
So what is the safest path then? The safest thing to do is to rebuild the dataset by means of dump and reload. We mostly have slaves and secondary masters created using some form of filesystem copy, either a direct copy or such as using XtraBackup. So if a crash does happen due to this bug, since it would most likely happen after a DELETE gets executed, the crash would replicate to the slaves as well, which if created using some form of filesystem copy would crash as well. A crash of the replication hierarchy would be a disaster.
Another thing to be careful about is when running tools that would DELETE rows containing large BLOB values. For example, if you run pt-table-sync on a table with hidden corruption and pt-table-sync has to sync records then there is a chance that could cause a crash. This is because by default pt-table-sync executes REPLACE statements which are internally mapped to DELETE+UPDATE.
Conclusion
So to summarize, if your dataset was created such that it meets either of the following conditions:
a. Dataset created under MySQL version <= 5.1 without using InnoDB plugin, or
b. Dataset created under MySQL version < 5.1.55 and using InnoDB plugin
Then it is a very good idea to rebuild the dataset using dump and reload. The bugs were only fixed in MySQL version 5.1.55 and above and only for the InnoDB plugin.
Comments