The “Shadow Table” trick
The need:
Often there is a requirement where data in a particular table has to be processed, and the data processing might be slow, while the table might be a one that is used by your application extensively. For example, a logging table that logs page hits. Or there might be an archiving operation that has to be performed on a particular table.
Archiving / processing / aggregating records, all these operations are slow and can really blog down a website, combine that with the added overhead if the table that needs to have these operations performed is one that many clients will write to.
The solution:
The solution to this problem is what is called a “Shadow Table” trick.
The trick consists of the following steps:
- Create a shadow table that has the same structure as the original table.
- Swap the original table with the shadow table, so that the shadow table now becomes the original table with no records, and the original table becomes the shadow table with all the records that need to be processed. This swapping is achieved through RENAME TABLE hack, because renames in MySQL are atomic and there is no performance lag at all.
- Do the processing on the new shadow table. This would ensure that all processing and read operations are being performed on a table that no other client is querying, hence no blocking whatsoever of any sort.
Implementation:
Suppose we have a hits table defined as followed,
CREATE TABLE `hits` ( `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `post_id` bigint(20) unsigned NOT NULL, KEY `pp_postid_ts` (`post_id`,`ts`) ) ENGINE=MyISAM;
We would create a shadow table with the following piece of sql:
CREATE TABLE `hits_shadow` LIKE `hits`;
Now whenever you have to process the hits table all you need to do is do the swapping between the hits table and the hits_shadow table using the following rename table hack,
RENAME TABLE hits TO hits_temp, hits_shadow TO hits, hits_temp TO hits_shadow;
Now you can do all your processing and heavy-lifting on the hits_shadow table without locking out any client, as the shadow table is only being used by your processing script.
After doing the processing, don’t forget to truncate the shadow table.
One more thing, did you see how easily with RENAME TABLE we were able to swap between two tables without any kind of a hassle or performance overhead.
Final notes:
I have been using this technique whenever I need to perform some heavy duty processing on a table that is also heavily used by my application. I have also used this technique when I need to archive data. This technique also has an added advantage of ensuring the completeness of the data up to a particular time.