
Coping with mixed InnoDB and MyISAM tables



coling
May 30th, 2007, 04:53 AM
I've recently run into a problem that I didn't know existed in my database setup. I used to do the backups with mysqldump using the --single-transaction and --flush-logs arguments, as most of the tables are in InnoDB format. Unfortunately, when using this technique (combined with the binlogs) to migrate to a new hosting platform, a problem was pointed out to me.
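
For reference, the kind of backup I'm running is essentially the following (a minimal sketch in Python; credentials, host and the output path are placeholders):

# Minimal sketch of the current approach: one mysqldump of all databases,
# relying on --single-transaction for InnoDB consistency and --flush-logs
# to rotate the binary log when the dump starts.
# Connection details and the output path are placeholders.
import subprocess

DUMP_CMD = [
    "mysqldump",
    "--single-transaction",   # consistent snapshot for InnoDB tables only
    "--flush-logs",           # rotate binlogs so recovery starts from a fresh file
    "--all-databases",
    "--user=backup", "--password=secret", "--host=localhost",
]

with open("/backups/full.sql", "wb") as out:
    subprocess.run(DUMP_CMD, stdout=out, check=True)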

Basically, because MyISAM tables are not covered by --single-transaction, writes to them that happen while the dump is running can end up both in the dump and in the binary log. Any recovery using dump + binlog is then potentially flawed: replaying the binlog on top of the restored dump can fail with duplicate keys in the MyISAM tables, or silently double-apply changes.

I was optimistic that ZRM would have techniques to get around this limitation (which is also covered by a MySQL feature request: http://bugs.mysql.com/bug.php?id=23344), but upon testing I see that it is afflicted by the same problem.

In my mind there are several ways to make this work successfully and I'll outline what I think is necessary below.

First of all, let's assume that we can ONLY use logical backups (snapshots etc. are not available). Let's also assume that binary logging is ON.

One solution is to analyse your schema and create multiple backup sets that separate out the databases and table types. The MyISAM dumps can then be configured with locking to ensure consistency. The disadvantage is that this can be pretty complex and time-consuming for the user, and it also involves undesirable locking.
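
To illustrate what I mean by separate backup sets, something like the following (the database and table names, and the way they're split, are made up for the example):

# Rough illustration of the "separate backup sets" idea: one dump per
# engine type, so locking is only applied where it is actually needed.
# Database/table names and credentials are made up for the example.
import subprocess

COMMON = ["mysqldump", "--user=backup", "--password=secret"]

# Set 1: transactional (InnoDB) tables - consistent snapshot, no locking.
innodb_tables = ["orders", "order_lines"]
with open("/backups/innodb.sql", "wb") as out:
    subprocess.run(COMMON + ["--single-transaction", "mydb"] + innodb_tables,
                   stdout=out, check=True)

# Set 2: non-transactional (MyISAM) tables - locked for the duration of
# their dump to get a consistent copy.
myisam_tables = ["sessions", "search_index"]
with open("/backups/myisam.sql", "wb") as out:
    subprocess.run(COMMON + ["--lock-tables", "mydb"] + myisam_tables,
                   stdout=out, check=True)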

Another solution (a rough sketch in code follows the list below) is to:
1. Analyse all tables that will be backed up and enumerate transactional (InnoDB) and non-transactional (MyISAM) table types.
2. Flush logs
3. Dump all transactional tables with --single-transaction.
4. Lock and dump each non-transactional table in turn, flushing logs at each iteration and keeping track of the log file from which subsequent data changes to each table can appear. The short, per-table lock time keeps locking to a minimum but obviously increases the number of binlog files (one per non-transactional table!)
5. Flush logs
6. Parse all the log files, extract the relevant data from each and append it to the backup. From the information recorded in step 4 we know which non-transactional table changes are relevant; all changes to transactional tables are relevant.
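
To make the idea concrete, here is a rough sketch of steps 1-5 in Python (it assumes the mysql-connector-python package and the mysqldump binary are available; credentials, the database name and paths are placeholders, and step 6, the binlog filtering, is not shown):

# Sketch of steps 1-5. While a table is READ-locked here, other sessions
# (including mysqldump) can still read it but nobody can write to it, so
# the recorded binlog file is the first one that can contain post-dump
# changes to that table.
import subprocess
import mysql.connector

DB = "mydb"
conn = mysql.connector.connect(host="localhost", user="backup",
                               password="secret", database=DB)
cur = conn.cursor()

# Step 1: classify base tables by engine.
cur.execute("SELECT table_name, engine FROM information_schema.tables "
            "WHERE table_schema = %s AND table_type = 'BASE TABLE'", (DB,))
tables = cur.fetchall()
innodb = [t for t, e in tables if e == "InnoDB"]
myisam = [t for t, e in tables if e != "InnoDB"]

def dump(outfile, extra_args, table_list):
    """Run mysqldump for the given tables of DB into outfile."""
    cmd = (["mysqldump", "--user=backup", "--password=secret"]
           + extra_args + [DB] + table_list)
    with open(outfile, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)

# Steps 2-3: rotate the binlog, then dump the transactional tables in one
# consistent snapshot.
cur.execute("FLUSH LOGS")
dump("/backups/innodb.sql", ["--single-transaction"], innodb)

# Step 4: lock and dump each non-transactional table in turn, recording
# the binlog file from which later changes to that table may appear.
binlog_start = {}
for table in myisam:
    cur.execute(f"LOCK TABLES {table} READ")   # blocks writers, allows readers
    cur.execute("FLUSH LOGS")
    cur.execute("SHOW MASTER STATUS")
    binlog_start[table] = cur.fetchone()[0]    # current binlog file name
    dump(f"/backups/{table}.sql", [], [table])
    cur.execute("UNLOCK TABLES")

# Step 5: final log rotation so everything dumped above sits in closed
# binlog files. binlog_start is what step 6 would use for filtering.
cur.execute("FLUSH LOGS")
print(binlog_start)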

If the above process is followed, we know we have a good, complete backup of all data, a binary log covering subsequent changes, and locking for non-transactional tables kept to a bare minimum.

If ZRM could do this I would love you for ever!!

Variations on the above scheme could be added to deal with different environments, e.g. locking a whole database at a time rather than a single table. This could make things simpler but would mean longer locks. You could also implement a release gap, i.e. a timeout after releasing a lock before acquiring the next one, just to allow the live application to "catch up".
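
Continuing from the sketch above, the whole-database and release-gap variations would only change the locking loop, roughly like this (the gap length is arbitrary):

# Variation on the per-table loop above (reusing cur, dump and myisam from
# the earlier sketch): lock all of the database's non-transactional tables
# in one go, dump them together, then pause so the live application can
# catch up before any further work. The sleep length is arbitrary.
import time

RELEASE_GAP_SECONDS = 5

cur.execute("LOCK TABLES " + ", ".join(f"{t} READ" for t in myisam))
cur.execute("FLUSH LOGS")
cur.execute("SHOW MASTER STATUS")
myisam_binlog_start = cur.fetchone()[0]   # one start file for the whole set
dump("/backups/myisam.sql", [], myisam)
cur.execute("UNLOCK TABLES")
time.sleep(RELEASE_GAP_SECONDS)           # "release gap" before continuing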

What do you think? Is this possible, achievable in the foreseeable future?

Col