Docker下MySQL迁移问题

因为ext4文件系统的 “百分百占用io” bug,严重影响了系统的稳定,索性一不做二不休,直接把文件系统换为当前CentOS发行版默认使用的xfs文件系统。

ext4文件系统bug,bug原理大致是,文件的写和请求会导致其中一个int型的值不断增大,最后增大到超出了自身的范围 —— 变成负值,就会触发该bug ,然后jpd2进程会无限占用磁盘io,而想要达到该值并不容易,需要几个月后才会出现,但是我的服务器只运行了不到半个月,然后在构建一个docker镜像的过程中,这个bug就触发了,cpu load 直接上百,百思不得其解

我把数据先备份到别的地方,服务器重装好后再拷贝回来,MySQL是用docker跑的,目录还原回来理论上就应该可以用了,但是不尽人意,报了如下错误:

2015-09-25 02:56:43 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).  
2015-09-25 02:56:43 0 [Note] mysqld (mysqld 5.6.26) starting as process 167 ...  
2015-09-25 02:56:43 167 [Note] Plugin 'FEDERATED' is disabled.  
mysqld: Can't find file: './mysql/plugin.frm' (errno: 13 - Permission denied)  
2015-09-25 02:56:43 167 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.  
2015-09-25 02:56:43 167 [Note] InnoDB: Using atomics to ref count buffer pool pages  
2015-09-25 02:56:43 167 [Note] InnoDB: The InnoDB memory heap is disabled  
2015-09-25 02:56:43 167 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins  
2015-09-25 02:56:43 167 [Note] InnoDB: Memory barrier is not used  
2015-09-25 02:56:43 167 [Note] InnoDB: Compressed tables use zlib 1.2.3  
2015-09-25 02:56:43 167 [Note] InnoDB: Using Linux native AIO  
2015-09-25 02:56:43 167 [Note] InnoDB: Using CPU crc32 instructions  
2015-09-25 02:56:43 167 [Note] InnoDB: Initializing buffer pool, size = 128.0M  
2015-09-25 02:56:43 167 [Note] InnoDB: Completed initialization of buffer pool  
2015-09-25 02:56:43 167 [ERROR] InnoDB: ./ibdata1 can't be opened in read-write mode  
2015-09-25 02:56:43 167 [ERROR] InnoDB: The system tablespace must be writable!  
2015-09-25 02:56:43 167 [ERROR] Plugin 'InnoDB' init function returned error.  
2015-09-25 02:56:43 167 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.  
2015-09-25 02:56:43 167 [ERROR] Unknown/unsupported storage engine: InnoDB  
2015-09-25 02:56:43 167 [ERROR] Aborting  
。。。。。

按照报错提示和Google的结果,就是把 ib_logfile0和 ib_logfile1 这两个文件删掉即可,然而新的问题又出现了:

...
InnoDB: Error: page 570 log sequence number 7289495  
InnoDB: is in the future! Current system log sequence number 5574939.  
InnoDB: Your database may be corrupt or you may have copied the InnoDB  
InnoDB: tablespace but not the InnoDB log files. See  
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html  
InnoDB: for more information.  

再次Google这个error,发现了这样的一篇博文 MySQL log is in the future!,这篇博文的方法简单易懂,但是首要问题是我的mysql无法启动,无法启动就不能备份数据。同时也看了下log中提到的 Forcing InnoDB Recovery。

Forcing InnoDB Recovery提供了6个等级的修复模式,需要注意的是值大于3的时候,会对数据文件造成永久的破坏,不可恢复。六个等级的介绍摘抄如下:

  • 1 (SRV_FORCE_IGNORE_CORRUPT)
    Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables.

  • 2 (SRV_FORCE_NO_BACKGROUND)
    Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it.

  • 3 (SRV_FORCE_NO_TRX_UNDO)
    Does not run transaction rollbacks after crash recovery.

  • 4 (SRV_FORCE_NO_IBUF_MERGE)
    Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate table statistics. This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes.

  • 5 (SRV_FORCE_NO_UNDO_LOG_SCAN)
    Does not look at undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files.

  • 6 (SRV_FORCE_NO_LOG_REDO)
    Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Leaves database pages in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.

使用方法如下,在mysql配置文件中,添加或修改以下配置的值

my.cnf

[mysqld]
innodb_force_recovery = 1  

根据查到的博文提到的方法,修复步骤如下:

因为我无法启动MySQL,所以首先要想办法启动MySQL,然后dump数据。从innodb_force_recovery的值1开始尝试,看MySQL能否在该修复模式下启动,不到万不得已,不要尝试值为4及以上。

在我这里,MySQL在值为3时可以启动,但是第一次启动后,在dump数据时,进程又自己退出了,心凉了一半,不是要我上4吧,还好再一次以3启动,这次能dump出数据了:

mysqldump -h 127.0.0.1 -P 3306 -u root -p --all-databases > all-databases.sql  

删除(或者备份)掉出错的数据文件:

rm ib_logfile0  
rm ib_logfile1  

启动mysql,然后从备份文件恢复数据

docker-compose up  
mysql -h 127.0.0.1 -P 3306 -u root -p < all-databases.sql  

因为在修复模式下,在插入数据时报错,也就是说此时是不能写入数据的。所以就关闭掉了修复模式:

[mysqld]
innodb_force_recovery = 0  

restart mysql后,再次恢复数据:

docker-compose restart mysql  
mysql -h 127.0.0.1 -P 3306 -u root -p < all-databases.sql  

导入完成后,再次重启下mysql,现在mysql可以正常启动了,并且数据也恢复成功。

Docker的数据库怎么说,没必要的时候不要随便移动,权限问题会搞得你头大,还有我上边遇到的问题也是,所以重启Docker服务器前请千万要先正常停止数据库的container,然后再关闭服务器。如果有迁移的需要,请先dump出数据再尝试迁移,运气好什么问题没有,运气不好就像我上边一样修复吧。如果你有大量的数据而且非常重要,请慎重使用Docker来运行数据库。

更早的文章

记一次Gitlab升级过程

现在公司使用gitlab作为版本管理工具,由于安装时间早,停留在了6.3.1 bitnami版本,ssh协议貌似是废的,只能走http。虽然说日常拉取和提交代码都没什么问题,但是近期出现了不稳定的现象…

gitlab, ruby, python, db继续阅读