innobackupex Eats Up the Root Partition

2024-02-29 Author: Fisherworks

Take a look at this if your innobackupex insists on eating up most (or all) of your root partition, especially if it seems to be playing hide-and-seek with you.

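For the past couple of days I kept getting odd alerts: disk usage above 90%, and then, with no intervention on my part, it dropped back to normal by itself. Absurd. I looked at disk utilization in the cloud monitoring console and found that it was the system disk, not the data disk (the latter's usage is low and had shown no noticeable change for days), which made it even stranger.

As for the system disk, unless you catch the problem while it is happening there is nothing to investigate. Judging by its timing, though, it lined up fairly well with the MySQL cold backup schedule.

So I first checked the state of the MySQL cold backup files, and that is where the problems started to show (sorry, no screenshots).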

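Every backup file finished much earlier than before: instead of the usual 9 to 10 a.m., they now ended at 4 to 5 a.m.
The file sizes were off too, having shrunk by more than half (with no data deleted).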

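Easy enough: the backup files are rotated periodically but the logs are not, so they stay around until someone deletes them. Just read the logs.

And sure enough, the backups of the past two days had all ended in failure.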

..........
240209 05:05:01 >> log scanned up to (58627078203151)
240209 05:05:03 >> log scanned up to (58627086878169)
innobackupex: Error writing file '/tmp/xbtempPr6mT8' (Errcode: 28 - No space left on device)
xtrabackup: Error: write to logfile failed
xtrabackup: Error: xtrabackup_copy_logfile() failed.

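The /tmp here baffled me. As far as I remembered, none of the temporary files were supposed to go to the system default temp directory, so I decided to check.

First, the MySQL temp setting: no problem there.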

[root@aqui-wechat db_bak]# cat /etc/my.cnf
[mysqld]
datadir=/stor/db_mysql/mysql
tmpdir=/stor/db_mysql/mysql_tmpdir
socket=xxx.sock
user=mysql
........

Next, the directory settings in the backup script: also fine.

[root@aqui-wechat db_bak]# cat /home/mysql_full_backup_v2.sh
#!/bin/bash
mysql_backup_dir=/stor/db_bak/
mysql_username="xxx"
mysql_password="xxx"
mysql_socket="xxx.sock"

cd $mysql_backup_dir
timeStart=$(date '+%Y%m%d%H%M%S')
logfile=full-$timeStart.log
bak_gz_file=bak-$timeStart.tar.gz
touch $logfile
echo "Start-Time ：$timeStart" | tee -a $logfile
echo "+++++++++++++++" | tee -a $logfile
innobackupex --defaults-file=/etc/my.cnf --user=$mysql_username --password=$mysql_password --socket=$mysql_socket --stream=tar $mysql_backup_dir 2>> $logfile | gzip > $bak_gz_file
............

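That being the case, I ran the backup script by hand to see whether it created a temporary file in /tmp. In addition, I kept running du -sh * in / to check which top-level directory was growing fast.

The result: nothing. I watched the Avail and Use% values reported by df race away (Avail plummeting, Use% climbing), yet du -sh * showed that, apart from the backup storage growing, no other top-level directory moved at all, /tmp included.

This was too much. Was it toying with me?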

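I searched through both Chinese and English sources and found exactly one fellow sufferer, with the same symptoms and also taking cold backups from a slave. I thought the truth was about to come out, but he had fallen into an even deeper pit: his replica went down. After a flurry of fixes he repaired the replica and restarted the backup, and the root-space-eating problem simply vanished. His suspicion that /proc was eating storage also struck me as rather unfounded.

So I was back to square one.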

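Then I remembered Bing Chat, which I had neglected for a while, and went to ask for advice. It has since been renamed Copilot, but no matter.

It suggested four or five possibilities. From experience I quickly ruled out most of them, leaving just two: inode exhaustion and a deleted-but-still-open file.

The former I checked with df -i while innobackupex was running: everything normal, no rapid growth. As for the latter: lsof. How did I forget such a good tool, and need GPT to straighten out my thinking...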
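
For anyone who wants the inode check in a script rather than by eye, here is a minimal sketch of roughly what df -i reports, assuming Python 3 on a Unix-like system. The function name inode_usage is made up for illustration, and the percentage may differ slightly from df's own rounding.

```python
import os


def inode_usage(path="/"):
    """Return (used, total, percent_used) inodes for the filesystem holding path."""
    stats = os.statvfs(path)
    total = stats.f_files            # total inodes on the filesystem
    used = total - stats.f_ffree     # inodes currently in use
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


if __name__ == "__main__":
    used, total, percent = inode_usage("/")
    print("inodes used: %d of %d (%.1f%%)" % (used, total, percent))
```

Calling it every few seconds while the backup runs shows whether the inode count is climbing. In my case it was not.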

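This check turned up quite a surprise:

A 22 GB temporary file.
The file kept growing for as long as innobackupex was running.
Meanwhile the file was in the deleted state: it had been opened and then deleted while still being written to.
As soon as innobackupex was stopped, the disk space was released immediately, with nothing left behind.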
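
The pattern in the list above is easy to reproduce, and it explains why df and du disagreed. The following is a minimal sketch, assuming Python 3 on Linux. It does not show how xtrabackup itself handles its temp file; it only replays the open-then-unlink sequence, and the function name and file prefix are invented for the demo.

```python
import os
import tempfile


def deleted_but_open_demo(directory, payload_size=1024 * 1024):
    """Write to a file after unlinking it; return (visible, link_count, size)."""
    fd, path = tempfile.mkstemp(prefix="xbtemp_demo_", dir=directory)
    with os.fdopen(fd, "wb") as handle:
        os.unlink(path)                       # the name is gone, the inode lives on
        handle.write(b"\0" * payload_size)    # writes still consume disk space
        handle.flush()
        info = os.fstat(handle.fileno())
        # du walks directory entries, so it cannot see a file with no name.
        visible = os.path.basename(path) in os.listdir(directory)
    # Leaving the with-block closes the last descriptor and frees the space.
    return visible, info.st_nlink, info.st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        visible, links, size = deleted_but_open_demo(scratch)
        print("visible in directory:", visible)
        print("link count:", links)
        print("bytes still held:", size)
```

While the handle is open the bytes count against the filesystem (which df sees), but no directory lists the file (so du misses it). Closing the handle, or killing the process, gives the space back at once.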

[root@aqui-wechat /]# lsof -c innobackupex | grep 'tmp'
innobacku 2398 root    5u   REG              253,0 23622321345 111149066 /tmp/xbtempMTBMsa (deleted)
[root@aqui-wechat /]# python -c 'print(23622321345 / (1024 ** 3))'
22
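
If lsof is not installed, the same information can be pulled out of /proc. This is a rough sketch, assuming Python 3 on Linux and enough privileges to read other processes' fd directories (run it as root to see everything). The function name find_deleted_open_files is hypothetical.

```python
import os


def find_deleted_open_files(proc_root="/proc", min_size=0):
    """Return (pid, fd, size_bytes, target) for open files whose name was unlinked."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                          # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue                          # process exited or access denied
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                size = os.stat(link).st_size  # follows the link to the open file
            except OSError:
                continue                      # descriptor closed in the meantime
            if size >= min_size:
                hits.append((int(entry), int(fd), size, target))
    # Largest files first, since those are the ones eating the disk.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for pid, fd, size, target in find_deleted_open_files(min_size=1024 ** 3):
        print("pid=%d fd=%d %.1f GiB %s" % (pid, fd, size / 1024.0 ** 3, target))
```

With min_size set to 1 GiB as in the usage above, a file like the /tmp/xbtemp one would be listed first.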

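Come on, folks: did the Percona xtrabackup team get their start writing shady binaries before going legit?

Fine, the culprit was found. GPT had no good fix to offer either, nothing beyond reading the source or upgrading versions, so I was on my own again.

Although I could not fix the bug itself, I figured a bind mount would at least mitigate it: mounting a directory on the data disk over /tmp should do. Open /etc/fstab, add the following entry (the source directory, /stor/rep_sys_temp here, has to exist first), then run mount -a.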

# to solve the i-dont-know-why innobackupex insists to consume '/tmp' even mysql temp dir was already reconfed to somewhere else
/stor/rep_sys_temp /tmp none bind 0 0


Confirm that the mount succeeded.

[root@aqui-wechat /]# findmnt | grep "/tmp"
├─/tmp      /dev/mapper/data-stor[/rep_sys_temp] ext4     rw,relatime,barrier=1,data=ordered


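I restarted innobackupex for a full cold backup of the DB, and the root partition's Avail and Use% values no longer changed. About half a day later the backup completed and the backup log ended normally. Mitigation successful.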

Original article. If you repost it, please credit the source: 渔人小径.
