Raspberry Pi 4B Frequently Suffers System Hangs
This post discusses a frequent X hang on a Raspberry Pi 4B running a freshly updated Raspbian (Pi OS) Buster. The hang was never seen on a Pi 3B (not a 3B+) running the same OS version across 30+ reboot tests.
PS: Please note that this is an X hang, not a kernel hang or panic.
Summary: the Pi 4B may have a compatibility issue with unbranded displays that provide non-standard EDID info, which the Pi 3B evidently handles without trouble.
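This post is about a GUI hang on the Pi 4B that has been tormenting me lately. By "hang" I mean that X is dead but the kernel is alive: I can still log in over ssh, manage the machine, and read the logs.
The hardware is a Pi 4B, and the OS is Raspbian (Pi OS) Buster, fully upgraded. Over more than a hundred reboots, X hung about 85% of the time. Interestingly, a Pi 3B (not a 3B+) booted from the same TF card showed no problem in 30+ reboot tests.
Since this is not a kernel hang, the logs can be inspected when the failure occurs. Here is what I found: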
➜ ~ cat /var/log/messages | grep -A20 "Tainted" | more
Jul 22 18:50:17 aq-display kernel: [ 18.171241] CPU: 2 PID: 523 Comm: Xorg Tainted: G C 5.4.51-v7l+ #1333
Jul 22 18:50:17 aq-display kernel: [ 18.171249] Hardware name: BCM2711
Jul 22 18:50:17 aq-display kernel: [ 18.171256] Backtrace:
Jul 22 18:50:17 aq-display kernel: [ 18.171278] [<c020d46c>] (dump_backtrace) from [<c020d768>] (show_stack+0x20/0x24)
Jul 22 18:50:17 aq-display kernel: [ 18.171292] r6:d778e000 r5:00000000 r4:c129c8f8 r3:ffd5e7d9
Jul 22 18:50:17 aq-display kernel: [ 18.171311] [<c020d748>] (show_stack) from [<c0a39a44>] (dump_stack+0xe0/0x124)
Jul 22 18:50:17 aq-display kernel: [ 18.171330] [<c0a39964>] (dump_stack) from [<c0221c70>] (__warn+0xec/0x104)
Jul 22 18:50:17 aq-display kernel: [ 18.171343] r8:0000003f r7:00000009 r6:c0e2a724 r5:00000000 r4:d778fa0c r3:ffd5e7d9
Jul 22 18:50:17 aq-display kernel: [ 18.171359] [<c0221b84>] (__warn) from [<c0221d0c>] (warn_slowpath_fmt+0x84/0xc0)
Jul 22 18:50:17 aq-display kernel: [ 18.171371] r9:c0e2a724 r8:0000003f r7:c08a2dec r6:00000009 r5:c0e2a744 r4:c1204f88
Jul 22 18:50:17 aq-display kernel: [ 18.171387] [<c0221c8c>] (warn_slowpath_fmt) from [<c08a2dec>] (rpi_firmware_transaction+0x108/0x128)
Jul 22 18:50:17 aq-display kernel: [ 18.171399] r9:ef97f440 r8:00000000 r7:00000000 r6:ffffff92 r5:ef97f440 r4:c1204f88
Jul 22 18:50:17 aq-display kernel: [ 18.171413] [<c08a2ce4>] (rpi_firmware_transaction) from [<c08a2ec8>] (rpi_firmware_property_list+0xbc/0x174)
Jul 22 18:50:17 aq-display kernel: [ 18.171423] r7:c1204f88 r6:dec04000 r5:00001000 r4:40000027
Jul 22 18:50:17 aq-display kernel: [ 18.171512] [<c08a2e0c>] (rpi_firmware_property_list) from [<bf351378>] (vc4_fkms_get_edid_block+0x7c/0xb4 [vc4])
Jul 22 18:50:17 aq-display kernel: [ 18.171525] r10:d8162800 r9:00000000 r8:d8162040 r7:d716cb00 r6:00000080 r5:d8160440
Jul 22 18:50:17 aq-display kernel: [ 18.171533] r4:c1204f88
Jul 22 18:50:17 aq-display kernel: [ 18.171792] [<bf3512fc>] (vc4_fkms_get_edid_block [vc4]) from [<bf20bea4>] (drm_do_get_edid+0x70/0x2d4 [drm])
Jul 22 18:50:17 aq-display kernel: [ 18.171804] r9:d8160440 r8:bf36da9c r7:d8160440 r6:bf3512fc r5:00000001 r4:d716cb00
Jul 22 18:50:17 aq-display kernel: [ 18.172054] [<bf20be34>] (drm_do_get_edid [drm]) from [<bf351570>] (vc4_fkms_connector_get_modes+0x54/0xcc [vc4])
Jul 22 18:50:17 aq-display kernel: [ 18.172067] r10:d8162800 r9:fffffffd r8:bf36da9c r7:ef325a00 r6:bf33601c r5:c1204f88
--
Jul 22 19:06:15 aq-display kernel: [ 18.151199] CPU: 1 PID: 523 Comm: Xorg Tainted: G C 5.4.51-v7l+ #1333
Jul 22 19:06:15 aq-display kernel: [ 18.151204] Hardware name: BCM2711
Jul 22 19:06:15 aq-display kernel: [ 18.151208] Backtrace:
Jul 22 19:06:15 aq-display kernel: [ 18.151221] [<c020d46c>] (dump_backtrace) from [<c020d768>] (show_stack+0x20/0x24)
Jul 22 19:06:15 aq-display kernel: [ 18.151229] r6:d7584000 r5:00000000 r4:c129c8f8 r3:ffd5e7d9
Jul 22 19:06:15 aq-display kernel: [ 18.151241] [<c020d748>] (show_stack) from [<c0a39a44>] (dump_stack+0xe0/0x124)
Jul 22 19:06:15 aq-display kernel: [ 18.151253] [<c0a39964>] (dump_stack) from [<c0221c70>] (__warn+0xec/0x104)
Jul 22 19:06:15 aq-display kernel: [ 18.151260] r8:0000003f r7:00000009 r6:c0e2a724 r5:00000000 r4:d7585a0c r3:ffd5e7d9
Jul 22 19:06:15 aq-display kernel: [ 18.151269] [<c0221b84>] (__warn) from [<c0221d0c>] (warn_slowpath_fmt+0x84/0xc0)
Jul 22 19:06:15 aq-display kernel: [ 18.151276] r9:c0e2a724 r8:0000003f r7:c08a2dec r6:00000009 r5:c0e2a744 r4:c1204f88
Jul 22 19:06:15 aq-display kernel: [ 18.151286] [<c0221c8c>] (warn_slowpath_fmt) from [<c08a2dec>] (rpi_firmware_transaction+0x108/0x128)
Jul 22 19:06:15 aq-display kernel: [ 18.151293] r9:ef97f440 r8:00000000 r7:00000000 r6:ffffff92 r5:ef97f440 r4:c1204f88
Jul 22 19:06:15 aq-display kernel: [ 18.151301] [<c08a2ce4>] (rpi_firmware_transaction) from [<c08a2ec8>] (rpi_firmware_property_list+0xbc/0x174)
Jul 22 19:06:15 aq-display kernel: [ 18.151308] r7:c1204f88 r6:dec03000 r5:00001000 r4:40000027
Jul 22 19:06:15 aq-display kernel: [ 18.151363] [<c08a2e0c>] (rpi_firmware_property_list) from [<bf445378>] (vc4_fkms_get_edid_block+0x7c/0xb4 [vc4])
Jul 22 19:06:15 aq-display kernel: [ 18.151370] r10:d814d800 r9:00000000 r8:d814dc40 r7:d833de00 r6:00000080 r5:d814c040
Jul 22 19:06:15 aq-display kernel: [ 18.151375] r4:c1204f88
Jul 22 19:06:15 aq-display kernel: [ 18.151531] [<bf4452fc>] (vc4_fkms_get_edid_block [vc4]) from [<bf268ea4>] (drm_do_get_edid+0x70/0x2d4 [drm])
Jul 22 19:06:15 aq-display kernel: [ 18.151539] r9:d814c040 r8:bf461a9c r7:d814c040 r6:bf4452fc r5:00000001 r4:d833de00
Jul 22 19:06:15 aq-display kernel: [ 18.151689] [<bf268e34>] (drm_do_get_edid [drm]) from [<bf445570>] (vc4_fkms_connector_get_modes+0x54/0xcc [vc4])
Jul 22 19:06:15 aq-display kernel: [ 18.151697] r10:d814d800 r9:fffffffd r8:bf461a9c r7:d8a6fd00 r6:bf34601c r5:c1204f88
--
Jul 22 19:16:44 aq-display kernel: [ 16.871109] CPU: 0 PID: 526 Comm: Xorg Tainted: G C 5.4.51-v7l+ #1333
Jul 22 19:16:44 aq-display kernel: [ 16.871112] Hardware name: BCM2711
Jul 22 19:16:44 aq-display kernel: [ 16.871115] Backtrace:
Jul 22 19:16:44 aq-display kernel: [ 16.871125] [<c020d46c>] (dump_backtrace) from [<c020d768>] (show_stack+0x20/0x24)
Jul 22 19:16:44 aq-display kernel: [ 16.871130] r6:d7f34000 r5:00000000 r4:c129c8f8 r3:ffd5e7d9
Jul 22 19:16:44 aq-display kernel: [ 16.871139] [<c020d748>] (show_stack) from [<c0a39a44>] (dump_stack+0xe0/0x124)
It reproduces almost every time, and Xorg is always the first to go down. After that, other processes may or may not follow.
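To see at a glance when the warning fired, the matching log lines can be reduced to a timestamp and a CPU number. The following is only a minimal sketch, not something that was run for this post; the function name `list_xorg_taints` is made up for illustration, and it assumes the syslog line layout shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print one line per "Xorg Tainted" kernel warning,
# showing the syslog timestamp and the CPU the warning was raised on.
# Assumes the line layout seen in /var/log/messages in this post.
list_xorg_taints() {
    awk '/Comm: Xorg Tainted/ {
        cpu = "?"
        # The CPU number is the field right after the literal "CPU:" token.
        for (i = 1; i <= NF; i++) if ($i == "CPU:") cpu = $(i + 1)
        print $1, $2, $3, "cpu=" cpu
    }' "$1"
}

# Example call (commented out so that sourcing this file has no side effects):
# list_xorg_taints /var/log/messages
```

Run against the excerpt above, it should print one line for each of the three traces, which makes it easier to line the warnings up with reboot times.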
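I searched extensively for "Xorg Tainted" and tried upgrading both the OS and the EEPROM, all to no avail. The only option left was to keep reading the logs and hope to get lucky.
Then I happened to notice this entry:
Come to think of it, with syntax highlighting turned on, it feels a bit like an open-book exam that comes with the answers, doesn't it?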
Jul 23 16:56:13 aq-display colord[895]: failed to get edid data: EDID length is too small
Jul 23 16:56:14 aq-display colord[895]: failed to get session [pid 388]: 没有可用的数据
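The colord report that it cannot get the EDID shows up after some of the Xorg Tainted tracebacks: not every time, but fairly often. The Xorg Tainted traceback itself also points at EDID (vc4_fkms_get_edid_block, drm_do_get_edid), and that part appears 100% of the time. Hmm, could this simply be an EDID (the display's self-reported parameters) problem?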
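One way to test the EDID suspicion directly would be to look at the raw EDID the kernel exposes for the connector. The sketch below was not part of the original troubleshooting. It relies on two facts about the EDID format: the base block is 128 bytes long and starts with the fixed header 00 ff ff ff ff ff ff 00. The function name `check_edid` is hypothetical, and whether `/sys/class/drm/*/edid` is populated under the driver used here is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a raw EDID blob.
# Prints a verdict and returns 0 only if the blob is at least one
# 128-byte block, a whole number of blocks, and has the fixed header.
check_edid() {
    file=$1
    # Count bytes through a pipe: sysfs files may report a size of 0
    # to stat, so the content has to be read to know its real length.
    size=$(cat "$file" | wc -c | tr -d ' ')
    if [ "$size" -lt 128 ]; then
        echo "too small ($size bytes)"
        return 1
    fi
    if [ $((size % 128)) -ne 0 ]; then
        echo "not a multiple of 128 ($size bytes)"
        return 1
    fi
    # First 8 bytes as lowercase hex with no separators.
    header=$(od -An -tx1 -N8 "$file" | tr -d ' \n')
    if [ "$header" != "00ffffffffffff00" ]; then
        echo "bad header ($header)"
        return 1
    fi
    echo "ok ($size bytes)"
}

# Example call (commented out; the sysfs path is an assumption):
# for f in /sys/class/drm/*/edid; do printf '%s: ' "$f"; check_edid "$f"; done
```

A "too small" verdict here would match colord's "EDID length is too small" complaint, while an "ok" would suggest looking elsewhere.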
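So what display is actually connected to the Pi 4B right now?
Don't laugh: counting the MBP, there are four displays on this one desk. The Pi is accessed remotely most of the time and its screen gets looked at once in a blue moon, so to save space I gave it a small unbranded display from Taobao (please ignore the dust).
Fine. My blind guess is that the Pi is playing dead to complain about its partner's competence. All right then, how about I swap in a display of ordinary industry standard?
I set up a crontab to reboot automatically; after 20+ reboots, no problem has shown up yet.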
➜ ~ sudo crontab -l
# m h dom mon dow command
*/10 * * * * /sbin/reboot
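I used logrotate to force a rotation of messages and syslog, then left the automatic reboots running for 48 hours straight. Checking the logs once I'm back at work should settle the matter.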
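Once the 48 hours are up, the freshly rotated log can be boiled down to two numbers: how many boots were recorded, and how many of them contain the Xorg warning. This is a rough sketch under stated assumptions rather than the check actually used here. It assumes each boot writes a "Booting Linux on physical CPU" kernel line into the same log, and the function name `summarize_boots` is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize a reboot soak test from a syslog-style file.
# boots             = number of "Booting Linux on physical CPU" lines (assumed one per boot)
# with_xorg_warning = number of boots that logged at least one "Xorg Tainted" line
summarize_boots() {
    awk '
        /Booting Linux on physical CPU/ { boots++; seen = 0 }
        # Count each boot at most once, however many traces it logged.
        /Comm: Xorg Tainted/ && !seen   { bad++; seen = 1 }
        END { printf "boots=%d with_xorg_warning=%d\n", boots, bad }
    ' "$1"
}

# Example call (commented out so that sourcing this file has no side effects):
# summarize_boots /var/log/messages
```

If the log starts in the middle of a boot, a warning seen before the first boot marker is still counted, so the second number can exceed the first by one. Forcing the rotation beforehand (for example with `sudo logrotate -f /etc/logrotate.conf`) keeps older boots out of the count.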
Both boards output over HDMI, so why does the Pi 3B have no problem at all? If you know, please leave a comment and enlighten me.
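My Raspberry Pi has also been acting up a lot lately. Sometimes when I run reboot it freezes before it even restarts, and all I can do is cut the power. I have ruled out the power supply and the memory card, since I replaced both, and reinstalling the system did not help either.
After I switched to an Ubuntu image, everything was fine.
I was using the latest image, from November 2021, which raises the 4B's CPU clock from 1.5 GHz to 1.8 GHz. I tried lowering the clock on the problematic system, but that did not help. In the end I had no choice but to switch to Ubuntu, and it has been stable for the few days I have used it.
When Linux hangs, it does not turn into black magic the way Windows does; with a bit of effort you can usually track down at least part of the cause. When it hangs, check whether the NIC still answers ping and whether ssh still connects, and see whether the problem still reproduces if you use raspi-config to turn off xfce and run command line only. Also, after a forced reboot, look through system logs such as messages and syslog for anything suspicious.
Our business uses Raspberry Pis in bulk. I have not deployed the latest Raspbian for the 4B yet, so I have not experienced the overclocking myself. Judging from past experience with several thousand boards, the Pi and Raspbian are generally quite stable; apart from the occasional defective unit, failures with no identifiable cause are rare.
If I did not occasionally need a GUI, I might consider Ubuntu too. Also, while testing pi kvm a while ago, I found that the Arch build is now quite compatible with the Pi 4, so you could consider that as well if you need to.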