I am seeing a kernel crash on ARM64 (AArch64). My system has 4 cores. I investigated the issue and found the following.
Here is the C code (linux/drivers/cpuidle/governors/menu.c):
```c
static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
	struct menu_device *data = this_cpu_ptr(&menu_devices);
	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
	...
```
Here is the code of pm_qos_request (linux/kernel/power/qos.c):
```c
int pm_qos_request(int pm_qos_class)
{
	return pm_qos_read_value(pm_qos_array[pm_qos_class]->constraints);
}
```
At the assembly level, the value of pm_qos_class changes from PM_QOS_CPU_DMA_LATENCY (its value is 1) to an abnormal value (0x0499162b), which makes the kernel crash.
Here is the crash log:
```
Unable to handle kernel paging request at virtual address ffffffc025140188
pgd = ffffffc006fd3000
[ffffffc025140188] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
...
CPU: 2 PID: 0 Comm: swapper/2 Tainted: P 4.1.45 #1
Hardware name: Broadcom-v8A (DT)
task: ffffffc0160d5540 ti: ffffffc0160d8000 task.ti: ffffffc0160d8000
PC is at pm_qos_request+0x8/0x18
LR is at menu_select+0x3c/0x3f8
pc : [<ffffffc0000c9330>] lr : [<ffffffc000302804>] pstate: 800001c5
sp : ffffffc0160dbed0
x29: ffffffc0160dbed0 x28: 0000000000002701
x27: ffffffc0177d97d4 x26: 0000000000000006
x25: 0000000000002076 x24: ffffffc0006b25a0
x23: 0000000017164000 x22: 0000000077359400
x21: ffffffc000675770 x20: ffffffc0177d9770
x19: ffffffc0177d97b4 x18: 0000000000000000
x17: 0000000000000000 x16: ffffffc0000f59e8
x15: 0000000000000000 x14: 00000000f6b86920
x13: 00000000f6b85de0 x12: 0000000000000000
x11: 0000000000000040 x10: 0000000000000000
x9 : 0000000000ffffb7 x8 : 0000000000000037
x7 : ffffffc0177d78b8 x6 : 0000000000000001
x5 : 000000000000270b x4 : 0000000000001b3d
x3 : 0000000000002704 x2 : ffffffc0177d97c8
x1 : ffffffc0004b5030 x0 : 000000000499162b
```
Let's look at the assembly code of menu_select:
```
(gdb) disassemble menu_select
Dump of assembler code for function menu_select:
0xffffffc0003027c8 <+0>: stp x29, x30, [sp,#-128]!
0xffffffc0003027cc <+4>: mov x29, sp
0xffffffc0003027d0 <+8>: stp x21, x22, [sp,#32]
0xffffffc0003027d4 <+12>: adrp x21, 0xffffffc000675000 <vmstat_work+88>
0xffffffc0003027d8 <+16>: stp x23, x24, [sp,#48]
0xffffffc0003027dc <+20>: stp x19, x20, [sp,#16]
0xffffffc0003027e0 <+24>: mrs x23, tpidr_el1
0xffffffc0003027e4 <+28>: mov x24, x0
0xffffffc0003027e8 <+32>: add x21, x21, #0x770
0xffffffc0003027ec <+36>: mov w0, #0x1 // #1
0xffffffc0003027f0 <+40>: str x1, [x29,#104]
0xffffffc0003027f4 <+44>: add x20, x21, x23
0xffffffc0003027f8 <+48>: stp x25, x26, [sp,#64]
0xffffffc0003027fc <+52>: stp x27, x28, [sp,#80]
0xffffffc000302800 <+56>: bl 0xffffffc0000c9328 <pm_qos_request>    // <-- call site
0xffffffc000302804 <+60>: mov w22, w0
```
w0 is set to #1 by this instruction before the call to pm_qos_request:
```
mov w0, #0x1 // #1
```
But when the crash happens in pm_qos_request, w0 is no longer #1: the crash log shows 0x0499162b.
Now let's look at the assembly of pm_qos_request:
```
(gdb) disassemble pm_qos_request
Dump of assembler code for function pm_qos_request:
0xffffffc0000c9328 <+0>: adrp x1, 0xffffffc0004b5000 <__func__.12082+8>
0xffffffc0000c932c <+4>: add x1, x1, #0x30
0xffffffc0000c9330 <+8>: ldr x0, [x1,w0,sxtw #3]
0xffffffc0000c9334 <+12>: ldr x0, [x0]
0xffffffc0000c9338 <+16>: ldr w0, [x0,#16]
0xffffffc0000c933c <+20>: ret
```
The crash point is at 0xffffffc0000c9330. x0 is not written between the entry of pm_qos_request and that instruction (the first two instructions only touch x1), so I expect it to still hold the input value #1. It does not: the crash log shows 0x0499162b.
My question: is there any possibility that w0 (or any other general-purpose register) changes randomly like this?
Appreciate any advice.
Thank you in advance.