nop
nop
nop
nop
ldr r0,hello
nop
nop
nop
nop
b .
hello: .word 0x12345678
00000000 <hello-0x28>:
0: e1a00000 nop ; (mov r0, r0)
4: e1a00000 nop ; (mov r0, r0)
8: e1a00000 nop ; (mov r0, r0)
c: e1a00000 nop ; (mov r0, r0)
10: e59f0010 ldr r0, [pc, #16] ; 28 <hello>
14: e1a00000 nop ; (mov r0, r0)
18: e1a00000 nop ; (mov r0, r0)
1c: e1a00000 nop ; (mov r0, r0)
20: e1a00000 nop ; (mov r0, r0)
24: eafffffe b 24 <hello-0x4>
28: 12345678
Sounds to me like it is simply poorly written. How does [pc,#16] at address 0x10 result in 0x28? 0x28 - 0x10 = 0x18, or 24, which is 8 too big. Hang on a second... (I knew the answer, just playing around.)
Elsewhere in your ARM documentation it talks about the program counter being two ahead, or possibly (incorrectly) documented as 8 bytes ahead. It is actually two instructions ahead, so 4 bytes in Thumb mode and 8 bytes in ARM mode. When Thumb-2 extensions get involved, the lr is sorted out based on the next instruction(s). But it appears that for pc-relative stuff it is 4 ahead for Thumb (traditionally two instructions ahead) and 8 ahead for ARM (two ahead). This doesn't mean the pc is really at this value; maybe on the ARM1 during the Acorn days, but now it is purely synthesized, as the pipeline is much deeper.
So when doing the math to compute the immediate, you take destination address - current instruction address - 8,
so in this case 0x28 - 0x10 - 8 = 16.
A note on #16: the # just means this is a constant/immediate. Like the commas and the brackets, it makes parsing a little easier (and there are a few cases where a number without the # means something else, at least in general, perhaps not for the ARM GNU assembler).
So, as demonstrated above, the immediate value encoded in the instruction is the destination address - the address of the instruction - 8 in ARM mode.
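The ARM-mode arithmetic above can be sketched in a few lines of Python. This is only an illustration of the math, not how the assembler is implemented, and the function name `arm_pc_relative_offset` is made up for this example.

```python
def arm_pc_relative_offset(target_addr, insn_addr):
    """Offset the assembler must encode for an ARM-mode pc-relative load.

    In ARM mode the pc reads as the instruction address plus 8.
    """
    return target_addr - (insn_addr + 8)


# ldr at 0x10 loading from hello at 0x28 (the first listing)
print(arm_pc_relative_offset(0x28, 0x10))  # 16
# ldr at 0x24 loading from a label behind it, at 0xC
print(arm_pc_relative_offset(0x0C, 0x24))  # -32
```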
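And, as you would hope (but not necessarily expect in some cases), negative offsets work too. In the ARM encoding this is an add/subtract bit plus a 12-bit magnitude rather than a sign-extended field; compare e59f0010 above with e51f0020 below.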
00000000 <hello-0xc>:
0: e1a00000 nop ; (mov r0, r0)
4: e1a00000 nop ; (mov r0, r0)
8: e1a00000 nop ; (mov r0, r0)
0000000c <hello>:
c: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
10: e1a00000 nop ; (mov r0, r0)
14: e1a00000 nop ; (mov r0, r0)
18: e1a00000 nop ; (mov r0, r0)
1c: e1a00000 nop ; (mov r0, r0)
20: e1a00000 nop ; (mov r0, r0)
24: e51f0020 ldr r0, [pc, #-32] ; c <hello>
same math (arm mode)
0xC - 0x24 - 8 = -0x20
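To check the two ARM encodings against this math, here is a small Python sketch that pulls the offset back out of the instruction word. It assumes the classic ARM load/store immediate layout (base register in bits 19..16, add/subtract bit 23, 12-bit immediate in bits 11..0) and does not verify that the word really is a load; the function names are hypothetical.

```python
def decode_arm_ldr_pc_offset(word):
    """Return the signed byte offset of an ARM-mode 'ldr rX, [pc, #imm]'."""
    base_reg = (word >> 16) & 0xF
    if base_reg != 15:
        raise ValueError("base register is not pc")
    magnitude = word & 0xFFF      # 12-bit unsigned immediate
    add = (word >> 23) & 1        # 1 = add the immediate, 0 = subtract it
    return magnitude if add else -magnitude


def arm_load_address(insn_addr, word):
    """Address read by the load: instruction address + 8 + offset."""
    return insn_addr + 8 + decode_arm_ldr_pc_offset(word)


print(hex(arm_load_address(0x10, 0xE59F0010)))  # 0x28
print(hex(arm_load_address(0x24, 0xE51F0020)))  # 0xc
```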
00000000 <hello-0x14>:
0: 46c0 nop ; (mov r8, r8)
2: 46c0 nop ; (mov r8, r8)
4: 46c0 nop ; (mov r8, r8)
6: 46c0 nop ; (mov r8, r8)
8: 4802 ldr r0, [pc, #8] ; (14 <hello>)
a: 46c0 nop ; (mov r8, r8)
c: 46c0 nop ; (mov r8, r8)
e: 46c0 nop ; (mov r8, r8)
10: 46c0 nop ; (mov r8, r8)
12: e7fe b.n 12 <hello-0x2>
14: 12345678
To complete the story, in Thumb mode: 0x14 - 0x8 - 4 = 8.
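The Thumb version of the math, sketched in Python under a couple of assumptions: the 16-bit encoding is taken to be 0x4800 with the destination register in bits 10..8 and the offset in words in the low 8 bits (which matches the 4802 in the listing), and the pc value is rounded down to a multiple of 4 as the architecture manual describes for this load (that rounding does not change the result here). The function name is made up.

```python
def thumb_ldr_literal(target_addr, insn_addr, rt=0):
    """Return (byte_offset, encoding) for a 16-bit Thumb 'ldr rt, [pc, #imm]'."""
    if not 0 <= rt <= 7:
        raise ValueError("the 16-bit form only reaches r0-r7")
    # Thumb mode: pc reads as instruction address + 4, then is word-aligned.
    base = (insn_addr + 4) & ~3
    offset = target_addr - base
    if offset < 0 or offset > 1020 or offset % 4:
        raise ValueError("offset not encodable in the 16-bit form")
    encoding = 0x4800 | (rt << 8) | (offset // 4)  # immediate is in words
    return offset, encoding


offset, encoding = thumb_ldr_literal(0x14, 0x08)
print(offset, hex(encoding))  # 8 0x4802
```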
Well, completing the story would include the Thumb-2 extensions.
00000000 <hello-0x18>:
0: bf00 nop
2: bf00 nop
4: bf00 nop
6: bf00 nop
8: 4803 ldr r0, [pc, #12] ; (18 <hello>)
a: eba0 0001 sub.w r0, r0, r1
e: bf00 nop
10: bf00 nop
12: bf00 nop
14: bf00 nop
16: e7fe b.n 16 <hello-0x2>
00000018 <hello>:
18: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Okay, good, so it used 4 (0x18 - 0x8 - 4 = 12) even though the instruction that follows is a 32-bit one. The size of the instruction comes into play when dealing with a prefetch abort, not here with a pc-relative load, so that is good.
It is when you have a return address that it correctly does the two ahead in Thumb mode.
So what is this doing?
ldr r0,[pc,#16]
The processor takes the program counter (synthesized, not real), which is the instruction address plus 8, then adds the immediate value, which is marked by the # sign here. So if this instruction were at address 0x1234, it would compute 0x1234 + 8 + 16 = 0x124C. That is inside brackets, so one level of indirection: it takes the address 0x124C, reads the 32-bit value there (ldr, as opposed to ldrb, ldrh, or ldrd), and places the result in the specified destination register, r0 in this case. In your case you have a label, which is just an address, and the assembler provides the right immediate so that when the instruction executes, r0 gets the 0xFF00FFFF stored at that address.
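A toy emulation of that one instruction may make the indirection concrete. This is a sketch only: the memory image, its base address 0x1200, and the function name are invented for the example, and it assumes a little-endian system.

```python
import struct


def ldr_pc_relative(memory, mem_base, insn_addr, imm):
    """Emulate ARM-mode 'ldr r0, [pc, #imm]' on a bytes-like memory image."""
    address = insn_addr + 8 + imm  # pc reads as instruction address + 8
    # One level of indirection: read the 32-bit little-endian word there.
    (value,) = struct.unpack_from("<I", memory, address - mem_base)
    return address, value


# Hypothetical memory image starting at 0x1200, with the word placed at 0x124C.
MEM_BASE = 0x1200
memory = bytearray(0x100)
struct.pack_into("<I", memory, 0x124C - MEM_BASE, 0xFF00FFFF)

address, r0 = ldr_pc_relative(memory, MEM_BASE, 0x1234, 16)
print(hex(address), hex(r0))  # 0x124c 0xff00ffff
```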
This is called pc-relative addressing, a very important addressing mode, especially for RISC machines, but also useful for CISC.
In this same instruction set
ldr r0,[r1,#16]
is technically no different: whatever value is in r1, plus 16, is the address used. The pc is special because "whatever value is in pc" varies based on the mode (ARM/Thumb) and the address where the instruction is located, but for [r1,#16], r1 won't vary; it is whatever you set it to.
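The difference between the two base registers can be put side by side in a short Python sketch. The function and the register dictionary are hypothetical, and the word-alignment of the Thumb pc value is taken from the architecture manual rather than from the listings above.

```python
def effective_address(base, imm, insn_addr, regs, thumb=False):
    """Address used by 'ldr rX, [base, #imm]'."""
    if base == "pc":
        # The pc value depends on where the instruction is and on the mode.
        base_value = insn_addr + (4 if thumb else 8)
        if thumb:
            base_value &= ~3  # Thumb pc-relative loads use a word-aligned pc
    else:
        # Any other register is simply whatever you set it to.
        base_value = regs[base]
    return base_value + imm


regs = {"r1": 0x2000}
# [r1,#16] gives the same address wherever the instruction lives
print(hex(effective_address("r1", 16, 0x10, regs)))    # 0x2010
print(hex(effective_address("r1", 16, 0x1234, regs)))  # 0x2010
# [pc,#16] moves with the instruction
print(hex(effective_address("pc", 16, 0x10, regs)))    # 0x28
print(hex(effective_address("pc", 16, 0x1234, regs)))  # 0x124c
```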