
In a 64-bit program the stack protector canary is read from fs:0x28, where fs=0. This poses no problem because in 64-bit mode the FS base comes from the MSR fs_base (which is set to point to the TLS) and the GDT is completely ignored.

But a 32-bit program reads the stack protector from gs:0x14. On a 64-bit system we have gs=0x63; on a 32-bit system, gs=0x33. Here there are no MSRs, because they were introduced with x86_64, so the GDT plays an important role.

Dissecting these values, both cases give RPL=3 (as expected), the table indicator selects the GDT (the LDT is not used in Linux), and the selector points to the entry with index 12 on 64-bit and index 6 on 32-bit.

Using a kernel module I was able to check that this entry in 64-bit linux is NULL! So I don't understand how the address of the TLS is resolved.

The relevant part of the kernel module is the following:

void gdtread(void)
{
    struct desc_ptr gdtr;
    seg_descriptor* gdt_entry = NULL;
    uint16_t tr;
    int i;

    // Task register: selector of the TSS descriptor currently in use
    asm("str %0" : "=m"(tr));

    native_store_gdt(&gdtr); // equiv. to asm("sgdt %0" : "=m"(gdtr));
    // gdtr.size is the GDT limit (size in bytes minus 1); each slot is 8 bytes
    printk("GDT address: 0x%px, GDT size: %d bytes = %i entries\n",
           (void*)gdtr.address, gdtr.size + 1, (gdtr.size + 1) / 8);

    gdt_entry = (seg_descriptor*)gdtr.address;
    for(i = 0; i < (gdtr.size + 1) / 8; i++)
    {
        if(tr >> 3 == i)
            printk("Entry #%i:\t<--- TSS (RPL = %i)", i, tr & 3);
        else
            printk("Entry #%i:", i);

        // An all-zero slot is an unused (NULL) descriptor
        if(!((uint64_t*)gdt_entry)[i])
        {
            printk("\tNULL");
            continue;
        }

        if(gdt_entry[i].s)
            user_segment_desc(&gdt_entry[i]);
        else
            // System descriptors take 16 bytes in long mode, so the
            // post-increment skips the slot holding the upper half
            system_segment_desc((sys_seg_descriptor*)&gdt_entry[i++]);
    }
}

Which outputs the following on a 64-bit system:

[ 3817.191065] GDT address: 0xfffffe0000001000, GDT size: 128 bytes = 16 entries
[ 3817.191073] Entry #0:
[ 3817.191075]  NULL
[ 3817.191078] Entry #1:
[ 3817.191081]  Raw: 0x00cf9b000000ffff
[ 3817.191084]  Base: 0x00000000
[ 3817.191088]  Limit: 0xfffff
[ 3817.191091]  Flags: 0xc09b
[ 3817.191096]      Type = 0xb (Code, non conforming, readable, accessed)
[ 3817.191100]      S    = 0 (user)
[ 3817.191103]      DPL  = 0
[ 3817.191105]      P    = 1 (present)
[ 3817.191109]      AVL  = 0
[ 3817.191112]      L    = 0 (legacy mode)
[ 3817.191115]      D/B  = 1
[ 3817.191118]      G    = 1 (KiB)
[ 3817.191121] Entry #2:
[ 3817.191124]  Raw: 0x00af9b000000ffff
[ 3817.191127]  Base: 0x00000000
[ 3817.191130]  Limit: 0xfffff
[ 3817.191133]  Flags: 0xa09b
[ 3817.191137]      Type = 0xb (Code, non conforming, readable, accessed)
[ 3817.191141]      S    = 0 (user)
[ 3817.191144]      DPL  = 0
[ 3817.191146]      P    = 1 (present)
[ 3817.191149]      AVL  = 0
[ 3817.191152]      L    = 1 (long mode)
[ 3817.191155]      D/B  = 0
[ 3817.191157]      G    = 1 (KiB)
[ 3817.191160] Entry #3:
[ 3817.191163]  Raw: 0x00cf93000000ffff
[ 3817.191166]  Base: 0x00000000
[ 3817.191169]  Limit: 0xfffff
[ 3817.191171]  Flags: 0xc093
[ 3817.191175]      Type = 0x3 (Data, expand down, writable, accessed)
[ 3817.191178]      S    = 0 (user)
[ 3817.191181]      DPL  = 0
[ 3817.191183]      P    = 1 (present)
[ 3817.191186]      AVL  = 0
[ 3817.191189]      L    = 0
[ 3817.191191]      D/B  = 1
[ 3817.191194]      G    = 1 (KiB)
[ 3817.191197] Entry #4:
[ 3817.191199]  Raw: 0x00cffb000000ffff
[ 3817.191202]  Base: 0x00000000
[ 3817.191205]  Limit: 0xfffff
[ 3817.191207]  Flags: 0xc0fb
[ 3817.191211]      Type = 0xb (Code, non conforming, readable, accessed)
[ 3817.191214]      S    = 0 (user)
[ 3817.191217]      DPL  = 3
[ 3817.191219]      P    = 1 (present)
[ 3817.191222]      AVL  = 0
[ 3817.191224]      L    = 0 (legacy mode)
[ 3817.191227]      D/B  = 1
[ 3817.191230]      G    = 1 (KiB)
[ 3817.191233] Entry #5:
[ 3817.191235]  Raw: 0x00cff3000000ffff
[ 3817.191238]  Base: 0x00000000
[ 3817.191241]  Limit: 0xfffff
[ 3817.191243]  Flags: 0xc0f3
[ 3817.191246]      Type = 0x3 (Data, expand down, writable, accessed)
[ 3817.191250]      S    = 0 (user)
[ 3817.191252]      DPL  = 3
[ 3817.191255]      P    = 1 (present)
[ 3817.191258]      AVL  = 0
[ 3817.191260]      L    = 0
[ 3817.191262]      D/B  = 1
[ 3817.191265]      G    = 1 (KiB)
[ 3817.191268] Entry #6:
[ 3817.191270]  Raw: 0x00affb000000ffff
[ 3817.191273]  Base: 0x00000000
[ 3817.191276]  Limit: 0xfffff
[ 3817.191278]  Flags: 0xa0fb
[ 3817.191281]      Type = 0xb (Code, non conforming, readable, accessed)
[ 3817.191284]      S    = 0 (user)
[ 3817.191287]      DPL  = 3
[ 3817.191289]      P    = 1 (present)
[ 3817.191292]      AVL  = 0
[ 3817.191295]      L    = 1 (long mode)
[ 3817.191298]      D/B  = 0
[ 3817.191300]      G    = 1 (KiB)
[ 3817.191303] Entry #7:
[ 3817.191306]  NULL
[ 3817.191308] Entry #8:    <--- TSS (RPL = 0)
[ 3817.191312]  Raw: 0x00000000fffffe0000008b0030004087
[ 3817.191316]  Base: 0xfffffe0000003000
[ 3817.191321]  Limit: 0x04087
[ 3817.191324]  Flags: 0x008b
[ 3817.191327]      Type = 0xb (Busy 64-bit TSS)
[ 3817.191331]      S    = 1 (system)
[ 3817.191333]      DPL  = 0
[ 3817.191336]      P    = 1 (present)
[ 3817.191339]      AVL  = 0
[ 3817.191341]      L    = 0
[ 3817.191344]      D/B  = 0
[ 3817.191347]      G    = 0 (B)
[ 3817.191349] Entry #10:
[ 3817.191352]  NULL
[ 3817.191355] Entry #11:
[ 3817.191358]  NULL
[ 3817.191360] Entry #12:
[ 3817.191362]  NULL
[ 3817.191365] Entry #13:
[ 3817.191367]  NULL
[ 3817.191369] Entry #14:
[ 3817.191372]  NULL
[ 3817.191374] Entry #15:
[ 3817.191377]  Raw: 0x0040f50000000000
[ 3817.191380]  Base: 0x00000000
[ 3817.191382]  Limit: 0x00000
[ 3817.191385]  Flags: 0x40f5
[ 3817.191389]      Type = 0x5 (Data, expand up, read only, accessed)
[ 3817.191392]      S    = 0 (user)
[ 3817.191395]      DPL  = 3
[ 3817.191397]      P    = 1 (present)
[ 3817.191400]      AVL  = 0
[ 3817.191403]      L    = 0
[ 3817.191405]      D/B  = 1
[ 3817.191408]      G    = 0 (B)

I haven't tried this module on a 32 bit system yet, but I'm on my way.

So, to make the question clear: how does the gs segment selector work in a 32-bit program running on a 64-bit linux kernel?

Arget
  • I'm pretty sure a 64-bit kernel can use the MSR (or `wrgsbase`) in 64-bit mode before returning to user-space (or entering it for the first time). So you'd only have to mess around with the GDT in a 32-bit kernel. – Peter Cordes Dec 10 '21 at 14:02
  • @PeterCordes, when I tried `i r $gs_base` in gdb while debugging a 32-bit program I got "invalid register", so I assumed they weren't accessible in legacy mode. But your comment made me search the documentation, specifically the "AMD64 Architecture Programmer's Manual, vol. 2", which on page 27 says _Compatibility mode ignores the high 32 bits of base address in the FS and GS segment descriptors when calculating an effective address._ This implies that in fact the base address registers of GS and FS are also used in legacy mode. – Arget Dec 10 '21 at 14:15
  • In 64 bit mode, the segment selectors are not used. Instead, a special MSR is used to determine the segment bases for FS and GS. – fuz Dec 10 '21 at 14:37
  • @PeterCordes, indeed! I modified the kernel module to read the gs_base of a 32-bit program and yes, it points to the TLS; its contents were `0x00000000f7f84040`. – Arget Dec 10 '21 at 14:44
  • @fuz I'd like you to read the post again and see that I already had that knowledge... – Arget Dec 10 '21 at 14:45
  • Huh, I didn't know GDB could query the kernel for the fs and gs bases, or show them to you that way in a 64-bit process. A user-space process can't read its own fs or gs base directly (without help from the kernel) unless it's in 64-bit mode and the FSGSBASE CPU feature is present (IvyBridge and later IIRC) and enabled by the kernel (only very recent Linux, like the past year), then you can `rdfsbase rax` or whatever. For some reason that opcode isn't supported in 32-bit mode, IDK why they want to stop 32-bit from using efficient segment-base access via MSR or instructions. – Peter Cordes Dec 10 '21 at 14:56
  • @PeterCordes, well, the `rdgsbase` instruction can be executed in user-space. See [this](https://www.kernel.org/doc/html/latest/x86/x86_64/fsgs.html#accessing-fs-gs-base-with-the-fsgsbase-instructions), and I just tried and it works. Of course, it has to be in a 64-bit program, since this instruction doesn't exist in 32-bit. – Arget Dec 10 '21 at 15:26
  • @Arget: Yeah, that's what I said. Recent Linux kernels do enable it by setting `CR4.FSGSBASE[bit 16] = 1` [otherwise it would still fault even on CPUs that support it](https://www.felixcloutier.com/x86/rdfsbase:rdgsbase#64-bit-mode-exceptions). When the programming model for Linux's TLS was designed, there was no such extension; at that time x86 had no way for user-space to read segment bases. (So that's why I'm surprised that the MSRs AMD added for reading/writing segment bases didn't work in 32-bit mode, to let 32-bit kernels read instead of having to know what was in the GDT on last write) – Peter Cordes Dec 10 '21 at 15:36
  • If I understand the code in `tls.c` correctly, Linux modifies the GDT during scheduling. Maybe Linux overwrote the corresponding entry before entering the kernel driver... – Martin Rosenau Dec 10 '21 at 15:44
  • @PeterCordes Oh sorry, my fault, I didn't read carefully enough and yes, I knew about the CR4.FSGSBASE bit. But actually gdb never queries the kernel to obtain the gs_base; as I said earlier, querying gdb with `i r $gs_base` while debugging a 32-bit program produces an "invalid register" message, and when debugging a 64-bit program it must use `rdgsbase` (because `strace` doesn't show any `arch_prctl()` call). – Arget Dec 10 '21 at 15:45
  • Do you see GDB using `ptrace(PTRACE_POKETEXT)` to insert an `rdgsbase` into the target process and single-stepping it? That's about as unlikely as making system calls in the context of the guest thread. (Although `strace -f` wouldn't be able to show that; a process can only be traced by one thing at once). More likely the kernel makes segment-base registers available via `ptrace` as part of the thread's architectural state that GDB can read via `ptrace(PTRACE_PEEKUSER)` or whatever it actually uses. Maybe only for 64-bit processes, if the 32-bit ABI didn't include those struct fields? – Peter Cordes Dec 10 '21 at 15:49
  • @PeterCordes, hmm, yes, you are right. – Arget Dec 10 '21 at 20:41

1 Answer


After the comment from @PeterCordes I searched the "AMD64 Architecture Programmer's Manual, vol. 2", which says on page 27:

Compatibility mode ignores the high 32 bits of base address in the FS and GS segment descriptors when calculating an effective address.

This implies that a 64-bit kernel managing a 32-bit process uses the MSR_*S_BASE registers as it would for a 64-bit process. The kernel can set the segment bases normally while in 64-bit long mode, so it doesn't matter whether or not those MSRs are available in 32-bit compatibility sub-mode of long mode, or in pure 32-bit protected mode (legacy mode, 32-bit kernel). A 64-bit Linux kernel only uses compat mode for ring 3 (user-space) where wrmsr and rdmsr aren't available because of privileges. As always, segment-base settings persist across changes to privilege level, like returning to user-space with sysret or iret.

Another thing that made me think that these registers weren't used for compatibility-mode processes was GDB. This is what happens when trying to print this register while debugging a 32-bit program:

(gdb) i r $gs_base
Invalid register `gs_base'

When debugging a 64-bit program it works fine:

(gdb) i r $fs_base
fs_base        0x7ffff7d00c00      0x7ffff7d00c00

Since rdgsbase is a 64-bit-only instruction (trying to execute that opcode in a 32-bit program raises SIGILL), it is a bit tricky to obtain the value of these registers from within a 32-bit program.

The first solution I thought of was to read it from a kernel module:

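unsigned long gs_base = 0xdeadbeefc0ffee13;
unsigned long flags;

local_irq_save(flags);  // keep maskable interrupts from firing while the user GS base is active
asm volatile("swapgs;"
    "rdgsbase %0;"    // reads the user-space GS base
                      // be careful if using this for anything more than toy experiments.
    "swapgs;"
    : "=r"(gs_base));
local_irq_restore(flags);
printk("gs_base: 0x%016lx\n", gs_base);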

So I created a driver for a device in /dev, so that when a program open()s that file the code above is executed. After compiling and running a 32-bit program that opens this file I got this:

[10793.682033] gs_base: 0x00000000f7f9f040

And using gdb to inspect 0xf7f9f040+0x14 I saw the canary, meaning that it was the TLS.

(gdb) x/wx 0xf7f9f040+0x14
0xf7f9f054: 0x21f03c00
(gdb) x/wx $ebp-0xc
0xbffff60c: 0x21f03c00

The other way I can think of is to perform a far call to switch from 32-bit to 64-bit code, execute rdgsbase and then return to 32-bit. This is probably a better solution since it doesn't need a kernel module. (As long as you can assume you're running on a CPU that supports the FSGSBASE extension, and a new enough kernel to enable it.)

Something like this:

#include <stdio.h>

__attribute__((naked))   // or define the function in an asm statement at global scope
extern void rdgsbase()
{
    asm("rdgsbase %eax; retf");
}

int main()
{
    unsigned int* gs_base = NULL;
    unsigned int canary;
    // would be unsafe in a leaf function: clobbers the red zone
    asm("lcall $0x33, $rdgsbase; mov %%eax, %0" : "=m"(gs_base) : : "eax");
    asm("mov %%gs:0x14, %%eax ; mov %%eax, %0" : "=m"(canary) : : "eax");
    printf("gs_base = %p\n", gs_base);
    printf("canary: 0x%08x\n", canary);
    printf("canary: 0x%08x\n", gs_base[5]);
}

I know it is very very dirty and ugly, but it works.

$ gcc gs_base.c -o gs_base -m32
/usr/bin/ld: /tmp/ccAPoxwj.o: warning: relocation against `rdgsbase' in read-only section `.text'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE

$ ./gs_base 
gs_base = 0xf7f80040
canary: 0x59511d00
canary: 0x59511d00

On a 32-bit system the gs segment selector has the value 0x33, which points to the 7th entry in the GDT (index 6). So let's see what is in there.

Using the same module I showed in the question (with only minor modifications) I printed the GDT in use during the execution of a specific process. This is the entry with index 6:

[ 3579.535005] Entry #6:
[ 3579.535007]  Raw: 0xd100ffff
[ 3579.535009]  Base: 0xb7fcd100
[ 3579.535011]  Limit: 0xfffff
[ 3579.535013]  Flags: 0xd0f3
[ 3579.535018]      Type = 0x3 (Data, expand down, writable, accessed)
[ 3579.535019]      S    = 0 (user)
[ 3579.535021]      DPL  = 3
[ 3579.535023]      P    = 1 (present)
[ 3579.535025]      AVL  = 1
[ 3579.535027]      L    = 0
[ 3579.535028]      D/B  = 1
[ 3579.535030]      G    = 1 (KiB)

In gdb we can verify that it coincides with the TLS of said process:

(gdb) x/wx $ebp-0xc
0xbffff60c: 0xa6e29800
(gdb) x/wx 0xb7fcd100+0x14
0xb7fcd114: 0xa6e29800

Using strace we can see how the 32-bit glibc sets the gs on a 64-bit system:

set_thread_area({entry_number=-1, base_addr=0xf7ebb040, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=12)

In the kernel, this syscall sets up MSR_GS_BASE with the value given in the base_addr argument. The kernel also places the value 0x63 in the gs register, which points to the entry with index 12, a NULL entry.

On a 32-bit system the syscall is exactly the same:

set_thread_area({entry_number=-1, base_addr=0xb7f66100, limit=0x0fffff, seg_32bit=1, contents=0, read_exec_only=0, limit_in_pages=1, seg_not_present=0, useable=1}) = 0 (entry_number=6)

But here, on a 32-bit kernel (which knows nothing about MSR_GS_BASE), the gs register gets the value 0x33, pointing to index 6 in the GDT. Since there is no MSR_GS_BASE, it is the GDT entry that gets set up, with its base address, limit and remaining fields taken from the syscall arguments.

On the other hand, the 64-bit glibc uses the syscall arch_prctl(ARCH_SET_FS, 0x...) to set the value of MSR_FS_BASE. This syscall is only available for 64-bit programs.

The only thing that I don't quite understand yet is why set gs=0x63 instead of 0 or 0x2b (the value of ss, ds and es)...

Peter Cordes
Arget
  • *but they also exist in legacy mode (32-bit).* - you just said the MSRs *don't* exist in protected mode. 32-bit protected mode is a sub-mode of legacy mode. 32-bit compat mode is a sub-mode of long mode. https://en.wikipedia.org/wiki/X86-64#Operating_modes. What that quote implies is that a 64-bit kernel can use the MSRs *before* switching to compat mode user-space. User-space doesn't use the MSRs, just the actual segment bases which were set via any method, including MSR or loading from the GDT. – Peter Cordes Dec 10 '21 at 20:42
  • Your inline asm isn't safe unless you compile with `-mno-red-zone`. The compiler could be keeping locals right below RSP. [Inline assembly that clobbers the red zone](https://stackoverflow.com/q/6380992) (Although in this case it won't be because it can see the printf calls so it knows the function is non-leaf, which for current GCC means it happens to choose not to use the red-zone at all.) – Peter Cordes Dec 10 '21 at 20:49
  • @PeterCordes fixed the confusion between compat and legacy modes. Thanks. – Arget Dec 10 '21 at 20:50
  • @PeterCordes that asm code is just a dirty experiment, nothing serious. – Arget Dec 10 '21 at 20:51
  • Also, `swapgs; rdgsbase; swapgs` might be unsafe in a preemptible kernel, if the interrupt handler assumes that current GS is kernelGS. Maybe better to `cli` / `sti` around it, or if it might be run with interrupts already disabled (so you shouldn't blindly re-enable), use one of Linux's existing helper functions to disable/restore the interrupt state. It might also work to just look in the saved user-space context for this task, although if that doesn't get updated on every entry to the kernel, it would be stale if 64-bit user-space had used `wrgsbase`. – Peter Cordes Dec 10 '21 at 20:55
  • I added comments to warn future readers that those aren't safe examples. This is important; inline asm is hard enough without unsafe examples misleading people into thinking something is safe. I also rewrote that early paragraph about MSRs; it doesn't matter whether the MSR itself is accessible through compat mode, only that the segment base setting persists when switching from ring 0 long mode to ring 3 compat mode. – Peter Cordes Dec 10 '21 at 21:13
  • @PeterCordes, oh, I was doing the exact same thing, writing a disclaimer about the unsafe code. Thank you. – Arget Dec 10 '21 at 21:17