I know this is a very old thread, but I want to share some of my findings. I reverse engineered part of the executable that NVCC produces, so I cannot vouch for the correctness of what follows; use it at your own risk. I am using CUDA 8.0 RC, so I do not know whether other versions differ.
__cudaRegisterFatBinary takes a void * as input. It points to a small structure inside the executable, and in my example it held the following bytes:
B1 43 62 46 01 00 00 00 70 15 40 00 00 00 00 00 00 00 00 00 00 00 00 00
These bytes (little-endian) follow this layout:
struct {
    uint32_t magic;    // Always 0x466243b1
    uint32_t seq;      // Sequence number of the cubin
    uint64_t ptr;      // The pointer to the real cubin
    uint64_t data_ptr; // Some pointer related to the data segment
};
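To check the layout against the dump above, here is a minimal sketch that decodes the 24 bytes by hand. The names fatbin_wrapper_t, parse_fatbin_wrapper, read_le32 and read_le64 are my own and do not come from any CUDA header, and the field meanings are only my guesses from reverse engineering.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout guessed from reverse engineering; the type name is hypothetical. */
typedef struct {
    uint32_t magic;    /* expected to be 0x466243b1 */
    uint32_t seq;      /* sequence number of the cubin */
    uint64_t ptr;      /* address of the real fat binary */
    uint64_t data_ptr; /* some pointer related to the data segment */
} fatbin_wrapper_t;

#define FATBIN_WRAPPER_MAGIC 0x466243b1u

/* Read a little-endian 32-bit value from p. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a little-endian 64-bit value from p. */
uint64_t read_le64(const unsigned char *p)
{
    return (uint64_t)read_le32(p) | ((uint64_t)read_le32(p + 4) << 32);
}

/* Decode the 24-byte wrapper from raw bytes.
 * Returns 0 on success, -1 if the buffer is too short or the magic differs. */
int parse_fatbin_wrapper(const unsigned char *raw, size_t len,
                         fatbin_wrapper_t *out)
{
    if (len < 24)
        return -1;
    out->magic    = read_le32(raw);
    out->seq      = read_le32(raw + 4);
    out->ptr      = read_le64(raw + 8);
    out->data_ptr = read_le64(raw + 16);
    return out->magic == FATBIN_WRAPPER_MAGIC ? 0 : -1;
}
```

Fed the bytes from my example, this gives seq = 1, ptr = 0x401570 and data_ptr = 0.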
So if you follow the address in the ptr field, you will find the real fat binary, which follows the definition in fatbinary.h in your CUDA include directory. It starts with some header information. If you search from there for the next occurrence of 0x7F followed by 'ELF' (the ELF magic), you can extract the cubin file starting at that point.
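The search for the ELF magic can be sketched as below. This assumes you have already copied the fat binary region into a buffer; find_elf_magic is a name I made up, not something from the CUDA headers. It only locates where the cubin starts. How many bytes belong to it would have to be worked out from the ELF header itself, which is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first 0x7F 'E' 'L' 'F' sequence at or after
 * `start` in buf, or -1 if there is none. Hypothetical helper. */
long find_elf_magic(const unsigned char *buf, size_t len, size_t start)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    /* Stop once fewer than four bytes remain. */
    for (size_t i = start; i + sizeof magic <= len; i++) {
        if (memcmp(buf + i, magic, sizeof magic) == 0)
            return (long)i;
    }
    return -1;
}
```

To look for further cubins in the same fat binary, you could call it again with start set to one past the previous hit.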