You are given an executable a.out that takes a single command-line argument, a string, computes a 32-byte HMAC over it, and prints the result to stdout in base64. Can you retrieve the secret key used to compute the HMAC?
For example:
    $ ./a.out foo
    A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ
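As a quick sanity check on the output format, the length of the printed string follows from the digest size alone. A minimal sketch in C (the function names are made up for illustration, they do not come from the binary or from OpenSSL):

```c
#include <stddef.h>

/* Length of standard base64 with '=' padding: every started group of
 * 3 input bytes becomes 4 output characters. */
size_t base64_padded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Length without the trailing '=' padding: ceil(4n / 3). */
size_t base64_unpadded_len(size_t n)
{
    return (4 * n + 2) / 3;
}
```

For a 32-byte HMAC this gives 44 characters with padding, ending in a single =, or 43 if the padding is dropped.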
You start to inspect the executable and see that it's a 64-bit ELF compiled for aarch64, stripped of debug symbols.
    $ file a.out
    main: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-aarch64.so.1, stripped
The binary dynamically links to some common, unimportant shared libraries. In particular you can see it dynamically links OpenSSL libcrypto, perhaps for computing the HMAC?
    $ ldd a.out
        linux-vdso.so.1 (0x0000f06f34466000)
        libcrypto.so.1.1 => /lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000f06f34193000)
        libc.musl-aarch64.so.1 => not found
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000f06f3417f000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000f06f3414e000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000f06f33fdb000)
        /lib/ld-musl-aarch64.so.1 => /lib/ld-linux-aarch64.so.1 (0x0000f06f34436000)
If the crypto stack were statically linked (as it usually is in well-protected executables) rather than dynamically linked against the system's copy, you could not really make educated guesses like that. Instead the first step would be to recognize which cryptography stack is being used, a rather significant endeavour in itself.
Nonetheless I crossed my fingers and hoped that it was dynamically linking OpenSSL libcrypto for the HMAC computation. After some static-analysis reconnaissance in Ghidra, I confirmed that the call of interest was indeed the libcrypto HMAC function, invoked as HMAC(hash_function, &key, 10, src, src_len, out, &out_len), whose symbol name didn't appear to have been stripped.
When an executable is stripped, the resulting binary loses its static symbol table and debug information: local function names, variable names, line numbers, data types, etc. The HMAC name survives because it is an imported symbol. The dynamic linker has to look it up in libcrypto by name at load time, so it lives in the dynamic symbol table, which stripping leaves intact. Had libcrypto been statically linked into a stripped binary, we would additionally need to recognize the HMAC function ourselves, with no name to go on.
The function takes a secret key by pointer (together with its integer length), as well as the source and output and their lengths. The key has to be materialized in memory before the function is called, but the code that builds it is highly obfuscated (maybe some LLVM-based obfuscation) because recovering the key is the entire point of this challenge. Unfortunately I was short on time and I didn't think I could reverse that scary-looking routine by static analysis.
However I could see the light at the end of the tunnel with a dynamic analysis approach, since the crypto stack was so conveniently dynamically linked and pretty much unstripped. I rushed to load the binary in gdb, set a breakpoint at this HMAC function, and ran the program supplying "foo" as a command line argument.
    $ gdb ./a.out
    (gdb) b HMAC
    (gdb) run foo
Unfortunately gdb could not provide information about the arguments of the HMAC function because no debug information is available for it.
    (gdb) info args
    No symbol table info available.
However, one can always try to meaningfully print the contents of the CPU registers at a function breakpoint, in this case keeping in mind the 64-bit ARM (AArch64) calling convention. In gdb run:
    (gdb) info registers
    x0             0xffffa69fd738      281473477236536
    x1             0xffffd886e430      281474314462256
    x2             0xa                 10
    x3             0xffffd886ef5d      281474314465117
    x4             0x3                 3
    x5             0xffffd886e4f8      281474314462456
    x6             0xffffd886e42c      281474314462252
    x7             0x5                 5
    x8             0x20                32
    x9             0xf5                245
    ...
Recall how the 64-bit ARM (AArch64) calling convention allocates the general-purpose registers:
- x31 (SP): Stack pointer or a zero register, depending on context.
- x30 (LR): Procedure link register, used to return from subroutines.
- ...
- x9 to x15: Local variables, caller saved.
- x8 (XR): Indirect return value address.
- x0 to x7: Argument values passed to and results returned from a subroutine.
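Basically x0 holds the first argument to a function, x1 the second, and so on up to x7. When a function has more than eight integer or pointer arguments, the remaining ones are passed on the stack. HMAC takes seven, so they all fit in x0 to x6: x0 is the hash function, x1 the pointer to the key, x2 the key length (10), x3 the pointer to the source, x4 the source length (3, the length of "foo"), x5 the output buffer, and x6 the pointer to the output length. If we focus on gdb again everything should be clear now: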
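To make the mapping concrete, here is a small C sketch. The prototype in the comment is the one OpenSSL 1.1 declares for HMAC. The helper arg_location is a made-up name for illustration, and it only covers integer and pointer arguments, which is all HMAC has.

```c
#include <stddef.h>

/* OpenSSL 1.1 prototype, annotated with the register that carries
 * each argument on AArch64:
 *
 *   unsigned char *HMAC(const EVP_MD *evp_md,    // x0
 *                       const void *key,         // x1
 *                       int key_len,             // x2
 *                       const unsigned char *d,  // x3
 *                       size_t n,                // x4
 *                       unsigned char *md,       // x5
 *                       unsigned int *md_len);   // x6
 */

/* Hypothetical helper: where the index-th (0-based) integer or pointer
 * argument of a call is passed. The first eight go in x0..x7, the rest
 * spill to the stack. */
const char *arg_location(size_t index)
{
    static const char *const regs[] = {
        "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"
    };
    return index < 8 ? regs[index] : "stack";
}
```

With this numbering the key pointer is argument 1 (x1) and the key length is argument 2 (x2), which is why those two registers are the interesting ones at the breakpoint.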
    (gdb) p (char*) 0xffffd886e430
    $2 = 0xffffd886e430 "1234567890"
Well, actually this is it: it seems that the key is the string "1234567890". But we've come all this way, so let's check the rest.
    (gdb) p (char*) 0xffffd886ef5d
    $1 = 0xffffd886ef5d "foo"
Indeed the pointer in x3 is pointing to the string "foo" that we supplied.
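Printing the key with p (char*) works here because the ten key bytes happen to be printable text followed by a NUL byte. A key is not guaranteed to look like that, so a more robust habit is to dump exactly x2 bytes starting at x1 as hex (for instance with gdb's x command, as in x/10xb $x1) and carry the key around as raw bytes. The sketch below, with hypothetical helper names, turns such a hex string back into a byte array whose explicit length can be passed to HMAC instead of relying on strlen.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into out (at most cap bytes).
 * Returns the number of bytes written, or -1 on malformed input
 * (odd length, non-hex character) or if out is too small. */
long hex_to_bytes(const char *hex, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (hex[0] != '\0') {
        int hi, lo;

        if (hex[1] == '\0')
            return -1;              /* odd number of digits */
        hi = hex_value(hex[0]);
        lo = hex_value(hex[1]);
        if (hi < 0 || lo < 0 || n >= cap)
            return -1;
        out[n++] = (unsigned char)((hi << 4) | lo);
        hex += 2;
    }
    return (long)n;
}
```

For the key in this challenge the hex form is 31323334353637383930, which decodes to the ten bytes of "1234567890".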
Nice! Everything is consistent. We could stop here and go home. But for completeness let's write an equivalent program (using the same HMAC routine) that supplies the same key we reversed and double check that we get the same result. I just guessed by trial and error that the underlying hash function used in the HMAC routine was SHA256.
    #include <openssl/bio.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <message>\n", argv[0]);
            return 1;
        }

        char key[] = "1234567890";   // key recovered from x1 in gdb
        unsigned char result[32];    // SHA-256 digest size
        unsigned int len;

        HMAC(EVP_sha256(), key, strlen(key),
             (unsigned char *) argv[1], strlen(argv[1]), result, &len);

        // Chain a base64 filter in front of stdout, with no line breaks
        BIO *b64 = BIO_new(BIO_f_base64());
        BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
        BIO *bio = BIO_new_fp(stdout, BIO_NOCLOSE);
        BIO_push(b64, bio);

        // Encode and print the digest
        BIO_write(b64, result, len);
        BIO_flush(b64);
        BIO_free_all(b64);   // free the whole chain, starting at its head
        return 0;
    }
Sure enough we get the same result as the mysterious binary:
    $ sudo apt install -y build-essential libssl-dev
    $ gcc hmac.c -lcrypto
    $ ./a.out foo
    A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ=
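The trial and error over hash functions can be shortened a little, because the digest length alone rules out most candidates. A rough sketch, assuming only the common MD5, SHA-1 and SHA-2 digests are in play (the helper name is hypothetical, and other 256-bit hashes such as SHA3-256 would still have to be tried separately):

```c
#include <stddef.h>

/* First guess for the hash behind an HMAC, based only on the digest
 * length in bytes. Returns NULL when no common candidate matches. */
const char *guess_digest_by_len(size_t digest_len)
{
    switch (digest_len) {
    case 16: return "MD5";
    case 20: return "SHA-1";
    case 28: return "SHA-224";
    case 32: return "SHA-256";
    case 48: return "SHA-384";
    case 64: return "SHA-512";
    default: return NULL;
    }
}
```

For the 32-byte output of this binary the first candidate is SHA-256, which is what the guess above settled on.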
In conclusion, by taking a dynamic analysis approach and using gdb to inspect the contents of the CPU registers at the HMAC function breakpoint, we were able to retrieve the key. Knowing how the 64-bit ARM calling convention assigns arguments to general-purpose registers told us which registers held the pointer to the key and its length.
This exercise highlights the importance of understanding the underlying CPU architecture and function calling convention, as well as the usefulness of dynamic analysis tools like gdb for reverse engineering and debugging. With the right techniques and tools, even a stripped binary can reveal its secrets.