Reverse OpenSSL libcrypto HMAC by dynamic analysis

You are given an executable, a.out, that takes a single command-line argument (a string), computes a 32-byte HMAC of it, and prints the result to stdout in base64. Can you retrieve the secret key used to compute the HMAC?

For example:

$ ./a.out foo
A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ
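
A quick note on the output format: base64 turns every 3 input bytes into 4 characters and pads the final group with '='. The helpers below are a minimal sketch of that arithmetic (base64_encoded_len and base64_padding are hypothetical names, not OpenSSL functions). For a 32-byte digest they give 44 characters, of which the last one is padding, so an encoder that omits padding would print 43.

```c
#include <stddef.h>

/* Length of the padded base64 encoding of n bytes:
 * every group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_len(size_t n) {
    return 4 * ((n + 2) / 3);
}

/* Number of trailing '=' characters in the padded encoding of n bytes. */
size_t base64_padding(size_t n) {
    return (3 - n % 3) % 3;
}
```

This is also a cheap way to guess the digest size from the output alone: 43 or 44 base64 characters correspond to 32 raw bytes.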

You start by inspecting the executable and see that it's a 64-bit ELF compiled for aarch64, stripped of debug symbols.

$ file a.out
a.out: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-aarch64.so.1, stripped

The binary dynamically links to some common, unimportant shared libraries. In particular you can see it dynamically links OpenSSL libcrypto, perhaps for computing the HMAC?

$ ldd a.out 
        linux-vdso.so.1 (0x0000f06f34466000)
        libcrypto.so.1.1 => /lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000f06f34193000)
        libc.musl-aarch64.so.1 => not found
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000f06f3417f000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000f06f3414e000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000f06f33fdb000)
        /lib/ld-musl-aarch64.so.1 => /lib/ld-linux-aarch64.so.1 (0x0000f06f34436000)

If the crypto stack were statically linked (as it usually is in well-protected executables) rather than dynamically linked to the system's, you could not really make educated guesses like that. Instead, the first step would be to recognize which cryptography stack is being used, a rather significant endeavour in itself.

Nonetheless I crossed my fingers and hoped that it was dynamically linking OpenSSL libcrypto for the HMAC computation. After some static-analysis reconnaissance in Ghidra, I confirmed that the function of interest was indeed libcrypto's HMAC, called as HMAC(hash_function, &key, 10, src, src_len, out, &out_len), and that its symbol name had not been stripped.

When an executable is stripped, the resulting binary may no longer contain variable names, function names, line numbers, data types, etc. In this case the binary's own symbols were stripped, but the name HMAC survived: it is imported from libcrypto, and the dynamic linker needs the names of imported functions to resolve them at load time, so they stay in the dynamic symbol table. Had the crypto stack been statically linked and fully stripped, we would additionally need to recognize the HMAC function ourselves and make note of its address.

The function takes a secret key by pointer (together with its integer length), as well as the source and output and their lengths. The key is naturally supplied before the function is called, but it is highly obfuscated (maybe some LLVM-based obfuscation) because reversing the key is the entire point of this challenge. Unfortunately I was short on time and I didn't think I could reverse that scary-looking routine by static analysis.
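
Because the key arrives as a pointer plus an explicit length, it need not be NUL-terminated and may contain non-printable bytes, so whatever we read at the breakpoint should be read for exactly that many bytes. The sketch below shows the idea in C. format_key is a hypothetical helper (not an OpenSSL function) that renders a buffer as printable characters, with \xNN escapes for everything else.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render exactly key_len bytes of key into out.
 * Printable bytes are copied as-is, all others become \xNN escapes.
 * Returns the number of characters written (excluding the NUL), or -1
 * if out might be too small (the check is conservative: it requires
 * room for a full escape plus the NUL before every byte). */
int format_key(const void *key, int key_len, char *out, size_t out_size) {
    const unsigned char *p = key;
    size_t used = 0;

    if (key_len < 0 || out_size == 0)
        return -1;

    for (int i = 0; i < key_len; i++) {
        /* Worst case per byte: 4 characters ("\xNN") plus the final NUL. */
        if (out_size - used < 5)
            return -1;
        if (isprint(p[i]) && p[i] != '\\')
            out[used++] = (char)p[i];
        else
            used += (size_t)snprintf(out + used, out_size - used,
                                     "\\x%02x", (unsigned)p[i]);
    }
    out[used] = '\0';
    return (int)used;
}
```

Reading with the explicit length avoids both stopping early at an embedded zero byte and running past the end of a key that has no terminator.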

However I could see the light at the end of the tunnel with a dynamic analysis approach, since the crypto stack was so conveniently dynamically linked and pretty much unstripped. I rushed to load the binary in gdb, set a breakpoint at this HMAC function, and ran the program supplying "foo" as a command line argument.

$ gdb ./a.out
(gdb) b HMAC
(gdb) run foo

Unfortunately gdb could not provide information about the arguments of the HMAC function, because no debug information describing them is available.

(gdb) info args
No symbol table info available.

However, one can always try to meaningfully print the contents of the CPU registers at a function breakpoint, in this case keeping in mind the 64-bit ARM (AArch64) calling convention. In gdb run:

(gdb) info registers
x0             0xffffa69fd738      281473477236536
x1             0xffffd886e430      281474314462256
x2             0xa                 10
x3             0xffffd886ef5d      281474314465117
x4             0x3                 3
x5             0xffffd886e4f8      281474314462456
x6             0xffffd886e42c      281474314462252
x7             0x5                 5
x8             0x20                32
x9             0xf5                245
...

Recall the 64-bit ARM calling convention:

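The 64-bit ARM (AArch64) calling convention allocates the 31 general-purpose registers as:

x0 to x7: argument values passed to a subroutine and results returned from it
x8: indirect result (return value) address
x9 to x15: caller-saved temporaries (local variables)
x16 and x17: intra-procedure-call scratch registers
x18: platform register
x19 to x28: callee-saved registers
x29: frame pointer
x30: link register (return address)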

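Basically x0 holds the first argument to a function, x1 the second, and so on up to x7. When a function has more than eight integer or pointer arguments, the remaining ones are passed on the stack. HMAC takes seven, so all of them arrive in registers. Matching the gdb output against the HMAC signature: x0 is the hash function, x1 points to the key, x2 = 10 is the key length, x3 points to the input string, x4 = 3 is its length (the length of "foo"), x5 points to the output buffer, and x6 points to the output length. The key is therefore the 10 bytes stored at the address in x1, which gdb can display with, for example, x/10c $x1.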
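
To make the mapping concrete, here is a small sketch that encodes the rule for plain integer and pointer arguments and applies it to the seven parameters of HMAC. The parameter names follow the OpenSSL 1.1 prototype as I understand it; arg_location, hmac_params and print_hmac_mapping are hypothetical names used only for illustration, and the rule deliberately ignores floating-point and large aggregate arguments, which are passed differently.

```c
#include <stdio.h>

/* Parameter names of HMAC(), in declaration order
 * (assumed from the OpenSSL 1.1 header). */
static const char *const hmac_params[] = {
    "evp_md", "key", "key_len", "d", "n", "md", "md_len"
};

/* Hypothetical helper: location of the index-th integer/pointer argument
 * on AArch64. Writes "x0".."x7" or "stack" into buf and returns buf. */
const char *arg_location(unsigned index, char *buf, size_t buf_size) {
    if (index < 8)
        snprintf(buf, buf_size, "x%u", index);
    else
        snprintf(buf, buf_size, "stack");
    return buf;
}

/* Print the register assigned to every HMAC() parameter. */
void print_hmac_mapping(void) {
    char buf[8];
    size_t count = sizeof hmac_params / sizeof hmac_params[0];
    for (size_t i = 0; i < count; i++)
        printf("%-3s %s\n",
               arg_location((unsigned)i, buf, sizeof buf), hmac_params[i]);
}
```

Calling print_hmac_mapping() lists evp_md under x0 through md_len under x6. Read next to the register dump, key_len lands on x2 (10) and n lands on x4 (3), which is exactly what we expect for a 10-byte key and the input "foo".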

Nice! Everything is consistent. We could stop here and go home. But for completeness let's write an equivalent program (using the same HMAC routine) that supplies the same key we reversed and double check that we get the same result. I just guessed by trial and error that the underlying hash function used in the HMAC routine was SHA256.

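#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
        if (argc != 2) {
                fprintf(stderr, "usage: %s <message>\n", argv[0]);
                return EXIT_FAILURE;
        }

        // Key recovered at the HMAC breakpoint (pointer in x1, length in x2)
        const char key[] = "1234567890";
        unsigned char result[EVP_MAX_MD_SIZE];
        unsigned int len = 0;

        // HMAC returns NULL on failure
        if (HMAC(EVP_sha256(), key, (int) strlen(key),
                 (const unsigned char *) argv[1], strlen(argv[1]),
                 result, &len) == NULL) {
                fprintf(stderr, "HMAC failed\n");
                return EXIT_FAILURE;
        }

        // Chain a base64 filter (no line breaks) in front of stdout
        BIO *b64 = BIO_new(BIO_f_base64());
        BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
        BIO *bio = BIO_new_fp(stdout, BIO_NOCLOSE);
        BIO_push(b64, bio);

        // Encode and print the digest
        BIO_write(b64, result, len);
        BIO_flush(b64);
        BIO_free_all(b64); // free the whole chain, not only the stdout BIO
        putchar('\n');     // the base64 filter emits no trailing newline

        return 0;
}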

Sure enough we get the same result as the mysterious binary:

$ sudo apt install -y build-essential libssl-dev
$ gcc hmac.c -o hmac -lcrypto
$ ./hmac foo
A283gk/gcX/JA5yo6zNznSNHumIn91RxfCtyfR2rcXQ=

In conclusion, by taking a dynamic analysis approach and using gdb to inspect the contents of the CPU registers at the HMAC function breakpoint, we were able to retrieve the key. This was achieved by examining the general-purpose registers allocated by the 64-bit ARM calling convention, which helped us identify the registers containing the pointer to the key and its length.

This exercise highlights the importance of understanding the underlying CPU architecture and function calling convention, as well as the usefulness of dynamic analysis tools like gdb for reverse engineering and debugging. With the right techniques and tools, even a stripped binary can reveal its secrets.