Let's try something totally crazy. Let's try dynamic unpacking without total unpacking control, without breakpoints, without any kind of debugging whatsoever. Let's merge our unpacking process with the packer itself, binding them into a single workflow that collects information while the packer is executing. It's similar to what we do with debugging, just without the debugger. How do we do this? Can we do it at all?
We can, with a little help from TitanEngine's hooking library. The idea is to use our unpacker as a library which will be injected into the packed file during its execution. Such a library would place hooks inside the packer code, redirecting the control flow to our unpacker wherever data collection or execution handling is needed. Those places are usually spots where the packer processes the import table or relocations, jumps to the original entry point, or just switches execution from one layer to another.
What are the benefits of such an approach? Even though it's slightly harder to create and test such unpackers, the most notable benefit of unpacking by hooking is total immunity to the various anti-debugging tricks used to detect the unpacking process. The only detections applicable to this unpacking scenario are anti-hooking and memory checksumming. The first is hardly ever used in modern protections because of the large number of false positives it produces, triggered by the operating system itself, security software and various window skinning applications. The second is rarely present, and when it is, it only covers specific memory regions that correspond to a single protection layer. In conclusion, this method of implementing the unpacking process should leave us with fewer things to worry about.
Implementing this kind of hooking requires building custom functions to process the hook events. These functions must leave the packed program's workflow undisturbed, which is exactly why we preserve the register state with PUSHAD and, if a jump is affected by our hook, EFLAGS with PUSHFD as well. These ASM instructions are embedded in our C code, and because the function is declared naked (so the compiler generates no prologue or epilogue of its own), they become the prologue and epilogue of the function. To apply the hooks we use the DLL_PROCESS_ATTACH event. For example, if we were to hook the UPX code which loads libraries, the hook code flow would look like this:
Since our hooks are 5 bytes long, we need to "borrow" as many whole instructions as it takes to make room for the hook. In this case we are "borrowing" three instructions. These instructions will be executed right after our inserted function is called, which preserves the packer's workflow. As you can see from this diagram we are using hooks instead of breakpoints. These hooks will therefore be placed in at least three places: where UPX calls LoadLibraryA, where it calls GetProcAddress, and finally where it jumps to the entry point. The most basic sample UPX unpacker is limited to working on executables that don't import functions by ordinal and that use the old jump-to-entry-point method. It's quite limited, but it's enough for a proof of concept of our technique.
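To make the 5-byte hook and the "borrowed" instructions more concrete, here is a minimal sketch in portable C that only builds the bytes involved and never patches a live process. It is not the code from our package: instead of a naked C function it lays out a generated stub (PUSHAD, PUSHFD, CALL handler, POPFD, POPAD, the borrowed bytes, JMP back), which is one plausible way to arrange the same flow. The helper names (`emit_rel32`, `borrowed_bytes`, `build_stub`), the addresses and the instruction lengths are all hypothetical, and we assume 32-bit x86 and that the borrowed instructions are position independent, since instructions with relative displacements cannot simply be copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOOK_SIZE 5u /* one opcode byte (E8 or E9) plus a rel32 displacement */

/* Emit CALL (0xE8) or JMP (0xE9) rel32 that sits at address `from` and targets `to`.
   The displacement is relative to the address of the next instruction. */
static size_t emit_rel32(uint8_t *out, uint8_t opcode, uint32_t from, uint32_t to)
{
    assert(out != NULL && (opcode == 0xE8 || opcode == 0xE9));
    uint32_t disp = to - (from + HOOK_SIZE); /* wraps modulo 2^32 for backward jumps */
    out[0] = opcode;
    out[1] = (uint8_t)(disp & 0xFF);         /* little-endian byte order */
    out[2] = (uint8_t)((disp >> 8) & 0xFF);
    out[3] = (uint8_t)((disp >> 16) & 0xFF);
    out[4] = (uint8_t)((disp >> 24) & 0xFF);
    return HOOK_SIZE;
}

/* Given the lengths of the instructions at the hook site, count how many whole
   instructions must be borrowed to free at least HOOK_SIZE bytes.
   Returns the number of bytes borrowed, or 0 if there is not enough room. */
static size_t borrowed_bytes(const uint8_t *lens, size_t count, size_t *n_instr)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += lens[i];
        if (total >= HOOK_SIZE) {
            *n_instr = i + 1;
            return total;
        }
    }
    *n_instr = 0;
    return 0;
}

/* Build the stub the hook jumps to (hypothetical layout):
   PUSHAD, PUSHFD, CALL handler, POPFD, POPAD, borrowed bytes, JMP back. */
static size_t build_stub(uint8_t *out, uint32_t stub_addr, uint32_t handler_addr,
                         const uint8_t *borrowed, size_t borrowed_len,
                         uint32_t hook_addr)
{
    size_t n = 0;
    out[n++] = 0x60; /* PUSHAD: save general purpose registers */
    out[n++] = 0x9C; /* PUSHFD: save EFLAGS */
    emit_rel32(out + n, 0xE8, stub_addr + (uint32_t)n, handler_addr);
    n += HOOK_SIZE;
    out[n++] = 0x9D; /* POPFD: restore in reverse order */
    out[n++] = 0x61; /* POPAD */
    memcpy(out + n, borrowed, borrowed_len); /* run the displaced instructions */
    n += borrowed_len;
    /* Resume the packer right after the bytes we displaced. */
    emit_rel32(out + n, 0xE9, stub_addr + (uint32_t)n,
               hook_addr + (uint32_t)borrowed_len);
    n += HOOK_SIZE;
    return n;
}
```

With instruction lengths of 2, 1 and 2 bytes at the hook site, `borrowed_bytes` reports three borrowed instructions covering exactly 5 bytes. If the borrowed instructions add up to more than 5 bytes, the leftover bytes at the hook site are never executed, because the stub jumps back past them.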
Debugging this kind of unpacker can be rather tricky. This video shows a quick and easy way to do it:
Since we are creating a hook library unpacker, we also need a loader which will execute the unpacking target and inject the unpacker library into it. This can be done in a number of ways, but we decided to do it via the debug-detach method. Once both the unpacker hook library and the loader are made, our unpacker is complete. We hope this short blog post gave you an idea of how to use this technique to build your own hooking unpackers. Until next week...
(package contains the unpacker with source and the samples used)