It's been a really long time since we made an unpacker for... well, anything. Sure, we did a format converter and an archive format unpacker, but our last PE unpacker was (checks the blog) in February. So let's get back to basics and create a dynamic unpacker for PackMan. We already have an unpacker for PackMan? It's in the TitanEngine package already, you say? Well, we do, but what's stopping us from having a little fun with unpacker optimizations?
There are a lot of optimizations one can do with the TitanEngine to make it work even faster than lightning. During the unpacker execution timing research for our upcoming CARO Workshop talk, we measured the impact that certain operations inside the engine itself have on the total unpacking time. We realized that there is significant room for performance improvement in certain unpacking areas, which is especially important when we are processing large file volumes. Currently, when unpacking files with unpackers built around the TitanEngine, you get unpacker execution times quite similar to the sample execution time, except for cases where dynamic link library unpacking requires snapshots to correct the relocation table. In those cases we see a significant increase in unpacking time. To counter this we can either store the memory snapshots in memory or optimize relocation processing and avoid using snapshots altogether.
Generally, when we talk about fixing the relocation table, we refer to the easy snap-and-compare method. However, there is another way of making the unpacked dynamic link library valid for loading at a non-default base. We can use the RelocaterGrabRelocationTableEx function for cases where the packer uses an unmodified relocation table, laid out as defined in the PE/COFF specification. The relocation data is stored compressed, so it can only be accessed in memory just before the file is relocated, which is why we need a function that inspects the memory and determines the relocation table size. That is exactly what RelocaterGrabRelocationTableEx does: it determines the size of the relocation table at the provided address and copies it to the engine for later exporting. Consider the following PackMan code snippet, which performs the image relocation:
      OR ECX,ECX
      JE L018
      MOV EDI,DWORD PTR DS:[EBX+24]
      JMP L013
L004: XOR EAX,EAX
      LODS WORD PTR DS:[ESI]
      OR EAX,EAX
      JE L011
      AND AH,0F
      ADD EAX,DWORD PTR DS:[EBX]
      ADD DWORD PTR DS:[EDX+EAX],ECX
L011: CMP ESI,EDI
      JNZ L004
L013: MOV EDX,DWORD PTR DS:[EDI]
      LEA ESI,DWORD PTR DS:[EDI+8]
      ADD EDI,DWORD PTR DS:[EDI+4]
      TEST EDX,EDX
      JNZ L011
L018: POPAD
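For readers who prefer a higher-level view, the loop above can be paraphrased in Python. This is only a sketch of our reading of the snippet: we assume ECX holds the relocation delta, [EBX] the loaded image base and [EBX+0x24] the relocation table pointer. The function name apply_relocations and the demo values are hypothetical, and the image is indexed by RVA so that the image base addition is folded away.

```python
import struct

def apply_relocations(image: bytearray, table_rva: int, delta: int) -> int:
    """Patch `image` in place and return the number of fixups applied."""
    if delta == 0:                      # OR ECX,ECX / JE L018
        return 0
    applied = 0
    block = table_rva                   # MOV EDI,[EBX+24]
    while True:
        # L013: read the block header (page RVA, block size in bytes)
        page_rva, block_size = struct.unpack_from("<II", image, block)
        if page_rva == 0:               # TEST EDX,EDX ends the walk
            break
        if block_size < 8:              # guard added here; the stub has none
            raise ValueError("malformed relocation block")
        entry = block + 8               # LEA ESI,[EDI+8]
        block += block_size             # ADD EDI,[EDI+4]
        while entry < block:            # L011: CMP ESI,EDI
            (value,) = struct.unpack_from("<H", image, entry)  # LODS WORD
            entry += 2
            if value == 0:              # padding entry, skipped by JE L011
                continue
            # AND AH,0F drops the type nibble and keeps the 12-bit offset
            target = page_rva + (value & 0x0FFF)
            (old,) = struct.unpack_from("<I", image, target)
            struct.pack_into("<I", image, target, (old + delta) & 0xFFFFFFFF)
            applied += 1
    return applied

# Demo image: two pointers in page 0x1000 and a one-block table at 0x2000.
image = bytearray(0x3000)
struct.pack_into("<I", image, 0x1010, 0x00401234)
struct.pack_into("<I", image, 0x1020, 0x00402000)
struct.pack_into("<II4H", image, 0x2000, 0x1000, 16, 0x3010, 0x3020, 0, 0)
count = apply_relocations(image, 0x2000, 0x10000)
```

Note that the stub treats every non-zero entry as a 32-bit fixup without checking the type nibble, and the sketch mirrors that behaviour.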
We can see that the pointer to the relocation table is stored at the EBX+0x24 address. Therefore, by reading that memory pointer before the actual relocation occurs, we have all the parameters we need to fix the relocation table. Passing that pointer to RelocaterGrabRelocationTableEx results in the engine reading the relocation table and estimating its size. We can then use the pointer we read at EBX+0x24 (converted to a relative virtual address, since that is what the PE header stores) and the value returned by RelocaterEstimatedSize to correct the PE header of the unpacked file. However, RelocaterEstimatedSize doesn't return the accurate size due to the system design. It must be reduced by 8 to be correct for all cases.
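To make the two steps concrete, here is a minimal Python sketch of measuring a standard relocation table and writing the result into the PE header. It is not the TitanEngine implementation, whose internals are not shown here. The names measure_reloc_table and patch_reloc_directory are hypothetical, the walk stops at the first block with a zero page RVA just like the PackMan loop does, and only 32-bit PE32 headers are handled.

```python
import struct

RELOC_DIR_INDEX = 5          # IMAGE_DIRECTORY_ENTRY_BASERELOC
PE32_DATA_DIR_OFFSET = 96    # DataDirectory offset inside a PE32 optional header

def measure_reloc_table(memory: bytes, start: int) -> int:
    """Sum the block sizes until a zero page RVA (terminator excluded)."""
    size = 0
    while start + size + 8 <= len(memory):
        page_rva, block_size = struct.unpack_from("<II", memory, start + size)
        if page_rva == 0 or block_size < 8:
            break
        size += block_size
    return size

def patch_reloc_directory(pe: bytearray, table_rva: int, table_size: int) -> None:
    """Write the relocation directory RVA and size into a PE32 header."""
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    optional = e_lfanew + 4 + 20        # skip signature and file header
    (magic,) = struct.unpack_from("<H", pe, optional)
    if magic != 0x10B:
        raise ValueError("only PE32 is handled in this sketch")
    entry = optional + PE32_DATA_DIR_OFFSET + RELOC_DIR_INDEX * 8
    struct.pack_into("<II", pe, entry, table_rva, table_size)

# Demo: a two-block table followed by an all-zero terminator block.
memory = (struct.pack("<II2H", 0x1000, 12, 0x3004, 0)
          + struct.pack("<II4H", 0x2000, 16, 0x3008, 0x3010, 0x3020, 0)
          + bytes(8))
size = measure_reloc_table(memory, 0)

# Demo: a bare-bones PE32 header, just enough for the patch routine.
pe = bytearray(0x400)
struct.pack_into("<I", pe, 0x3C, 0x80)
pe[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", pe, 0x80 + 24, 0x10B)
patch_reloc_directory(pe, 0x5000, size)   # 0x5000 is a made-up table RVA
```

Because this sketch counts only complete blocks and leaves out the terminator, it needs no further adjustment. With the engine you would instead take the RelocaterEstimatedSize result and reduce it by 8 as described above.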
Since we are only updating the PE header data, we can free the relocation table stored inside the engine with RelocaterCleanup. Once we dump the process, fixing the relocation table is as easy as updating the PE header fields. Fixing the relocation table this way speeds up unpacking significantly. No actual data needs to be written to the file on disk, since it is already there and in the correct format. Furthermore, you can start debugging without the previously necessary loading of the DLL at an address other than the default. If you choose to use that optimization as well, the packer execution time will be shorter, since the file might not be relocated at all, thus saving CPU cycles. Until next week...
(package contains the unpacker with source and the samples used)