Introduction
At the moment, I am working on a new Hyperion version with 64-bit support. The decryption stub of Hyperion is written in flat assembler (fasm) for 32-bit platforms. Therefore, the whole code needs to be ported. And this is where the fun begins ;).
Two major aspects of the x86-64 architecture are important during the porting process:
- New 64 bit registers and corresponding op codes.
- New calling convention for functions.
The first aspect is not too hard to master. I use rax instead of eax (to switch from 32- to 64 bit width), pushad is not available anymore, etc. The second part is more interesting. Therefore, I wrote this blog entry.
Problem
The new calling convention for Microsoft Windows passes the first four parameters via registers. If you specify more than four parameters, every parameter beyond the fourth is passed on the stack. Using registers speeds things up, so this is basically a good idea. But it is a nightmare for readability if you are a human programmer and not a machine (e.g. a C compiler). Luckily, fasm provides us with some macros to make things easier: fastcall and proc.
Fastcall uses the new x64 calling convention. Stdcall, etc. still exist, but they redirect to fastcall. Invoke is available too for indirect calls, which is handy if you call APIs from the import table. A parameter can be prefixed with the address directive, which lets fasm load the corresponding value via lea instead of mov. So calling functions is pretty straightforward and not problematic at all. Take a look at the fasm documentation if you are interested in more details.
Proc and endp perform the function declaration. The macro automatically generates a stack frame, saves non-volatile registers marked with the uses directive, etc. Based on this, I started with my first example:
start:
sub rsp,8
fastcall myFunction1, address myText, address myTitle, 1, 2, 3
invoke ExitProcess,0
proc myFunction1 uses rbx, param1, param2, param3, param4, param5
invoke MessageBoxA, 0, [param1], [param2], 0
ret
endp
Unfortunately, this does not work as intended. My idea was: if the fastcall macro automatically uses registers to pass its parameters, they can be accessed in the function body by referencing the corresponding labels (param1 and param2). This is a mistake. Param1 and param2 do not redirect to rcx and rdx. Instead, they access the stack frame at rbp+10h and rbp+18h. The reason for this behaviour lies within the Microsoft calling convention. The first four parameters are passed via registers. Nevertheless, the caller has to reserve a shadow space of 20h bytes on the stack which can hold the first four parameters. The caller only reserves this space; it does not fill it. The memory therefore stays uninitialized unless the called function writes the register values there itself. That is why the example above could not work. So I fixed my code and made things even worse:
proc myFunction1 uses rbx, param1, param2, param3, param4, param5
invoke MessageBoxA, 0, rcx, rdx, 0
ret
endp
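This code crashes and is a good example of how easily direct register usage combined with macros goes wrong. The macro fills the argument registers one after another, so rcx is already overwritten with 0 before it is copied to rdx. If you take a closer look, it generates the following code: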
mov rcx, 0
mov rdx, rcx
mov r8, rdx
mov r9d, 0
call MessageBoxA
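The clobbering can be reproduced outside of assembly. The following C sketch is only a model of the register moves listed above. ArgRegs, naive_invoke and saved_invoke are hypothetical names made up for this illustration and are not part of fasm or the Windows API. It assumes the macro moves the arguments strictly in order.

```c
#include <stdint.h>

/* Hypothetical model of the four argument registers (illustration only). */
typedef struct {
    uint64_t rcx, rdx, r8, r9;
} ArgRegs;

/* Models the expansion of `invoke MessageBoxA, 0, rcx, rdx, 0`:
   the arguments are moved into their registers in order, so a source
   register may already be overwritten by the time it is read. */
static ArgRegs naive_invoke(ArgRegs in)
{
    ArgRegs r = in;
    r.rcx = 0;      /* mov rcx, 0   : incoming first parameter is lost */
    r.rdx = r.rcx;  /* mov rdx, rcx : copies the 0, not the pointer    */
    r.r8  = r.rdx;  /* mov r8, rdx  : copies the 0 again               */
    r.r9  = 0;      /* mov r9d, 0                                      */
    return r;
}

/* Same call, but the incoming values are saved first. The locals stand
   in for non-volatile registers or for the shadow space on the stack. */
static ArgRegs saved_invoke(ArgRegs in)
{
    uint64_t first = in.rcx;   /* saved copy of parameter 1 */
    uint64_t second = in.rdx;  /* saved copy of parameter 2 */
    ArgRegs r = in;
    r.rcx = 0;
    r.rdx = first;
    r.r8  = second;
    r.r9  = 0;
    return r;
}
```

In the naive variant both pointers end up as 0, while the saved variant hands the original values of rcx and rdx on to the second and third argument.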
As a solution, the parameters rcx and rdx could be saved in some non-volatile registers before passing them to MessageBoxA. For Hyperion, this is not a suitable solution because the focus is on maintainability and not performance. Another aspect of Hyperion is its profile for AV heuristics. The code should be easy to maintain but also look like regular output from a C compiler. Therefore, I wrote the following example:
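#include <windows.h>

/* Forward declaration so that main() can call testFunc(). */
int testFunc(const char* title, const char* msg, int p3, int p4, int p5);

int main(void){
    testFunc("Fancy Title", "Helo Moto", 1, 2, 3);
    return 0;
}

int testFunc(const char* title, const char* msg, int p3, int p4, int p5){
    if(p3 == 1){
        MessageBox(0, "param3==1", "Time for a message box", 0);
    }
    MessageBox(0, title, msg, 0);
    return 0;
}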
I compiled it with MinGW64, without debug symbols and with -O0. The idea behind this code is to see how gcc handles the volatile registers of testFunc(). The disassembly can be seen here:
With optimizations disabled, gcc stores the parameters in the previously mentioned shadow space. They are fetched from there when needed in the if block and in the two MessageBox() calls.
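The layout of these slots can be written down as a small C sketch. It only covers integer and pointer parameters and assumes the usual frame setup (push rbp, then mov rbp, rsp), which puts the saved rbp at [rbp] and the return address at [rbp+8]. The helper names arg_register and home_slot_offset are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Register that carries integer/pointer parameter n (1-based) in the
   Microsoft x64 calling convention, or NULL if it is passed on the stack. */
static const char *arg_register(int n)
{
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };
    assert(n >= 1);
    return n <= 4 ? regs[n - 1] : NULL;
}

/* Offset of parameter n's stack slot relative to rbp, assuming a
   `push rbp / mov rbp, rsp` prologue. Parameters 1 to 4 map to the
   shadow space (uninitialized until the callee stores the registers),
   parameter 5 and up were written by the caller. */
static int home_slot_offset(int n)
{
    assert(n >= 1);
    return 0x10 + 8 * (n - 1);
}
```

With these assumptions, parameter 1 lives at rbp+10h and parameter 2 at rbp+18h, which matches the addresses behind param1 and param2 in the proc macro, and parameter 5 follows directly after the shadow space at rbp+30h.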
Conclusion
If you are writing fasm code focussing on maintainability and not performance, it is an option to store the register parameters in the shadow space at the beginning of each function and then access them like the stack parameters of a 32-bit stdcall function. This technique is used by C compilers too when optimizations are disabled and therefore should not trigger AV heuristics.