dustland

dustball in dustland

调用约定

调用约定

为啥CSAPP第三章x86-64汇编学完了,但是看IDA的反汇编仍然是一头雾水?还得看一大堆东西,其中就有调用约定

为什么windows上和linux上,x86和x64上编译出来的代码有很多不同,为什么和CSAPP说的相差甚远?调用约定不同是一大原因

首先要说明的几点,也是实验中和查阅资料逐渐获得的几点

1.==各种调用约定是相对于x86而言的==,对x64无意义

The keywords _stdcall and _cdecl specify 32-bit calling conventions. That's why they are not relevant for 64-bit programs (i.e. x64). On x64, there is only the standard calling convention and the extended __vectorcall calling convenction.

来自stackoverflow

关键词_stdcall_cdecl特指32位的调用约定.64位上不一样,64位上只有标准调用约定,还有其拓展__vectorcall

即使在64位的函数前面用__cdecl或者__stdcall修饰,编译结果也是一样的

2.x86和x64汇编有较大出入,windows上和linux上的同一约定也有些许区别

x86上的调用约定

微软给出的==x86系统==上的调用约定:

image-20220427184436835

一定注意是x86系统上的,而我们现在的笔记本大多数都是x64系统了,会有一些出入

c调用约定__cdecl

C Declaration

1
<return_type> __cdecl <func_name>(para1,para2,...,paran);

对于x86系统,微软官方文档是这样写的:

x86

维基百科这样写的:

image-20220428152115878

在gcc编译的时候加入-m32选项即可使用32位编译,编译成x86系统的程序

test.c

1
2
3
4
5
6
7
8
9
int _cdecl func(int a,int b,int c,int d,int e,int f,int g,int h){
return a+b+c+d+e+f+g+h;
}
int _cdecl show(){
return func(1,2,3,4,5,6,7,8);
}
int _cdecl main(){
show();
}
1
gcc -O0 test.c -c -m32 -o test.o|objdump -d test.o > test.s|code test.s

使用-m32编译之后然后反汇编

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
test.o:     file format pe-i386


Disassembly of section .text:

00000000 <_func>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp ;蜜汁操作,有esp为啥还要获取一个ebp作为拷贝?
3: 8b 55 08 mov 0x8(%ebp),%edx
6: 8b 45 0c mov 0xc(%ebp),%eax
9: 01 c2 add %eax,%edx
b: 8b 45 10 mov 0x10(%ebp),%eax
e: 01 c2 add %eax,%edx
10: 8b 45 14 mov 0x14(%ebp),%eax
13: 01 c2 add %eax,%edx
15: 8b 45 18 mov 0x18(%ebp),%eax
18: 01 c2 add %eax,%edx
1a: 8b 45 1c mov 0x1c(%ebp),%eax
1d: 01 c2 add %eax,%edx
1f: 8b 45 20 mov 0x20(%ebp),%eax
22: 01 c2 add %eax,%edx
24: 8b 45 24 mov 0x24(%ebp),%eax
27: 01 d0 add %edx,%eax
29: 5d pop %ebp
2a: c3 ret

0000002b <_show>:
2b: 55 push %ebp
2c: 89 e5 mov %esp,%ebp
2e: 83 ec 20 sub $0x20,%esp ;申请0x20=32字节空间,刚好8个int参数×一个int是4个字节,但是蜜汁操作,为啥不用push逐次压栈,而是一次性申请空间
31: c7 44 24 1c 08 00 00 movl $0x8,0x1c(%esp)
38: 00
39: c7 44 24 18 07 00 00 movl $0x7,0x18(%esp) ;每个参数占用栈上4个字节,8个参数紧挨着
40: 00
41: c7 44 24 14 06 00 00 movl $0x6,0x14(%esp)
48: 00
49: c7 44 24 10 05 00 00 movl $0x5,0x10(%esp)
50: 00
51: c7 44 24 0c 04 00 00 movl $0x4,0xc(%esp)
58: 00
59: c7 44 24 08 03 00 00 movl $0x3,0x8(%esp)
60: 00
61: c7 44 24 04 02 00 00 movl $0x2,0x4(%esp)
68: 00
69: c7 04 24 01 00 00 00 movl $0x1,(%esp) ;栈顶一定存放的是最左侧的参数
70: e8 8b ff ff ff call 0 <_func>
75: c9 leave ;蜜汁指令,CSAPP上没有见过leave指令
76: c3 ret

00000077 <_main>:
77: 55 push %ebp
78: 89 e5 mov %esp,%ebp
7a: 83 e4 f0 and $0xfffffff0,%esp
7d: e8 00 00 00 00 call 82 <_main+0xb>
82: e8 a4 ff ff ff call 2b <_show>
87: b8 00 00 00 00 mov $0x0,%eax ;返回值放在eax,rax寄存器中
8c: c9 leave
8d: c3 ret
8e: 90 nop
8f: 90 nop

32位系统必定不会用到r开头的4字64位寄存器比如rax,rdx,rsp等等,最大用到e开头的寄存器,比如eax,esp

可以发现show函数在调用func函数,传参的时候没有用到一个寄存器,全都是用的堆栈,还可以发现函数名都是由下划线前缀的<_main>,<_func>,<_show>

在为函数参数申请栈空间的时候是一次性完成的,即有8个参数则直接在栈上申请0x20=32字节,然后分别用movl指令向栈上刚才申请的空间写入数据.

关于蜜汁操作参数的压栈方式,是一次性申请足够的空间然后mov还是逐次push?

stackoverflow上的说法:

  1. Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.

That is not the reason. Both of these instructions also exist in x86 assembly language.

效率并且是否可实现不是原因.这两种指令(push和mov)在x86汇编语言中都存在

The reason why your compiler is not emitting a push instruction for the x64 code is probably because it must adjust the stack pointer directly anyway, in order to create 32 bytes of "shadow space" for the called function. See this link (which was provided by @NateEldredge) for further information on "shadow space".

编译器对x64不使用push指令的原因是:他需要直接调整栈顶指针,给前四个参数的压栈预留"影子空间"

x86不需要寄存器传递参数但是x64需要寄存器并且在被调用函数的一开始会把寄存器中的参数也压栈,那么这些寄存器中的参数将会压入影子空间.具体见后文的实验

关于蜜汁操作ebp(rbp)寄存器的作用:

行为:在每个函数开始时都会被压入栈中然后拷贝栈顶指针,在有些函数快要结束的时候又会从栈中获取先前压入栈中的值

比如一个典型的结构:

1
2
3
2b:	55                   	push   %ebp
2c: 89 e5 mov %esp,%ebp
2e: 83 ec 20 sub $0x20,%esp

查阅stackoverflow

image-20220427195841367

rbp is the frame pointer on x86_64. In your generated code, it gets a snapshot of the stack pointer (rsp) so that when adjustments are made to rsp (i.e. reserving space for local variables or pushing values on to the stack), local variables and function parameters are still accessible from a constant offset from rbp.

A lot of compilers offer frame pointer omission as an optimization option; this will make the generated assembly code access variables relative to rsp instead and free up rbp as another general purpose register for use in functions.

In the case of GCC, which I'm guessing you're using from the AT&T assembler syntax, that switch is -fomit-frame-pointer. Try compiling your code with that switch and see what assembly code you get. You will probably notice that when accessing values relative to rsp instead of rbp, the offset from the pointer varies throughout the function.

rbp是x86_64上的栈帧指针.在我们的代码中,rbp寄存器获取栈顶指针rsp的快照.

当rsp改变时(比如为局部变量预留空间或者通过push指令压栈),我们仍然可以通过使用rbp+偏移量这种方式调用上一个函数(或者说调用者)的局部变量或者函数参数.

很多编译器的优化,会不用上述方式(rbp+偏移量)调用上一个函数的局部变量或者函数参数,而是只用rsp+偏移量.然后省出rbp寄存器去干其他事.对于GCC编译器,使用-fomit-frame-pointer编译选项达到上述目的

按照我的理解,rbp的作用就是调用者的rsp副本,然后rsp为被调用者服务,rbp为调用者服务.

rbp只是在被调用者嗲用调用者的局部变量时,令寻址更方便,完全可以只用rsp达到目的

后来的实践证明我一开始的理解是错误的

rbp指向函数栈帧的高地址,即栈底,rsp指向函数栈帧的低地址,即栈顶

二者都是为当前函数服务的

函数的开端时会将上一个函数的rbp指针压栈保存,然后指向当前函数栈帧的栈底.函数尾声时会将上一个函数的rbp指针退栈还给rbp

1
gcc -O0 -fomit-frame-pointer test.c -c -m64 -o test.o|objdump -d test.o > test.s|code test.s
1
2
3
4
5
6
7
8
9
10
00000000000000d6 <main>:
d6: 48 83 ec 28 sub $0x28,%rsp
da: e8 00 00 00 00 callq df <main+0x9>
df: e8 ae ff ff ff callq 92 <show>
e4: b8 00 00 00 00 mov $0x0,%eax
e9: 48 83 c4 28 add $0x28,%rsp
ed: c3 retq
ee: 90 nop
ef: 90 nop

使用-fomit-frame-pointer编译选项之后确实ebp不踪影了

现在再看这个结构:

1
2
3
2b:	55                   	push   %ebp				;将上一个函数对上上个函数的ebp保存
2c: 89 e5 mov %esp,%ebp ;ebp获取上一个函数esp的副本
2e: 83 ec 20 sub $0x20,%esp ;esp为当前函数服务

最后将栈中刚才压入的ebp又还给ebp是还原上个函数对上上个函数的esp副本

关于蜜汁指令leave:

百度百科给出的解释:

image-20220427224030429

一定要注意,这里指令的源和目的操作数与我们通篇是相反的

这里百科给出的解释使用的是intel风格的汇编语言,mov 目的操作数,源操作数

寄存器前面有百分号的是AT&T风格的汇编语言,movq 源操作数,目的操作数

leave指令在AT&T风格下相当于:

1
2
movl %ebp,%esp
pop %ebp

而这刚好和每个函数一开始的

1
2
push   %ebp
mov %esp,%ebp

恰好相反

因此leave指令就是还原栈的一个过程

标准调用约定__stdcall

微软官方文档给出的解释:

The __stdcall calling convention is used to call Win32 API functions. The callee cleans the stack, so the compiler makes vararg functions __cdecl. Functions that use this calling convention require a function prototype. The __stdcall modifier is Microsoft-specific.

__stdcall用于修饰==Win32 API函数==.被调用者负责情理自己的函数栈,(因此编译器会把变参函数修饰为__cdecl(调用者清理栈容易实现变参)).使用__stdcall的函数需要一个函数原型(即接口)

1
return-type __stdcall function-name[( argument-list )]
Element Implementation
Argument-passing order
参数传递顺序
Right to left.
从右向左
Argument-passing convention
参数传递规则(值传递/引用传递)
By value, unless a pointer or reference type is passed.
除非参数是指针或者引用类型,否则采用值传递
Stack-maintenance responsibility
栈维护
Called function pops its own arguments from the stack.
被调用者自己清理自己用到的栈
Name-decoration convention
命名修饰规则
An underscore (_) is prefixed to the name. The name is followed by the at sign (@) followed by the number of bytes (in decimal) in the argument list. Therefore, the function declared as int func( int a, double b ) is decorated as follows: _func@12
下划线开头,然后@,然后是十进制表示的参数表字节大小.
因此int func(int a,double b)将会被修饰为_func@12(int四个字节+double八个字节)
Case-translation convention
大小写转换规定
None
返回值位置 放在eax,rax寄存器中

用ida打开一个win32程序,其Winmain函数是这样分析的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
.text:00401000 ; int __stdcall WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
.text:00401000 __stdcall WinMain(x, x, x, x) proc near ; CODE XREF: start+C9↓p
.text:00401000
.text:00401000 hInstance = dword ptr 4
.text:00401000 hPrevInstance = dword ptr 8
.text:00401000 lpCmdLine = dword ptr 0Ch
.text:00401000 nShowCmd = dword ptr 10h
.text:00401000
.text:00401000 mov eax, [esp+hInstance]
.text:00401004 push 0 ; dwInitParam
.text:00401006 push offset DialogFunc ; lpDialogFunc
.text:0040100B push 0 ; hWndParent
.text:0040100D push 65h ; 'e' ; lpTemplateName
.text:0040100F push eax ; hInstance
.text:00401010 mov hInstance, eax
.text:00401015 call ds:DialogBoxParamA
.text:0040101B xor eax, eax
.text:0040101D retn 10h ;retn指令可以带参数
.text:0040101D __stdcall WinMain(x, x, x, x) endp

可以明显观察到,参数只使用栈传递,从右向左压栈,Winmain函数的栈帧:

image-20220428092006441

有一点与__cdecl不同的是retn 10h,并且貌似与官方文档不同的是,被调用者没有自己清理自己的堆栈,比如Winmain到结束了也没有看见退栈指令.

实际上这就是retn 10h要做的事情

10h=16字节然而四个参数刚好每个4字节,即retn XXh就是被调用者的退栈指令,和返回指令合并成一条指令了

如此减少了清理堆栈需要使用的指令

还是test.c

1
2
3
4
5
6
7
8
9
10
11
int _stdcall func(short a,short b,short c,short d,short e,short f,short g,short h){

return a+b+c+d+e+f+g+h;

}
int _stdcall show(){
return func(1,2,3,4,5,6,7,8);
}
int _stdcall main(){
show();
}

使用gcc,objdump,vscode素质三连

1
2
3
PS C:\Users\86135\Desktop\reverse\test_call> gcc test.c -O0 -m32 -c -o test.o
PS C:\Users\86135\Desktop\reverse\test_call> objdump test.o -d >test.s
PS C:\Users\86135\Desktop\reverse\test_call> code test.s

反汇编如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95

test.o: file format pe-i386


Disassembly of section .text:

00000000 <_func@32>: ;函数名<_func@32>下划线,@,参数表大小(单位:字节)
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 57 push %edi ;寄存器临时压栈保存,为后来的运算做准备,最后还要弹栈复原
4: 56 push %esi
5: 53 push %ebx
6: 83 ec 28 sub $0x28,%esp
9: 8b 45 08 mov 0x8(%ebp),%eax
c: 8b 4d 0c mov 0xc(%ebp),%ecx
f: 8b 5d 10 mov 0x10(%ebp),%ebx
12: 89 5d d0 mov %ebx,-0x30(%ebp)
15: 8b 75 14 mov 0x14(%ebp),%esi
18: 89 75 cc mov %esi,-0x34(%ebp)
1b: 8b 7d 18 mov 0x18(%ebp),%edi
1e: 8b 75 1c mov 0x1c(%ebp),%esi
21: 8b 5d 20 mov 0x20(%ebp),%ebx
24: 8b 55 24 mov 0x24(%ebp),%edx
27: 66 89 45 f0 mov %ax,-0x10(%ebp)
2b: 89 c8 mov %ecx,%eax
2d: 66 89 45 ec mov %ax,-0x14(%ebp)
31: 0f b7 45 d0 movzwl -0x30(%ebp),%eax
35: 66 89 45 e8 mov %ax,-0x18(%ebp)
39: 0f b7 45 cc movzwl -0x34(%ebp),%eax
3d: 66 89 45 e4 mov %ax,-0x1c(%ebp)
41: 89 f8 mov %edi,%eax
43: 66 89 45 e0 mov %ax,-0x20(%ebp)
47: 89 f0 mov %esi,%eax
49: 66 89 45 dc mov %ax,-0x24(%ebp)
4d: 89 d8 mov %ebx,%eax
4f: 66 89 45 d8 mov %ax,-0x28(%ebp)
53: 89 d0 mov %edx,%eax
55: 66 89 45 d4 mov %ax,-0x2c(%ebp)
59: 0f bf 55 f0 movswl -0x10(%ebp),%edx
5d: 0f bf 45 ec movswl -0x14(%ebp),%eax
61: 01 c2 add %eax,%edx
63: 0f bf 45 e8 movswl -0x18(%ebp),%eax
67: 01 c2 add %eax,%edx
69: 0f bf 45 e4 movswl -0x1c(%ebp),%eax
6d: 01 c2 add %eax,%edx
6f: 0f bf 45 e0 movswl -0x20(%ebp),%eax
73: 01 c2 add %eax,%edx
75: 0f bf 45 dc movswl -0x24(%ebp),%eax
79: 01 c2 add %eax,%edx
7b: 0f bf 45 d8 movswl -0x28(%ebp),%eax
7f: 01 c2 add %eax,%edx
81: 0f bf 45 d4 movswl -0x2c(%ebp),%eax
85: 01 d0 add %edx,%eax
87: 83 c4 28 add $0x28,%esp
8a: 5b pop %ebx ;对应函数开始时将寄存器压栈保存,现在退栈复原
8b: 5e pop %esi
8c: 5f pop %edi
8d: 5d pop %ebp
8e: c2 20 00 ret $0x20 ;被调用者自行清理自己的栈

00000091 <_show@0>:
91: 55 push %ebp
92: 89 e5 mov %esp,%ebp
94: 83 ec 20 sub $0x20,%esp ;一次性分配0x20=32字节空间然后使用mov指令将参数压栈
97: c7 44 24 1c 08 00 00 movl $0x8,0x1c(%esp)
9e: 00
9f: c7 44 24 18 07 00 00 movl $0x7,0x18(%esp)
a6: 00
a7: c7 44 24 14 06 00 00 movl $0x6,0x14(%esp)
ae: 00
af: c7 44 24 10 05 00 00 movl $0x5,0x10(%esp)
b6: 00
b7: c7 44 24 0c 04 00 00 movl $0x4,0xc(%esp)
be: 00
bf: c7 44 24 08 03 00 00 movl $0x3,0x8(%esp)
c6: 00
c7: c7 44 24 04 02 00 00 movl $0x2,0x4(%esp)
ce: 00
cf: c7 04 24 01 00 00 00 movl $0x1,(%esp)
d6: e8 25 ff ff ff call 0 <_func@32>
db: 83 ec 20 sub $0x20,%esp
de: c9 leave
df: c3 ret

000000e0 <_main@0>:
e0: 55 push %ebp
e1: 89 e5 mov %esp,%ebp
e3: 83 e4 f0 and $0xfffffff0,%esp
e6: e8 00 00 00 00 call eb <_main@0+0xb>
eb: e8 a1 ff ff ff call 91 <_show@0>
f0: b8 00 00 00 00 mov $0x0,%eax
f5: c9 leave
f6: c3 ret
f7: 90 nop

<>上给出的建议

image-20220428092612932

微软__fastcall

<>是这样写的:

image-20220428154840944

微软官方文档:

image-20220428155346372

同样的程序,除了main函数之外,其他函数都用_fastcall修饰

1
2
3
4
5
6
7
8
9
10
11
int _fastcall func(short a,short b,short c,short d,short e,short f,short g,short h){

return a+b+c+d+e+f+g+h;

}
int _fastcall show(){
return func(1,2,3,4,5,6,7,8);
}
int main(){//如果main也用_fastcall修饰则报错没有入口点
show();
}

使用MSVC编译

1
2
3
4
5
6
7
8
9
10
C:\Users\86135\Desktop\reverse\test_call>cl test.c
用于 x86 的 Microsoft (R) C/C++ 优化编译器 19.29.30139 版
版权所有(C) Microsoft Corporation。保留所有权利。

test.c
Microsoft (R) Incremental Linker Version 14.29.30139.0
Copyright (C) Microsoft Corporation. All rights reserved.

/out:test.exe
test.obj

然后反编译

1
objdump test.obj -d >test.s
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68

test.obj: file format pe-i386


Disassembly of section .text$mn:

00000000 <@func@32>: ;函数命名规则是@函数名@参数字节数
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: 83 ec 08 sub $0x8,%esp
6: 66 89 55 f8 mov %dx,-0x8(%ebp)
a: 66 89 4d fc mov %cx,-0x4(%ebp)
e: 0f bf 45 fc movswl -0x4(%ebp),%eax
12: 0f bf 4d f8 movswl -0x8(%ebp),%ecx
16: 03 c1 add %ecx,%eax
18: 0f bf 55 08 movswl 0x8(%ebp),%edx
1c: 03 c2 add %edx,%eax
1e: 0f bf 4d 0c movswl 0xc(%ebp),%ecx
22: 03 c1 add %ecx,%eax
24: 0f bf 55 10 movswl 0x10(%ebp),%edx
28: 03 c2 add %edx,%eax
2a: 0f bf 4d 14 movswl 0x14(%ebp),%ecx
2e: 03 c1 add %ecx,%eax
30: 0f bf 55 18 movswl 0x18(%ebp),%edx
34: 03 c2 add %edx,%eax
36: 0f bf 4d 1c movswl 0x1c(%ebp),%ecx
3a: 03 c1 add %ecx,%eax
3c: 8b e5 mov %ebp,%esp
3e: 5d pop %ebp
3f: c2 18 00 ret $0x18 ;被调用者清理自己的栈
42: cc int3
43: cc int3
44: cc int3
45: cc int3
46: cc int3
47: cc int3
48: cc int3
49: cc int3
4a: cc int3
4b: cc int3
4c: cc int3
4d: cc int3
4e: cc int3
4f: cc int3

00000050 <@show@0>:
50: 55 push %ebp
51: 8b ec mov %esp,%ebp
53: 6a 08 push $0x8
55: 6a 07 push $0x7
57: 6a 06 push $0x6
59: 6a 05 push $0x5
5b: 6a 04 push $0x4
5d: 6a 03 push $0x3
5f: ba 02 00 00 00 mov $0x2,%edx ;顶多有两个参数放在寄存器传递,其余都用栈
64: b9 01 00 00 00 mov $0x1,%ecx
69: e8 00 00 00 00 call 6e <@show@0+0x1e>
6e: 5d pop %ebp
6f: c3 ret

00000070 <_main>:
70: 55 push %ebp
71: 8b ec mov %esp,%ebp
73: e8 00 00 00 00 call 78 <_main+0x8>
78: 33 c0 xor %eax,%eax
7a: 5d pop %ebp
7b: c3 ret

微软__thiscall

微软官方文档:

The Microsoft-specific __thiscall calling convention is used on C++ class member functions on the x86 architecture. It's the default calling convention used by member functions that don't use variable arguments (vararg functions).

微软特有的__thiscall调用约定用于x86体系上C++的成员函数.定参函数默认使用该种调用约定

Under __thiscall, the callee cleans the stack, which is impossible for vararg functions. Arguments are pushed on the stack from right to left. The this pointer is passed via register ECX, and not on the stack.

如果函数有__thiscall修饰则被调用者清理自己的栈,因此变参函数难以实现.

函数参数从右向左压栈.this指针通过ECX寄存器传递

On ARM, ARM64, and x64 machines, __thiscall is accepted and ignored by the compiler. That's because they use a register-based calling convention by default.

在ARM,ARM64还有x64机器上,__thiscall会被编译器直接忽略.因为编译器默认使用一种基于寄存器的调用约定

<>

image-20220428161940071

x64上的调用约定

Microsoft x64 calling convention

微软x64调用约定

The Microsoft x64 calling convention[18][19] is followed on Windows and pre-boot UEFI (for long mode on x86-64). The first four arguments are placed onto the registers. That means RCX, RDX, R8, R9 for integer, struct or pointer arguments (in that order), and XMM0, XMM1, XMM2, XMM3 for floating point arguments. Additional arguments are pushed onto the stack (right to left). Integer return values (similar to x86) are returned in RAX if 64 bits or less. Floating point return values are returned in XMM0. Parameters less than 64 bits long are not zero extended; the high bits are not zeroed.

微软x64调用约定适用于Windows和UEFI.

前四个参数,如果是整数或者结构体或者指针类型,则放在寄存器RCX,RDX,R8,R9寄存器里,如果是浮点数则放在XMM0到XMM3里

额为的参数放在栈里(从右向左压栈)

返回值如果小于等于64位则放在RAX寄存器里(类似于x86的情形)

浮点返回值放在XMM0里

小于64位的参数进行有符号拓展

Structs and unions with sizes that match integers are passed and returned as if they were integers. Otherwise they are replaced with a pointer when used as an argument. When an oversized struct return is needed, another pointer to a caller-provided space is prepended as the first argument, shifting all other arguments to the right by one place.[20]

结构体和联合体如果大小与整形匹配则被当作整形进行参数传递还有返回.否则,当他们作为参数时,会被一个指针替代

当需要一个超大的结构体需要返回时,指向调用方提供的空间的另一个指针将作为第一个参数,将所有其他参数向右移动一个位置

When compiling for the x64 architecture in a Windows context (whether using Microsoft or non-Microsoft tools), stdcall, thiscall, cdecl, and fastcall all resolve to using this convention.

不管使用的编译器是不是微软的工具,对于x64体系,stdcall,thiscall,cdecl,fastcall都会被忽略,然后使用上述方法处理

In the Microsoft x64 calling convention, it is the caller's responsibility to allocate 32 bytes of "shadow space" on the stack right before calling the function (regardless of the actual number of parameters used), and to pop the stack after the call. The shadow space is used to spill RCX, RDX, R8, and R9,[21] but must be made available to all functions, even those with fewer than four parameters.

在微软x64调用约定中,调用者在调用其他函数之前,有义务在栈上分配32字节的"影子空间",并且忽略实际上参数占用的大小,并且在调用结束后由调用者清理被调用者的堆栈.

影子空间的作用是用于将来存放RCX,RDX,R8,R9中的前四个参数,但是即使是没有不够四个参数的函数,也会预留一个32字节的影子空间

The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile (caller-saved).[22]

RAX, RCX, RDX, R8, R9, R10, R11这些寄存器都是volatile修饰的

image-20220428101247431

The registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15 are considered nonvolatile (callee-saved).[22]

RBX, RBP, RDI, RSI, RSP, R12, R13, R14, and R15不用volatile修饰

For example, a function taking 5 integer arguments will take the first to fourth in registers, and the fifth will be pushed on top of the shadow space. So when the called function is entered, the stack will be composed of (in ascending order) the return address, followed by the shadow space (32 bytes) followed by the fifth parameter.

举个例子,一个有5参数的 函数,其前四个参数将会被放在寄存器里然后第五个参数竟会别压入栈顶,并且在影子空间之上.

因此当进入被调用函数时,栈中的组成按照从栈顶到栈底将是:返回值,影子空间,第五个参数

这里影子空间就是给前四个参数腾空,前四个参数使用寄存器传递之后在被调用者中会被重新压栈,即压入这个预留的影子空间

维基百科这样写的:

image-20220428152348161

x86 x64调用约定及传参顺序 - 一瓶怡宝 - 博客园 (cnblogs.com)

同样的程序test.c

1
2
3
4
5
6
7
8
9
int  func(int a,int b,int c,int d,int e,int f,int g,int h){
return a+b+c+d+e+f+g+h;
}
int show(){
return func(1,2,3,4,5,6,7,8);
}
int main(){
show();
}

使用如下命令gcc -O0 test.c -c -o test.o|objdump -d test.o > t.s|code t.s

首先不用编译优化,将test.c编译成目标文件test.o,

然后使用objdump反编译得到反汇编代码t.s

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
test.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 89 4d 10 mov %ecx,0x10(%rbp) ;蜜汁操作,将ecx中存放的参数也压入栈中
7: 89 55 18 mov %edx,0x18(%rbp)
a: 44 89 45 20 mov %r8d,0x20(%rbp)
e: 44 89 4d 28 mov %r9d,0x28(%rbp)
12: 8b 55 10 mov 0x10(%rbp),%edx
15: 8b 45 18 mov 0x18(%rbp),%eax
18: 01 c2 add %eax,%edx
1a: 8b 45 20 mov 0x20(%rbp),%eax
1d: 01 c2 add %eax,%edx
1f: 8b 45 28 mov 0x28(%rbp),%eax
22: 01 c2 add %eax,%edx
24: 8b 45 30 mov 0x30(%rbp),%eax
27: 01 c2 add %eax,%edx
29: 8b 45 38 mov 0x38(%rbp),%eax
2c: 01 c2 add %eax,%edx
2e: 8b 45 40 mov 0x40(%rbp),%eax
31: 01 c2 add %eax,%edx
33: 8b 45 48 mov 0x48(%rbp),%eax
36: 01 d0 add %edx,%eax
38: 5d pop %rbp
39: c3 retq

000000000000003a <show>:
3a: 55 push %rbp
3b: 48 89 e5 mov %rsp,%rbp
3e: 48 83 ec 40 sub $0x40,%rsp ;为子函数申请栈空间,但是蜜汁操作,8个int参数,一个int占4字节,理论上需要0x20=32字节空间,却申请了0x40=64字节的空间
42: c7 44 24 38 08 00 00 movl $0x8,0x38(%rsp) ;将立即数8放在栈中rsp+0x38位置
49: 00
4a: c7 44 24 30 07 00 00 movl $0x7,0x30(%rsp) ;将7放在栈中rsp+0x30位置
51: 00
52: c7 44 24 28 06 00 00 movl $0x6,0x28(%rsp) ;0x30-0x28=48-40=8,蜜汁操作,相邻两个参数在栈上距离8字节
59: 00
5a: c7 44 24 20 05 00 00 movl $0x5,0x20(%rsp)
61: 00
62: 41 b9 04 00 00 00 mov $0x4,%r9d ;立即数4放在r9d寄存器中
68: 41 b8 03 00 00 00 mov $0x3,%r8d
6e: ba 02 00 00 00 mov $0x2,%edx
73: b9 01 00 00 00 mov $0x1,%ecx ;立即数1放在ecx寄存器中
78: e8 83 ff ff ff callq 0 <func>
7d: 48 83 c4 40 add $0x40,%rsp
81: 5d pop %rbp
82: c3 retq

0000000000000083 <main>:
83: 55 push %rbp
84: 48 89 e5 mov %rsp,%rbp
87: 48 83 ec 20 sub $0x20,%rsp
8b: e8 00 00 00 00 callq 90 <main+0xd> ;蜜汁操作,90行就在下面,为啥要call一下
90: e8 a5 ff ff ff callq 3a <show>
95: b8 00 00 00 00 mov $0x0,%eax
9a: 48 83 c4 20 add $0x20,%rsp
9e: 5d pop %rbp
9f: c3 retq

1.函数名没有下划线前缀

2.show和main函数都有固定的格式:

1
2
3
4
5
6
7
8
push %rbp			;rbp是被调用者保存的寄存器,当前函数可以使用,但是最后结束的时候要还原rbp的状态,因此压栈存储先前状态
mov %rsp,%rbp ;将先前的栈顶指针存放在刚刚腾出空闲的rbp寄存器中
sub %0x..,%rsp ;栈顶指针下降,在栈上为将要调用的子函数申请栈空间
callq <..> ;调用函数
..;处理返回值 ;通常返回值在eax寄存器中,进行一些处理
add %0x..,%rsp ;子函数已经执行结束了,为其申请的栈帧不需要再存在了,复原栈顶指针位置
pop %rbp ;将被调用者有义务保存的寄存器rbp还原
retq ;本函数返回

3.关于show函数在调用具有8个参数的func函数时,参数如何安排

关于蜜汁操作参数安排

1.后面第5到8个参数使用栈传递,5位于0x20+rsp,8位于0x38+rsp,即约靠左的参数越靠近栈顶rsp

2.前面1到4个参数==使用寄存器传递==

3.在进入被调用者函数后,将刚才调用者通过寄存器传递的参数也放进栈里,

并且x64上调用者在为子函数申请栈空间的时候也会有意申请很大,为待会儿寄存器中的参数也压栈做准备

实际上这三条都完成之后和x86上的结果是相同的,

1
2
func(p1 ,p2 ,p3, p4 ,p5           ,...,plast	   );
func(ecx,edx,r8d,r9d,远离栈顶的地方,...,靠近栈顶的地方);

关于蜜汁操作四字节的int在栈上分配8字节空间:

在64位不管是windows还是linux系统上int都是4字节的,long long都是8字节的

上面这段程序中各个参数改成short,int,long,long long类型之后反编译得到的汇编语言,在为子函数申请栈空间的时候都是0x40=64个字节

即参数不管什么类型都是以8字节传递的,这一点可以从使用r9d寄存器传递int参数看出

1
2
62:	41 b9 04 00 00 00    	mov    $0x4,%r9d			;立即数4放在r9d寄存器中
68: 41 b8 03 00 00 00 mov $0x3,%r8d

r开头的寄存器都是4字寄存器,理论上是放long long 的,但是这里int也用了r9d传递

关于蜜汁操作就在下一行的指令还要call

案发现场:

1
2
3
4
5
...
8b: e8 00 00 00 00 callq 90 <main+0xd> ;蜜汁操作,90行就在下面,为啥要call一下
90: e8 a5 ff ff ff callq 3a <show>
95: b8 00 00 00 00 mov $0x0,%eax
...

写一个更短的程序观察这个事

test.c

1
2
3
4
void  foo(){}
int main(){
foo();
}
1
2
gcc test.c -O0 -c -o test.o|objdump -d test.o > test.s|code test.s
不开任何编译优化,反汇编

反编译得到

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
0000000000000000 <foo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 90 nop
5: 5d pop %rbp
6: c3 retq

0000000000000007 <main>:
7: 55 push %rbp
8: 48 89 e5 mov %rsp,%rbp
b: 48 83 ec 20 sub $0x20,%rsp
f: e8 00 00 00 00 callq 14 <main+0xd>
14: e8 e7 ff ff ff callq 0 <foo>
19: b8 00 00 00 00 mov $0x0,%eax
1e: 48 83 c4 20 add $0x20,%rsp
22: 5d pop %rbp
23: c3 retq
24: 90 nop
...

main+0xf处的callq,将下一条指令也就是main+0x14压栈,然后修改程序计数器为main+0xf,即执行jmp main+0xf

main+0x14处的callq,将下一条指令地址也就是main+0x19压栈,然后修改程序计数器为foo地址,即执行jmp foo

foo执行到最后有一个retq作用是将栈顶刚才压入的main+0x19还给程序计数器rip,然后退栈,即pop %rip

这样看起来程序已经出错了,栈顶还有一个main+0xf没有弹出,但是main+0x22处有一个退栈将位于栈顶main+0xf弹给了%rbp寄存器,然而实际上%rbp寄存器应当获取次栈顶的值,即在main+0x7压入的值

出错的原因是main+0xf处的call指令调用的不是一个函数,没有与该call指令相对应的ret指令,这导致了call前压栈但是call后不退栈.

下面正向编译观察这个事情

使用gcc -S选项正向编译成汇编语言

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
call show
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-1) 9.2.0"

第9行有一个call __main

stackoverflow上的说法

Calls the _main function which will do initializing stuff that gcc needs. Call will push the current instruction pointer on the stack and jump to the address of _main

调用__main函数,初始化gcc需要的材料.该调用将当前程序计数器压栈然后跳转__main函数

显然我们gcc -c生成的目标文件.o是没有__main函数的 ,该函数应当是链接阶段加上去的

那么我们编译成exe文件之后再反编译进行观察

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0000000000401633 <main>:
401633: 55 push %rbp
401634: 48 89 e5 mov %rsp,%rbp
401637: 48 83 ec 20 sub $0x20,%rsp
40163b: e8 c0 00 00 00 callq 401700 <__main> ;此call确实调用了__main函数
401640: e8 a5 ff ff ff callq 4015ea <show> ;此call调用了show函数
401645: b8 00 00 00 00 mov $0x0,%eax
40164a: 48 83 c4 20 add $0x20,%rsp
40164e: 5d pop %rbp
40164f: c3 retq
0000000000401700 <__main>:
401700: 8b 05 2a 59 00 00 mov 0x592a(%rip),%eax # 407030 <initialized>
401706: 85 c0 test %eax,%eax
401708: 74 06 je 401710 <__main+0x10>
40170a: c3 retq ;有ret语句
40170b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401710: c7 05 16 59 00 00 01 movl $0x1,0x5916(%rip) # 407030 <initialized>
401717: 00 00 00
40171a: e9 71 ff ff ff jmpq 401690 <__do_global_ctors>
40171f: 90 nop

此时可以看到,两个call都是调用的函数,并且调用的函数都有ret语句与call匹配

还要补充的是关于对齐:申请栈空间时要按照16字节对齐申请

System V AMD64 ABI

image-20220428152425649

CSAPP写道,参数传递时可以用到六个寄存器,多余的参数用栈传递,是指在64位linux环境下,

而windows上只能用四个寄存器传递参数,多余的用栈传递

image-20220427223059552

还是刚才的c程序,在ubuntu上的情况

main.c

1
2
3
4
5
6
7
8
9
int _cdecl func(int a,int b,int c,int d,int e,int f,int g,int h){
return a+b+c+d+e+f+g+h;
}
int _cdecl show(){
return func(1,2,3,4,5,6,7,8);
}
int _cdecl main(){
show();
}

其反汇编代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
main.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func>:
0: f3 0f 1e fa endbr64 ;蜜汁指令
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 7d fc mov %edi,-0x4(%rbp) ;寄存器传递的参数也压栈,这与windows上相同
b: 89 75 f8 mov %esi,-0x8(%rbp)
e: 89 55 f4 mov %edx,-0xc(%rbp)
11: 89 4d f0 mov %ecx,-0x10(%rbp)
14: 44 89 45 ec mov %r8d,-0x14(%rbp)
18: 44 89 4d e8 mov %r9d,-0x18(%rbp)
1c: 8b 55 fc mov -0x4(%rbp),%edx
1f: 8b 45 f8 mov -0x8(%rbp),%eax
22: 01 c2 add %eax,%edx
24: 8b 45 f4 mov -0xc(%rbp),%eax
27: 01 c2 add %eax,%edx
29: 8b 45 f0 mov -0x10(%rbp),%eax
2c: 01 c2 add %eax,%edx
2e: 8b 45 ec mov -0x14(%rbp),%eax
31: 01 c2 add %eax,%edx
33: 8b 45 e8 mov -0x18(%rbp),%eax
36: 01 c2 add %eax,%edx
38: 8b 45 10 mov 0x10(%rbp),%eax
3b: 01 c2 add %eax,%edx
3d: 8b 45 18 mov 0x18(%rbp),%eax
40: 01 d0 add %edx,%eax
42: 5d pop %rbp
43: c3 retq

0000000000000044 <show>:
44: f3 0f 1e fa endbr64
48: 55 push %rbp
49: 48 89 e5 mov %rsp,%rbp
4c: 6a 08 pushq $0x8
4e: 6a 07 pushq $0x7
50: 41 b9 06 00 00 00 mov $0x6,%r9d ;确实使用了6个寄存器传递参数
56: 41 b8 05 00 00 00 mov $0x5,%r8d
5c: b9 04 00 00 00 mov $0x4,%ecx
61: ba 03 00 00 00 mov $0x3,%edx
66: be 02 00 00 00 mov $0x2,%esi
6b: bf 01 00 00 00 mov $0x1,%edi
70: e8 00 00 00 00 callq 75 <show+0x31>
75: 48 83 c4 10 add $0x10,%rsp
79: c9 leaveq
7a: c3 retq

000000000000007b <main>:
7b: f3 0f 1e fa endbr64
7f: 55 push %rbp
80: 48 89 e5 mov %rsp,%rbp
83: b8 00 00 00 00 mov $0x0,%eax
88: e8 00 00 00 00 callq 8d <main+0x12>
8d: b8 00 00 00 00 mov $0x0,%eax
92: 5d pop %rbp
93: c3 retq
1
2
func(para1,para2,para3,para4,para5,para6,para7,...,paran)
func(edi,esi,edx,ecx,r8d,r9d,栈上远离栈顶,...,栈上靠近栈顶)