Calling convention
In computer science, a calling convention is an implementation-level scheme for how subroutines receive parameters from their caller and how they return a result. Differences in various implementations include where parameters, return values, return addresses and scope links are placed, and how the tasks of preparing for a function call and restoring the environment afterward are divided between the caller and the callee.
Calling conventions may be related to a particular programming language's evaluation strategy but most often are not considered part of it, as the evaluation strategy is usually defined on a higher abstraction level and seen as a part of the language rather than as a low-level implementation detail of a particular language's compiler.
Variations
Calling conventions may differ in:
- Where parameters, return values and return addresses are placed
- The order in which actual arguments for formal parameters are passed
- How a return value is delivered from the callee back to the caller
- How the task of setting up for and cleaning up after a function call is divided between the caller and the callee
- Whether and how metadata describing the arguments is passed
- Where the previous value of the frame pointer is stored, which is used to restore the frame pointer when the routine ends
- Where any static scope links for the routine's non-local data access are placed
- How local variables are allocated, which can sometimes also be part of the calling convention
- Conventions on which registers may be directly used by the callee, without being preserved
- Which registers are considered to be volatile and, if volatile, need not be restored by the callee
Compiler variation
Architecture variation
CPU architectures always have more than one possible calling convention. With many general-purpose registers and other features, the potential number of calling conventions is large, although some architectures are formally specified to use only one calling convention, supplied by the architect.
x86 (32-bit)
The x86 architecture is used with many different calling conventions. Due to the small number of architectural registers, the x86 calling conventions mostly pass arguments on the stack, while the return value is passed in a register. Some conventions use registers for the first few parameters, which may improve performance for short and simple leaf routines that are invoked very frequently.
Example call:
push EAX ; pass some register result
push byte ; pass some memory variable
push 3 ; pass some constant
call calc ; the returned result is now in EAX
Typical callee structure:
calc:
push EBP ; save old frame pointer
mov EBP,ESP ; get new frame pointer
sub ESP,localsize ; reserve stack space for locals
.
. ; perform calculations, leave result in EAX
.
mov ESP,EBP ; free space for locals
pop EBP ; restore old frame pointer
ret paramsize ; free parameter space and return
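The call and callee sequences above can be traced with a toy model. The following Python sketch is only an illustration under stated assumptions: a list stands in for the stack (the end of the list is the top), a dict stands in for the registers, and the function name `call_calc`, the single local slot and the summing body are all hypothetical. Real hardware addresses bytes and grows the stack downward.

```python
# Toy model of the x86 sequence above. All names are hypothetical.

def call_calc(a, b, c):
    stack = []
    regs = {"EAX": 0, "EBP": None}

    # Caller: push three arguments; "call" then pushes the return address.
    for value in (a, b, c):
        stack.append(value)
    stack.append("return-address")

    # Callee prologue: save the old frame pointer and set up a new frame.
    stack.append(regs["EBP"])          # push EBP
    regs["EBP"] = len(stack) - 1       # mov EBP,ESP
    local_size = 1
    stack.extend([0] * local_size)     # sub ESP,localsize

    # Body: the arguments sit at fixed offsets from the frame pointer.
    frame = regs["EBP"]
    args = stack[frame - 4:frame - 1]  # the three pushed arguments
    stack[frame + 1] = sum(args)       # use the local slot as scratch
    regs["EAX"] = stack[frame + 1]     # leave the result in EAX

    # Callee epilogue: drop locals, restore EBP, then return.
    del stack[frame + 1:]              # mov ESP,EBP
    regs["EBP"] = stack.pop()          # pop EBP
    stack.pop()                        # ret pops the return address ...
    del stack[len(stack) - 3:]         # ... and paramsize frees the arguments

    return regs["EAX"], stack
```

After the call the stack is empty again because the callee, not the caller, removed the parameters, which is what `ret paramsize` does in the listing.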
ARM (A32)
The standard 32-bit ARM calling convention allocates the 15 general-purpose registers as:
- r14 is the link register.
- r13 is the stack pointer.
- r12 is the Intra-Procedure-call scratch register.
- r4 to r11: used to hold local variables.
- r0 to r3: used to hold argument values passed to a subroutine, and also hold results returned from a subroutine.
If the returned value is too large to fit in r0 to r3, or its size cannot be determined statically at compile time, then the caller must allocate space for that value at run time and pass a pointer to that space in r0.
Subroutines must preserve the contents of r4 to r11 and the stack pointer. In particular, subroutines that call other subroutines must save the return address in the link register r14 to the stack before calling those other subroutines. However, such subroutines do not need to return that value to r14—they merely need to load that value into r15, the program counter, to return.
The ARM calling convention mandates using a full-descending stack.
This calling convention causes a "typical" ARM subroutine to:
- in the prologue, push r4 to r11 to the stack, and push the return address in r14 to the stack;
- copy any passed arguments to the local scratch registers;
- allocate other local variables to the remaining local scratch registers;
- do calculations and call other subroutines as necessary using BL, assuming r0 to r3, r12 and r14 will not be preserved;
- put the result in r0;
- in the epilogue, pull r4 to r11 from the stack, and pull the return address to the program counter r15.
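The steps of this "typical" subroutine can be sketched in Python. This is a toy model, not ARM code: the register file is a dict, the stack is a list, and the names `make_registers` and `typical_subroutine` and the addition in the body are assumptions made for illustration.

```python
# Hypothetical register file: names r0..r14 map to integer values.
def make_registers():
    return {f"r{i}": 0 for i in range(15)}


def typical_subroutine(regs, stack):
    """Add r0 and r1, using r4 and r5 as scratch; the sum is left in r0."""
    callee_saved = [f"r{i}" for i in range(4, 12)]

    # Prologue: push r4 to r11 and the return address held in r14.
    for name in callee_saved + ["r14"]:
        stack.append(regs[name])

    # Body: copy the arguments to scratch registers and compute.
    regs["r4"], regs["r5"] = regs["r0"], regs["r1"]
    regs["r0"] = regs["r4"] + regs["r5"]
    regs["r12"] = regs["r14"] = -1   # r12 and r14 need not be preserved

    # Epilogue: pop in reverse order; the saved return address
    # goes to the program counter rather than back to r14.
    pc = stack.pop()
    for name in reversed(callee_saved):
        regs[name] = stack.pop()
    return pc
```

A caller that set r4 and r5 before the call finds them unchanged afterwards, while r0 holds the result and r12 and r14 may hold anything.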
ARM (A64)
The 64-bit ARM (AArch64) calling convention allocates the general-purpose registers as:
- x30 is the link register
- x29 is the frame register
- x19 to x29 are callee-saved
- x18 is the 'platform register', used for some operating-system-specific special purpose, or an additional caller-saved register
- x16 and x17 are the Intra-Procedure-call scratch registers
- x9 to x15: used to hold local variables
- x8: used to hold indirect return value address
- x0 to x7: used to hold argument values passed to a subroutine, and also hold results returned from a subroutine
All registers starting with x have a corresponding 32-bit register prefixed with w. Thus, a 32-bit x0 is called w0.
PowerPC
The PowerPC architecture has a large number of registers, so most functions can pass all arguments in registers for single-level calls. Additional arguments are passed on the stack, and space for register-based arguments is also always allocated on the stack as a convenience to the called function in case multi-level calls are used and the registers must be saved. This is also of use in variadic functions, such as printf, where the function's arguments need to be accessed as an array. A single calling convention is used for all procedural languages.
MIPS
The most commonly used calling convention for 32-bit MIPS is the O32 ABI, which passes the first four arguments to a function in the registers $a0-$a3; subsequent arguments are passed on the stack. Space on the stack is reserved for $a0-$a3 in case the callee needs to save its arguments, but the registers are not stored there by the caller. The return value is stored in register $v0; a second return value may be stored in $v1. The 64-bit N64 ABI allows for more arguments in registers for more efficient function calls when there are more than four parameters. There is also the N32 ABI, which also allows for more arguments in registers. The return address when a function is called is stored in the $ra register automatically by use of the JAL or JALR instructions.
The function prologue of a MIPS subroutine pushes the return address to the stack.
The N32 and N64 ABIs pass the first eight arguments to a function in the registers $a0-$a7; subsequent arguments are passed on the stack. The return value is stored in register $v0; a second return value may be stored in $v1. In both the N32 and N64 ABIs all registers are considered to be 64 bits wide.
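The difference between O32 and N32/N64 argument passing can be summarised in a few lines of Python. This is a simplified sketch: it assumes every argument is a single word-sized integer, ignores floating-point and alignment rules, and the function name `place_arguments` and the `stack[k]` labels (the k-th stack-passed argument, not a byte offset) are hypothetical.

```python
def place_arguments(count, abi):
    """Return where each of `count` word-sized integer arguments is passed."""
    # Number of argument registers per ABI, as described in the text.
    register_args = {"O32": 4, "N32": 8, "N64": 8}[abi]
    places = []
    for index in range(count):
        if index < register_args:
            places.append(f"$a{index}")
        else:
            # Label for the k-th argument that spills to the stack.
            places.append(f"stack[{index - register_args}]")
    return places
```

For example, a six-argument call spills two arguments to the stack under O32 but none under N32 or N64.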
On both O32 and N32/N64 the stack grows downwards, but the N32/N64 ABIs require 64-bit alignment for all stack entries. The frame pointer is optional and in practice rarely used except when the stack allocation in a function is determined at runtime, for example, by calling alloca.
For N32 and N64, the return address is typically stored 8 bytes before the stack pointer although this may be optional.
For the N32 and N64 ABIs, a function must preserve the $s0-$s7 registers, the global pointer, the stack pointer and the frame pointer. The O32 ABI is the same except the calling function is required to save the $gp register instead of the called function.
For multi-threaded code, the thread local storage pointer is typically stored in special hardware register $29 and is accessed by using the rdhwr (read hardware register) instruction. At least one vendor is known to store this information in the $k0 register which is normally reserved for kernel use, but this is not standard.
The $k0 and $k1 registers are reserved for kernel use and should not be used by applications since these registers can be changed at any time by the kernel due to interrupts, context switches or other events.
Registers for the O32 calling convention
Name | Number | Use | Callee must preserve? |
$zero | $0 | constant 0 | |
$at | $1 | assembler temporary | |
$v0-$v1 | $2-$3 | values for function returns and expression evaluation | |
$a0-$a3 | $4-$7 | function arguments | |
$t0-$t7 | $8-$15 | temporaries | |
$s0-$s7 | $16-$23 | saved temporaries | |
$t8-$t9 | $24-$25 | temporaries | |
$k0-$k1 | $26-$27 | reserved for OS kernel | |
$gp | $28 | global pointer | |
$sp | $29 | stack pointer | |
$fp | $30 | frame pointer | |
$ra | $31 | return address |
Registers for the N32 and N64 calling conventions
Name | Number | Use | Callee must preserve? |
$zero | $0 | constant 0 | |
$at | $1 | assembler temporary | |
$v0-$v1 | $2-$3 | values for function returns and expression evaluation | |
$a0-$a7 | $4-$11 | function arguments | |
$t4-$t7 | $12-$15 | temporaries | |
$s0-$s7 | $16-$23 | saved temporaries | |
$t8-$t9 | $24-$25 | temporaries | |
$k0-$k1 | $26-$27 | reserved for OS kernel | |
$gp | $28 | global pointer | |
$sp | $29 | stack pointer | |
$s8 | $30 | frame pointer | |
$ra | $31 | return address |
Registers that are preserved across a call are registers that will not be changed by a system call or procedure call. For example, $s-registers must be saved to the stack by a procedure that needs to use them, and $sp and $fp are always incremented by constants, and decremented back after the procedure is done with them. By contrast, $ra is changed automatically by any normal function call, and $t-registers must be saved by the program before any procedure call.
The userspace calling convention of position-independent code on Linux additionally requires that when a function is called the $t9 register must contain the address of that function. This convention dates back to the System V ABI supplement for MIPS.
SPARC
The SPARC architecture, unlike most RISC architectures, is built on register windows. There are 24 accessible registers in each register window: 8 are the "in" registers, 8 are the "local" registers, and 8 are the "out" registers. The "in" registers are used to pass arguments to the function being called, and any additional arguments need to be pushed onto the stack. However, space is always allocated by the called function to handle a potential register window overflow, local variables, and returning a struct by value. To call a function, one places the arguments for the function to be called in the "out" registers; when the function is called, the "out" registers become the "in" registers and the called function accesses the arguments in its "in" registers. When the called function completes, it places the return value in the first "in" register, which becomes the first "out" register when the called function returns.
The System V ABI, which most modern Unix-like systems follow, passes the first six arguments in "in" registers %i0 through %i5, reserving %i6 for the frame pointer and %i7 for the return address.
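The way a caller's "out" registers become the callee's "in" registers can be modelled with overlapping slices of one flat register file. The Python sketch below is a toy under stated assumptions: the class `RegisterWindows`, its method names and the `call_add` helper are hypothetical, the window slides forward on entry purely for readability, and window overflow and the global registers are not modelled.

```python
class RegisterWindows:
    """Flat register file viewed through overlapping 24-register windows."""

    def __init__(self, windows=4):
        self.file = [0] * (16 * windows + 8)
        self.base = 0                      # start of the current window

    def _slot(self, kind, n):
        offset = {"in": 0, "local": 8, "out": 16}[kind]
        return self.base + offset + n

    def get(self, kind, n):
        return self.file[self._slot(kind, n)]

    def set(self, kind, n, value):
        self.file[self._slot(kind, n)] = value

    def save(self):
        # Entering a function: the old "out" registers become the new "in".
        self.base += 16

    def restore(self):
        # Leaving a function: slide the window back to the caller.
        self.base -= 16


def call_add(rw, a, b):
    # Caller places the arguments in its "out" registers.
    rw.set("out", 0, a)
    rw.set("out", 1, b)
    rw.save()
    # Callee reads them as "in" registers and computes into a local.
    rw.set("local", 0, rw.get("in", 0) + rw.get("in", 1))
    # Callee places the return value in its first "in" register.
    rw.set("in", 0, rw.get("local", 0))
    rw.restore()
    # Back in the caller, that register is the first "out" register.
    return rw.get("out", 0)
```

No values are copied when the window moves; only the base index changes, which is the point of the design.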
IBM System/360 and successors
The IBM System/360 is another architecture without a hardware stack. The examples below illustrate the calling convention used by OS/360 and successors prior to the introduction of 64-bit z/Architecture; other operating systems for System/360 might have different calling conventions.
Calling program:
LA 1,ARGS Load argument list address
L 15,=A(SUB) Load subroutine address
BALR 14,15 Branch to called routine [1]
...
ARGS DC A Address of 1st argument
DC A
...
DC A+X'80000000' Last argument [2]
Called program:
SUB EQU * This is the entry point of the subprogram
Standard entry sequence:
USING *,15 [3]
STM 14,12,12(13) Save registers [4]
ST 13,SAVE+4 Save caller's savearea addr
LA 12,SAVE Chain saveareas
ST 12,8(13)
LR 13,12
...
Standard return sequence:
L 13,SAVE+4 [5]
LM 14,12,12(13)
L 15,RETVAL [6]
BR 14 Return to caller
SAVE DS 18F Savearea [7]
Notes:
1. The BALR instruction stores the address of the next instruction in the register specified by the first argument (register 14) and branches to the second argument address in register 15.
2. The caller passes the address of a list of argument addresses in register 1. The last address has the high-order bit set to indicate the end of the list. This limits programs using this convention to 31-bit addressing.
3. The address of the called routine is in register 15. Normally this is loaded into another register and register 15 is not used as a base register.
4. The STM instruction saves registers 14, 15, and 0 through 12 in a 72-byte area provided by the caller, called a save area, pointed to by register 13. The called routine provides its own save area for use by subroutines it calls; the address of this area is normally kept in register 13 throughout the routine. The instructions following STM update forward and backward chains linking this save area to the caller's save area.
5. The return sequence restores the caller's registers.
6. Register 15 is usually used to pass a return value.
7. Declaring a savearea statically in the called routine makes it non-reentrant and non-recursive; a reentrant program uses a dynamic savearea, acquired either from the operating system and freed upon returning, or in storage passed by the calling program.
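The argument-list rule (a list of argument addresses whose last entry has the high-order bit set) can be illustrated with a short Python sketch. It is a model only: memory is a dict from address to value, and the names `build_argument_list`, `read_arguments`, `END_OF_LIST` and `ADDRESS_MASK` are assumptions introduced here.

```python
END_OF_LIST = 0x80000000    # high-order bit marks the last argument address
ADDRESS_MASK = 0x7FFFFFFF   # the remaining 31 bits hold the address


def build_argument_list(addresses):
    """Caller side: return the address words placed at the argument list."""
    if not addresses:
        raise ValueError("this sketch assumes at least one argument")
    words = list(addresses)
    words[-1] |= END_OF_LIST
    return words


def read_arguments(memory, arg_list):
    """Callee side: follow each address until the flagged last entry."""
    values = []
    for word in arg_list:
        values.append(memory[word & ADDRESS_MASK])
        if word & END_OF_LIST:
            break
    return values
```

Because one bit of each word is used as the end marker, only 31 bits remain for the address, which is the addressing limit mentioned in the notes.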
In the System/390 and z/Architecture ABIs used on Linux:
- Registers 0 and 1 are volatile
- Registers 2 and 3 are used for parameter passing and return values
- Registers 4 and 5 are also used for parameter passing
- Register 6 is used for parameter passing, and must be saved and restored by the callee
- Registers 7 through 13 are for use by the callee, and must be saved and restored by them
- Register 14 is used for the return address
- Register 15 is used as the stack pointer
- Floating-point registers 0 and 2 are used for parameter passing and return values
- Floating-point registers 4 and 6 are for use by the callee, and must be saved and restored by them
- In z/Architecture, floating-point registers 1, 3, 5, and 7 through 15 are for use by the callee
- Access register 0 is reserved for system use
- Access registers 1 through 15 are for use by the callee
SuperH
Register | |||
R0 | Return values. Temporary for expanding assembly pseudo-instructions. Implicit source/destination for 8/16-bit operations. Not preserved. | Return value, caller saves | Variables/temporary. Not guaranteed |
R1..R3 | Serves as temporary registers. Not preserved. | Caller saved scratch. Structure address | Variables/temporary. Not guaranteed |
R4..R7 | First four words of integer arguments. The argument build area provides space into which R4 through R7 holding arguments may spill. Not preserved. | Parameter passing, caller saves | Arguments. Not guaranteed. |
R8..R13 | Serves as permanent registers. Preserved. | Callee Saves | Variables/temporary. Guaranteed. |
R14 | Default frame pointer. Preserved. | Frame Pointer, FP, callee saves | Variables/temporary. Guaranteed. |
R15 | Serves as stack pointer or as a permanent register. Preserved. | Stack Pointer, SP, callee saves | Stack pointer. Guaranteed. |
68k
The most common calling convention for the Motorola 68000 series is:
- d0, d1, a0 and a1 are scratch registers
- All other registers are callee-saved
- a6 is the frame pointer, which can be disabled by a compiler option
- Parameters are pushed onto the stack, from right to left
- Return value is stored in d0
IBM 1130
Two pseudo-operations are used to call subroutines: CALL, to code non-relocatable subroutines directly linked with the main program, and LIBF, to call relocatable library subroutines through a transfer vector. Both pseudo-ops resolve to a Branch and Store IAR (BSI) machine instruction that stores the address of the next instruction at its effective address (EA) and branches to EA+1.
Arguments follow the BSI; usually these are one-word addresses of arguments. The called routine must know how many arguments to expect so that it can skip over them on return. Alternatively, arguments can be passed in registers. Function routines returned the result in ACC for real arguments, or in a memory location referred to as the Real Number Pseudo-Accumulator. Arguments and the return address were addressed using an offset to the IAR value stored in the first location of the subroutine.
* 1130 subroutine example
ENT SUB Declare "SUB" an external entry point
SUB DC 0 Reserved word at entry point, conventionally coded "DC *-*"
* Subroutine code begins here
* If there were arguments the addresses can be loaded indirectly from the return address
LDX I 1 SUB Load X1 with the address of the first argument
...
* Return sequence
LD RES Load integer result into ACC
* If no arguments were provided, indirect branch to the stored return address
B I SUB If no arguments were provided
END SUB
Subroutines in IBM 1130, CDC 6600 and PDP-8 store the return address in the first location of a subroutine.
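Storing the return address in the first word of the subroutine can be modelled with a dict used as word-addressed memory. The Python sketch below is a toy: the helper names `bsi` and `run_sub`, the one-word size assumed for the BSI, and the summing body are all assumptions made for illustration.

```python
def bsi(memory, iar, target):
    """Branch and Store IAR: save the address of the next word at `target`
    and continue execution at target + 1."""
    memory[target] = iar + 1    # the BSI occupies one word in this toy model
    return target + 1


def run_sub(memory, sub, arg_count):
    """Model of a subroutine at address `sub` that sums its arguments."""
    link = memory[sub]          # return address stored by the BSI
    # The words following the BSI hold the addresses of the arguments.
    arg_addresses = [memory[link + i] for i in range(arg_count)]
    acc = sum(memory[address] for address in arg_addresses)
    # Resume past the argument words, which the callee must skip.
    return acc, link + arg_count
```

The callee has to know `arg_count` in advance; nothing in memory tells it where the argument words end and the caller's next instruction begins.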
Implementation considerations
This variability must be considered when combining modules written in multiple languages, or when calling operating system or library APIs from a language other than the one in which they are written; in these cases, special care must be taken to coordinate the calling conventions used by caller and callee. Even a program using a single programming language may use multiple calling conventions, either chosen by the compiler, for code optimization, or specified by the programmer.
Threaded code
Threaded code places all the responsibility for setting up for and cleaning up after a function call on the called code. The calling code does nothing but list the subroutines to be called. This puts all the function setup and cleanup code in one place (the prolog and epilog of the function) rather than in the many places that function is called. This makes threaded code the most compact calling convention.
Threaded code passes all arguments on the stack. All return values are returned on the stack. This makes naive implementations slower than calling conventions that keep more values in registers. However, threaded code implementations that cache several of the top stack values in registers (in particular, the return address) are usually faster than subroutine calling conventions that always push and pop the return address to the stack.
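A minimal Python sketch can show the shape of this convention. It models only the two properties described above (the caller is nothing but a list of subroutines, and all arguments and results travel on a shared stack); the names `run`, `lit`, `add`, `mul`, `dup` and `square_sum` are hypothetical, and a real threaded-code system would manage return addresses itself rather than rely on Python calls.

```python
def run(thread, stack):
    """Execute a thread: a plain list of subroutines sharing one stack."""
    for word in thread:
        word(stack)
    return stack


def lit(value):
    """Build a word that pushes a constant."""
    return lambda stack: stack.append(value)


def add(stack):
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)


def mul(stack):
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


def dup(stack):
    stack.append(stack[-1])


def square_sum(stack):
    # A word defined as another thread: replaces a, b with (a + b) squared.
    run([add, dup, mul], stack)
```

A "call site" such as `run([lit(2), lit(3), square_sum], [])` contains no argument-passing or cleanup code of its own, which is why the calling side is so compact.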