The concept of a foreign function interface (FFI) is fairly straightforward: how does a language or platform "bind" to (in other words, call) the underlying APIs of the host environment? Examples include Java code running on top of the JVM "calling out to" native C code, such as operating system APIs or native libraries; game engines calling out to libraries that aren't part of the game engine itself; and so on.

Most of the time this requires several pieces of knowledge:

Dynamically loading libraries

Most FFIs (particularly those of VM-based languages) load libraries dynamically, which avoids having to statically link a new executable each time a new foreign function is needed.

Much of this material is covered in the reading on linking and loading. Typically the process has two steps:

  1. Load the library. Ask the OS to map the library into the process's address space and make it available.
  2. Request the address of a given exported symbol by name. In other words, look up the function by its name, returning either the address (which we can then coerce somehow into a callable reference, a function pointer if you will) or NULL if no such name is found.
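The two steps can be sketched with Python's ctypes module, which wraps the platform loader. This is a minimal sketch that assumes a POSIX system; the helper name load_symbol is made up for illustration, and the fallback to CDLL(None) (the symbols already loaded into the process) is an assumption that does not hold on Windows.

```python
import ctypes
import ctypes.util


def load_symbol(library_name, symbol_name):
    """Two-step lookup: load the library, then resolve a symbol by name.

    Returns the callable foreign function, or None if the symbol is missing.
    """
    # Step 1: ask the OS loader to map the library into this process.
    # find_library may return None if it cannot locate the library; on POSIX,
    # CDLL(None) then exposes the symbols already loaded into the process.
    path = ctypes.util.find_library(library_name)
    library = ctypes.CDLL(path)

    # Step 2: look the symbol up by name. Where the C-level lookup would
    # return NULL, ctypes raises AttributeError, so map that to None.
    try:
        return getattr(library, symbol_name)
    except AttributeError:
        return None


abs_fn = load_symbol("c", "abs")
print(abs_fn(-5))
print(load_symbol("c", "definitely_not_a_real_symbol_xyz"))
```

The returned object plays the role of the function pointer: it is only an address plus whatever signature the caller chooses to attach to it.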

Call signatures

How do we name the exported entry point? Does the entry point have any metadata describing the parameters and/or return type?

C

C generally maps the name of a function directly to the exported symbol name, on some platforms with a leading underscore (_). (Not sure of the reason for that prefix, to be honest. One related observation: "an identifier beginning with an underscore followed by a capital letter is a reserved identifier in C, so conflict with user identifiers is avoided.")
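Because a plain C symbol is only a name, the caller has to supply the parameter and return types itself. A small sketch using Python's ctypes, assuming a POSIX system where CDLL(None) exposes the C library already loaded into the process:

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded
# into the process, which includes the C library.
libc = ctypes.CDLL(None)

# The symbol "atof" says nothing about its signature, so declare it here:
#   double atof(const char *);
atof = libc.atof
atof.argtypes = [ctypes.c_char_p]
atof.restype = ctypes.c_double

# Likewise: size_t strlen(const char *);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(atof(b"2.5"))
print(strlen(b"hello"))
```

Without the restype line, ctypes would assume an int return value and misread the double that atof returns; nothing in the symbol itself would flag the mistake.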

C++

... was where things got really interesting. (Inside the C++ Object Model had/has a lot of details on this.) In order to support function overloading (same name, different parameters), C++ generated C-style function names with the parameters encoded as part of the name; example:

int  f () { return 1; }
int  f (int)  { return 0; }
void g () { int i = f(), j = f(0); }

... could produce ...

int  __f_v () { return 1; }
int  __f_i (int)  { return 0; } 
void __g_v () { int i = __f_v(), j = __f_i(0); }

and it got even more interesting for classes:

#include <ostream>
#include <string>

namespace wikipedia
{
   class article 
   {
   public:
      std::string format ();  // = _ZN9wikipedia7article6formatEv

      bool print_to (std::ostream&);  // = _ZN9wikipedia7article8print_toERSo

      class wikilink 
      {
      public:
         wikilink (std::string const& name);  // = _ZN9wikipedia7article8wikilinkC1ERKSs
      };
   };
}

... depending on the precise name-mangling convention for that C++ compiler:

Compiler | void h(int) | void h(int, char) | void h(void)
-------- + ----------- + ----------------- + ------------
Intel C++ 8.0 for Linux, HP aC++ A.05.55 IA-64, IAR EWARM C++, GCC 3.x and higher, Clang 1.x and higher | _Z1hi | _Z1hic | _Z1hv
GCC 2.9.x, HP aC++ A.03.45 PA-RISC | h__Fi | h__Fic | h__Fv
Microsoft Visual C++ v6-v10, Digital Mars C++ | ?h@@YAXH@Z | ?h@@YAXHD@Z | ?h@@YAXXZ
Borland C++ v3.1 | @h$qi | @h$qizc | @h$qv
OpenVMS C++ v6.5 (ARM mode) | H__XI | H__XIC | H__XV
OpenVMS C++ v6.5 (ANSI mode) | | CXX$__7H__FIC26CDH77 | CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64 | CXX$_Z1HI2DSQ26A | CXX$_Z1HIC2NP3LI4 | CXX$_Z1HV0BCA19V
SunPro CC | __1cBh6Fi_v_ | __1cBh6Fic_v_ | __1cBh6F_v_
Tru64 C++ v6.5 (ARM mode) | h__Xi | h__Xic | h__Xv
Tru64 C++ v6.5 (ANSI mode) | __7h__Fi | __7h__Fic | __7h__Fv
Watcom C++ 10.6 | W?h$n(i)v | W?h$n(ia)v | W?h$n()v
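
The GCC 3.x / Clang row of the table and the unqualified wikipedia::article::format example can be reproduced with a few lines of Python. This is a toy sketch of a small subset of the scheme (builtin parameter types and nested names only); the function name mangle is made up for illustration, and substitutions, references, templates, constructors and qualifiers are deliberately left out.

```python
# Single-letter codes for a few builtin types in the _Z... scheme.
BUILTIN_CODES = {"void": "v", "int": "i", "char": "c", "bool": "b", "double": "d"}


def mangle(qualified_name, parameter_types):
    """Mangle a function whose parameters are all builtin types.

    qualified_name uses '::' separators, e.g. "wikipedia::article::format".
    An empty parameter list is encoded as a single 'v' (void).
    """
    parts = qualified_name.split("::")
    # Each identifier is written as <length><identifier>.
    encoded = "".join(f"{len(part)}{part}" for part in parts)
    if len(parts) > 1:
        # Nested (namespace/class-qualified) names are wrapped in N ... E.
        encoded = f"N{encoded}E"
    params = "".join(BUILTIN_CODES[t] for t in parameter_types) or "v"
    return f"_Z{encoded}{params}"


print(mangle("h", ["int"]))
print(mangle("h", ["int", "char"]))
print(mangle("h", []))
print(mangle("wikipedia::article::format", []))
```

An FFI that wants to bind to C++ directly has to reproduce this encoding for the exact compiler in use, which is one reason many bindings go through extern "C" wrappers instead.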

Readings

Software

Talks

Obj-C

Objective-C has two forms of method: the class ("static") method and the instance method. A method declaration in Objective-C is of the following form:

+ (return-type) name0:parameter0 name1:parameter1 ...
- (return-type) name0:parameter0 name1:parameter1 ...

Class methods are signified by +, instance methods use -. A typical class method declaration may then look like:

+ (id) initWithX: (int) number andY: (int) number;
+ (id) new;

With instance methods looking like this:

- (id) value;
- (id) setValue: (id) new_value;

Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme for class methods:

_c_Class_name0_name1_ ...

and this for instance methods:

_i_Class_name0_name1_ ...

The colons in the Objective-C syntax are translated to underscores. So, the Objective-C class method + (id) initWithX: (int) number andY: (int) number;, if belonging to the Point class would translate as _c_Point_initWithX_andY_, and the instance method (belonging to the same class) - (id) value; would translate to _i_Point_value.
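
The naming scheme just described is mechanical enough to write down directly. The sketch below follows the scheme exactly as given in the text; the function name objc_symbol is made up for illustration, and real toolchains may spell their symbols differently.

```python
def objc_symbol(class_name, selector, is_class_method):
    """Build the internal name for a method, per the scheme described above.

    selector is the method name with its colons, e.g. "initWithX:andY:".
    """
    prefix = "_c_" if is_class_method else "_i_"
    # Colons in the selector are translated to underscores.
    return f"{prefix}{class_name}_{selector.replace(':', '_')}"


print(objc_symbol("Point", "initWithX:andY:", is_class_method=True))
print(objc_symbol("Point", "value", is_class_method=False))
```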

Each of the methods of a class is labeled in this way. However, looking up a method that a class may respond to would be tedious if all methods were represented only in this fashion. Each method is therefore assigned a unique symbol (such as an integer). Such a symbol is known as a selector. In Objective-C, one can manage selectors directly: they have a specific type, SEL.

During compilation, a table is built that maps the textual representation, such as _i_Point_value, to selectors. Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector only matches a method's name, not the class it belongs to: different classes can have different implementations of a method with the same name. Because of this, implementations of a method are given a specific identifier too. These are known as implementation pointers, and they also have a type, IMP.

Message sends are encoded by the compiler as calls to the id objc_msgSend (id receiver, SEL selector, ...) function, or one of its cousins, where receiver is the receiver of the message, and selector determines the method to call. Each class has its own table that maps selectors to their implementations; the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the SEL to IMP lookup tables, the functions are essentially anonymous.

The SEL value for a selector does not vary between classes. This enables polymorphism.
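
A toy model of this dispatch, written in Python, may make the roles of SEL and IMP easier to see. Everything here (sel_register, ToyClass, ToyObject, msg_send) is a hypothetical stand-in, not the Objective-C runtime API, and superclass lookup and method caching are left out.

```python
_selectors = {}  # selector name -> SEL (a small integer in this model)


def sel_register(name):
    """Intern a selector name; the same name always yields the same SEL."""
    return _selectors.setdefault(name, len(_selectors))


class ToyClass:
    def __init__(self, name):
        self.name = name
        self.instance_methods = {}  # SEL -> IMP (a plain callable here)

    def add_method(self, selector_name, imp):
        self.instance_methods[sel_register(selector_name)] = imp


class ToyObject:
    def __init__(self, cls):
        self.isa = cls  # each object points at its class


def msg_send(receiver, sel, *args):
    """Look up the IMP for sel in the receiver's class table and call it."""
    imp = receiver.isa.instance_methods[sel]
    return imp(receiver, sel, *args)


point = ToyClass("Point")
circle = ToyClass("Circle")
point.add_method("describe", lambda self, _cmd: "a point")
circle.add_method("describe", lambda self, _cmd: "a circle")

describe = sel_register("describe")  # one SEL shared by both classes
print(msg_send(ToyObject(point), describe))
print(msg_send(ToyObject(circle), describe))
```

The same SEL value reaches a different IMP depending on the receiver's class, which is the polymorphism the text refers to.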

The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.

Since Objective-C does not support namespaces, there is no need for the mangling of class names (that do appear as symbols in generated binaries).

Swift

Swift keeps metadata about functions (and more) in the mangled symbols referring to them. This metadata includes the function's name, attributes, module name, parameter types, return type, and more. For example:

The mangled name for a method func calculate(x: Int) -> Int of a MyClass class in module test is _TFC4test7MyClass9calculatefS0_FT1xSi_Si, for 2014 Swift. The components and their meanings are as follows:

Mangling for versions since Swift 4.0 is documented officially.
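
Even without a full decoder, the 2014-era name above visibly stores the module, class and function names as length-prefixed identifiers. The sketch below only reads those runs; the function name read_identifiers is made up for illustration, and the starting offset of 4 (skipping the _TFC prefix) is an assumption read off this one example rather than a general rule.

```python
def read_identifiers(mangled, start):
    """Read consecutive <decimal length><identifier> runs from index start.

    Returns the identifiers and the index where the run stopped.
    """
    names = []
    i = start
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        names.append(mangled[j:j + length])
        i = j + length
    return names, i


mangled = "_TFC4test7MyClass9calculatefS0_FT1xSi_Si"
names, stop = read_identifiers(mangled, 4)
print(names)
print(mangled[stop:])  # remainder, not decoded by this sketch
```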

Calling conventions

A calling convention is an implementation-level (low-level) scheme for how subroutines receive parameters from their caller and how they return a result. Differences in various implementations include where parameters, return values, return addresses and scope links are placed (registers, stack or memory etc.), and how the tasks of preparing for a function call and restoring the environment afterwards are divided between the caller and the callee.

Calling conventions may differ in:

In some cases, differences also include the following:

x86

Due to the small number of architectural registers, and a historical focus on simplicity and small code size, many x86 calling conventions pass arguments on the stack. The return value (or a pointer to it) is returned in a register. Some conventions use registers for the first few parameters, which may improve performance, especially for short and simple leaf routines (i.e. routines that do not call other routines) that are invoked very frequently.

Example caller:

push EAX            ; pass some register result
push dword [EBP+20] ; pass some memory variable (FASM/TASM syntax)
push 3              ; pass some constant
call calc           ; the returned result is now in EAX

Example callee (calc):

calc:
push EBP            ; save old frame pointer
mov EBP,ESP         ; get new frame pointer
sub ESP,localsize   ; reserve stack space for locals

; perform calculations, leave result in EAX

mov ESP,EBP         ; free space for locals
pop EBP             ; restore old frame pointer
ret paramsize       ; free parameter space and return.

Some conventions leave the parameter space allocated, using plain ret instead of ret imm16. In that case, the caller could add esp,12 in this example, or otherwise deal with the change to ESP.
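
The difference between the two cleanup styles is just bookkeeping on ESP, which a short Python sketch can trace. The function name trace_esp and the starting value 0x1000 are arbitrary choices for illustration; the callee's own push EBP / locals are omitted because they are balanced before ret.

```python
WORD = 4  # bytes per 32-bit stack slot


def trace_esp(argument_count, callee_cleans_up, esp=0x1000):
    """Return a list of (step, ESP) pairs for one call and return."""
    param_size = WORD * argument_count
    trace = [("before call", esp)]
    esp -= param_size              # caller: one push per argument
    trace.append(("arguments pushed", esp))
    esp -= WORD                    # call: pushes the return address
    trace.append(("inside callee", esp))
    if callee_cleans_up:
        esp += WORD + param_size   # ret paramsize: pops return address and parameters
        trace.append(("after ret paramsize", esp))
    else:
        esp += WORD                # plain ret: pops only the return address
        trace.append(("after ret", esp))
        esp += param_size          # caller: add esp, paramsize
        trace.append(("after add esp", esp))
    return trace


for callee_cleans_up in (True, False):
    for step, esp in trace_esp(3, callee_cleans_up):
        print(f"{step:20} {esp:#06x}")
    print()
```

Either way ESP ends where it started; an FFI that assumes the wrong style would leave the stack pointer off by the parameter size after every call.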

ARM (A32)

The standard 32-bit ARM calling convention allocates the 15 general-purpose registers as:

If the returned value is of a type too large to fit in r0 to r3, or one whose size cannot be determined statically at compile time, then the caller must allocate space for that value at run time, and pass a pointer to that space in r0.

Subroutines must preserve the contents of r4 to r11 and the stack pointer (perhaps by saving them to the stack in the function prologue, then using them as scratch space, then restoring them from the stack in the function epilogue). In particular, subroutines that call other subroutines must save the return address in the link register r14 to the stack before calling those other subroutines. However, such subroutines do not need to return that value to r14—they merely need to load that value into r15, the program counter, to return.

The ARM calling convention mandates using a full-descending stack.

This calling convention causes a "typical" ARM subroutine to:

ARM (A64)

The AArch64 calling convention allocates the 31 general-purpose registers as:

All registers starting with x have a corresponding 32-bit register prefixed with w, which names the low 32 bits of the same register. Thus, the 32-bit view of x0 is called w0.
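
The relationship between an x register and its w view can be modeled with plain integer masking. This Python sketch uses made-up helper names (read_w, write_w) and relies on the AArch64 rule that writing a w register clears the upper 32 bits of the corresponding x register.

```python
MASK32 = 0xFFFF_FFFF


def read_w(x_value):
    """The w view of a register is the low 32 bits of the x register."""
    return x_value & MASK32


def write_w(w_value):
    """Writing a w register zero-extends: the result is the new x value."""
    return w_value & MASK32


x0 = 0x1122_3344_5566_7788
print(hex(read_w(x0)))       # low half of x0, seen as w0

x0 = write_w(0xDEAD_BEEF)    # write through w0
print(hex(x0))               # upper 32 bits are now zero
```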

Similarly, the 32 floating-point registers are allocated as:

Known calling conventions

References:

"cdecl"

"Pascal" style

Windows "stdcall"

macOS ABI

Outdated: OS X ABI Function Call Guide covers 32-/64-bit PowerPC, and IA-32 / x86-64 calling conventions

Application Binary Interfaces (ABI) in general

Software

Talks



Tags: language   reading   windows   macos   linux   android   ios  

Last modified 06 April 2022