Bare Bones Backend / Assembly Intermediate Representation

The B3 compiler comprises two intermediate representations: a higher-level SSA-based representation called B3 IR and a lower-level representation that focuses of machine details, like registers. This lower-level form is called Air (Assembly Intermediate Representation).

Air programs are represented using a Air::Code object. Code comprises an array of basic blocks. Each basic block comprises an array of Insts. Air has an explicit control flow graph: each basic block has predecessor and successor blocks. Execution always begins at the first basic block (code[0]). The Insts in each block are executed in order. Each Inst has an opcode, some flags, an array of arguments (Args), and an origin. The opcode and flags are wrapped in a Kind, to make it convenient to carry them around together. The origin is simply a B3 IR Value. Some opcodes use the origin for additional meta-data. This works because Air code always coexists with the B3 procedure from which it was generated.

This document begins by describing the philosophy of Air. The behavior of Args is central to Air's execution model, which is described in the section that follows. The last section describes the way Air opcodes are defined.

Philosophy of Air

B3 is designed to be portable to many kinds of CPUs. Currently, it supports x86-64 and ARM64, which are quite different from each other. In B3 IR, we expose very few instruction set details. It's a goal of B3 IR to ensure that B3 values behave the same way except when the alternative would be counterproductive (like with pointer size, the corner-case behaviors of division, or calling convention customization). But to effectively compile code to different CPUs, the compiler has to eventually make instruction set details explicit. This is where Air comes in. B3 locks in most CPU-specific details at the moment of conversion to Air, and the Air code is irreversibly tied to some specific CPU.

Air is an instruction superset: it recognizes all of the instructions from all CPUs that Air may target. In its lowest-level form, Air is simply a way of describing an assembly instruction sequence, and this includes CPU concepts like registers and direct accesses to the stack. Air also has a higher-level form in which the assembly has not yet undergone register or stack allocation. Therefore, Air also supports abstract registers (called Tmps) and abstract stack slots. A Tmp object can either hold an unallocated temporary or a register.

Air as an Instruction Superset

Air has syntax to speak of all of the CPU instructions we know about. It is possible to speak of an x86-64 instruction while compiling for ARM64, for example. Clients of Air, such as the B3 to Air lowering phase, are allowed to pick any Air opcode and ask if that opcode would be valid on the current CPU. They are also allowed to check if specific forms of any given opcode are valid. This allows clients to optimize for multiple instruction sets by cascading through the possible opcodes that they know of, starting with the one they think is most efficient. Some of those opcodes may only be available on one CPU while others are available everywhere. Instruction selection does not need to know which instructions work on which CPUs; Air will tell you if some instruction happens to not be valid right now for whatever reason.

Air opcodes support overloading. For example, the Add32 opcode has both two-operand and three-operand overloads, and those overloads have multiple forms: the first operand may or may not be permitted to be an immediate and depending on the CPU and some of the other operands may or may not be allowed to be memory addresses. We use opcode overload to refer to all forms of an opcode that share the same number of arguments, and opcode form to mean the number of arguments and their types. A fundamental Air operation is Inst::isValidForm(), which tells the client if the instruction's current form is valid on the current CPU. This may return false either because the Inst is not well-formed for any CPU or because it is not valid for the current CPU even though it may be valid on some other CPU. There is also Air::isValidForm(), which can answer if the form you are intending to use will be valid even if you have not created an Inst yet. This allows clients to generate Air by experimenting with different forms before settling on the one that the current CPU supports.

Air as a High-Level Assembly

Air doesn't require the client to perform register or stack allocation. Anywhere that Air accepts a register it will also accept a Tmp. Anywhere that Air accepts an address it will also accept a stack slot reference. Air code generation comprises a register allocator and a stack allocator, which turn Tmps into Regs and stack slots into addresses with the frame pointer (or stack pointer) as the base and some integer offset. Air allows clients to speak of registers directly even while also using Tmps, and the register allocator will ensure that it avoids clobbering registers that the client code already relies upon. This is possible because Air has precise modeling of how instructions use registers, so it's always possible to determine which registers are live at any point in the Air code.

Air's philosophy allows B3 to use it for converting high-level, mostly-CPU-agnostic SSA procedures into code for the current CPU. Air is an instruction superset that allows clients to consider all available instructions on all possible CPUs and query which forms of those instructions are available on the current CPU. Air also supports for high-level concepts like Tmps and stack slots, which allows B3 to Air lowering to focus on which instructions to use without worrying about register allocation or stack layout.

Args and the Air Execution Model

Air can be thought of as an orthogonal instruction set. It's possible to construct an Inst with any combination of opcode and arguments. The opcode determines what Air will do to the arguments - it may read from them or write to them, for example. Orthognality implies that any argument that is read may be either a register (or Tmp), an address, or an immediate; while any argument that is written may be either a register or an address. Air constrains orthognality where the target CPU would. For example, none of Air's target CPUs would support an Add32 instruction that loads its sources from memory and stores its result into memory. Even x86 doesn't go that far. Either before or after creating an Inst, the client can query if a particular combination of arguments (for example, three memory addresses) would be valid for a given opcode (for example, Add32).

Air arguments are represented using the Arg object. Arg can represent any of the following assembly operands:

Tmp
A Tmp represents either a register or a temporary.
Imm, BigImm, BitImm, and BitImm64
These are various kinds of immediates. We distinguish between big and small immediates because some instructions only allow immediates within a certain range. We distinguish between immediates for bit operations and immediates for all other operations because ARM has different constraints on the values of immediates depending on whether they are used for bit math.
Addr, Stack, CallArg, and Index
These are all memory addresses. Addr is a base-offset address, which uses a Tmp for the base and an immediate for the offset. Stack and CallArg are abstract stack offsets. Index is a base-index address, which has a pair of Tmps (base and index) as well as an offset immediate and a scaling immediate for the index.
RelCond, ResCond, and DoubleCond
These are condition codes for various kinds of branches.
Special
Air allows certain Insts to point to additional meta-data. The Special argument type is used for such meta-data. It holds a Arg::Special*.
WidthArg
Some special Air opcodes take operands that describe the width of the operation. Possible values are Width8, Width16, Width32, and Width64.

The opcode of an Inst combined with the overload - i.e. the number of arguments - determines what the Inst will do to each argument. The behavior of arguments comes down to three dimensions that are determined by the opcode and overload:

Role
The role of an argument is an enum that describes the timing of when the argument is accessed, how it's accessed (use, a.k.a. read; or def, a.k.a write), how important that access is for performance (either warm or cold), and how writes affect the top bits (either ignores them or zero-fills them). The timing of an argument role is discussed further below. The performance requirements of an argument are used for register allocation prioritization. A warm argument is counted towards the register allocation priority heuristic, while a cold one isn't.
Type
Air recognizes two types, GP (general purpose) and FP (floating point). Arguments also have type. It's important to remember that there is both a type for the argument as determined by the opcode and overload, and a type of the argument itself. Some arguments are untyped, which means that they may be used regardless of the type desired by the opcode/overload. For example, addresses are untyped. Other arguments have specific type, like registers and Tmps. Except for BigImm, immediates are GP.
Width
The amount of bits, starting at lowest-order, that the instruction affects. Possible values are Width8, Width16, Width32, and Width64.

The timing of an argument role is important, and brings us to the order of execution of an Inst. Each Inst can be thought of as executing three steps:

  1. Perform early actions.
  2. Perform primary actions.
  3. Perform late actions.

The early actions of one instruction happen immediately after the late actions of the instruction before it. However, many Air analyses view them as happening at the same time. For example, any register usage in the early action of one instruction interferes with the register usage in the late action of the instruction that came before it. All of Air's liveness and interference analyses reason about the fence posts between instructions, where the late actions of the previous instruction and the early actions of the next form an interference clique.

Let's consider a simple example, like Add32 with two arguments. Let's say that the first argument is a memory location and the second argument is a register. Air uses the AT&T style of placing the destination argument last for most instructions. This Add32 loads from memory and adds the value to the register. Air writes this as:

Add32 42(%rax), %rcx

This instruction will proceed in three steps:

  1. Load the value at offset 42 from the memory address in %rax. The result is stored in an internal, hidden CPU location for the remainder of execution. Even if the instruction later stores to memory and overwrites this value, Add32 will still use the original value it had loaded. We say that this is an early use. At the same time, the CPU will load the value of %rcx and store it in a hidden CPU location. This is also an early use.
  2. Add the two values together. Store the result in hidden CPU location.
  3. Zero-extend the resulting value and store it into %rcx. This is a late def with zero extension.

Therefore, the two-argument overload of Add32 ascribes the following to its arguments:

Air can tell you what role, type, and width is ascribed to each argument by using the Inst::forEachArg(func) operation. It takes a callback of type void(Arg&, Arg::Role, Arg::Type, Arg::Width). For our Add32 example, this callback would get called twice:

  1. func(42(%rax), Use, GP, Width32)
  2. func(%rcx, UseZDef, GP, Width32)

It's important to remember that Air's summaries of what instructions do to arguments are not meant to be exhaustive. For example, if an instruction claims to use an address, this tells you that the instruction will perform a load but it tells you nothing about how that load will be performed. This means that unless you know exactly what it means to use/def an argument, you cannot perform this transformation:

Even if you know that Foo32 only uses its argument, you cannot do this because Move32 may not load from the address using exactly the same kind of load that Foo32 would have done. Memory accesses have many dimensions of options: alignment semantics (if you're lucky then misaligned accesses run fast but sometimes they ignore low bits, trap if unaligned, or run super slow when unaligned, and this behavior may depend on the opcode), temporality and memory ordering, determinism of trapping, etc. Just seeing that an instruction uses an address does not tell you what kind of load will happen, and currently Air does not have the ability to answer such questions. Fortunately, Air does not have much need to move memory accesses out of instructions. Uses and defs of temporaries, registers, immediates, and spill slots don't have these caveats and so those arguments can be moved from one instruction to another without worries.

Air's introspection of Insts tends to be quite fast thanks to the use of template specialization and C++ lambdas. The forEachArg() template method uses an efficient arrangement of switch statements to determine the opcode and overload. If func is a C++ lambda, we expect forEachArg() to be specialized for that lambda. Therefore, this idiom avoids virtual dispatch or memory allocation.

Air supports exotic roles, such as late uses and early defs. There is even the Scratch role, which means early def and late use. Speaking of a Tmp in the Scratch role means that the Tmp will be assigned a register that is guaranteed to not interfere with any of the other registers that the instruction speaks of. Late uses and early defs are crucial for patchpoints, which may require that one of the incoming values be given a register that does not interfere with whatever register is used for the result. This can be expressed either as giving the inputs a late use role or by giving the outputs an early def role. The full list of possible roles is:

Use
Early warm use.
ColdUse
Early cold use.
LateUse
Late warm use.
LateColdUse
Late cold use.
Def
Late def. Note that all defs are warm.
ZDef
Late def with zero-extension.
UseDef
Early warm use and late def.
UseZDef
Early warm use and late def with zero extension.
EarlyDef
Early def.
Scratch
Early def and late warm use.
UseAddr
Early warm use of the address's components.

UseAddr is interesting for the Lea (load effective address) instruction, which evaluates the address and places the result into a temporary or register. The argument must be an address, but UseAddr means that we don't actually read from the address. Note that using an address in any of the other roles always implies that the components of the address are used early and warm (i.e. Use).

Air arguments are central to Air's execution model. The early and late actions of an instruction have to do with arguments, and what happens to each argument during the early and late actions is determined by the opcode and the number of arguments (i.e. the overload). Clients of Air may create an Inst with any combination of opcode and arguments and then query, using isValidForm() if the opcode, overload, and specific arguments are valid for the current CPU.

Defining Air

Air has many opcodes and the opcodes have many different overloads and forms. Air makes it easy to reason about all of them with helpers like isValidForm() and forEachArg(). It also provides a Inst::generate() function that will generate code for the instruction, provided that it does not use any non-register Tmps or any abstract stack slots. If we wrote the code for validating, iterating, and generating each form by hand, we would have a bad time. For this reason, Air comes with an opcode code generator that uses a opcode definition file as input. The opcode definition file use a simple and concise syntax that lets us define many opcodes at once and constrain them to the CPU kinds that support them. Additionally, Air supports custom opcodes, where the code generator emits calls to hand-written C++ code. This section describes the opcode definition language.

It's easiest to understand the opcode definitions with an example. Let's use the two-argument overload of Add32.

Add32 U:G:32, UZD:G:32
    Tmp, Tmp
    x86: Imm, Addr
    x86: Imm, Index
    Imm, Tmp
    x86: Addr, Tmp
    x86: Tmp, Addr
    x86: Tmp, Index

The first line defines the overload. It has two arguments. The first argument serves the Use role, shorted as U. It is general-purpose, shortened as G. It has a 32-bit width. Hence the string U:G:32. Similarly, UZD:G:32 means UseZDef, GP, Width32.

The next lines list the available forms of the overload. A form is a list of possible kinds of arguments. These use the same terminology for Arg kinds from the previous section, with the caveat that Addr implies that Addr, Stack, or CallArg would be accepted.

Prefixing any line with x86: means that this form is only available on x86 CPUs, such as x86 or x86-64.

Air opcodes are designed to work with JavaScriptCore's existing MacroAssembler. By default, an opcode is automatically given a code generator that calls MacroAssembler::opcodeName, where opcodeName is derived by lower-casing the first letter of the Air opcode name. Add32 becomes MacroAssembler::add32, for example.

See the header of AirOpcode.opcodes for a complete list of shorthand used by Air's opcode definition language.

Summary

Air is designed around JavaScriptCore's existing MacroAssembler. Air has Inst objects, which each describe some method call to the MacroAssembler: an Inst's opcode indicates which method name to use and its args indicate the arguments to pass to that method. We use code generation to create a massive switch statement that turns the reflective Insts into actual calls to MacroAssembler. Consequently, we can "add" new instructions to Air usually by just editing the AirOpcode.opcodes file.