The B3 compiler comprises two intermediate representations: a higher-level SSA-based representation called B3 IR and a lower-level representation that focuses of machine details, like registers. This lower-level form is called Air (Assembly Intermediate Representation).
Air programs are represented using a
Air::Code
object. Code
comprises an array of
basic
blocks. Each basic block comprises an array of
Inst
s.
Air has an explicit control flow graph: each basic block has predecessor and successor blocks.
Execution always begins at the first basic block (code[0]
). The Inst
s
in each block are executed in order. Each Inst
has an opcode, some flags, an
array of arguments
(Arg
s),
and an origin. The opcode and flags are wrapped in a Kind
, to make it
convenient to carry them around together. The origin is simply a B3 IR
Value
.
Some opcodes use the origin for additional meta-data. This works because Air code always
coexists with the B3 procedure from which it was generated.
This document begins by describing the philosophy of Air. The behavior of Arg
s is
central to Air's execution model, which is described in the section that follows. The last
section describes the way Air opcodes are defined.
B3 is designed to be portable to many kinds of CPUs. Currently, it supports x86-64 and ARM64, which are quite different from each other. In B3 IR, we expose very few instruction set details. It's a goal of B3 IR to ensure that B3 values behave the same way except when the alternative would be counterproductive (like with pointer size, the corner-case behaviors of division, or calling convention customization). But to effectively compile code to different CPUs, the compiler has to eventually make instruction set details explicit. This is where Air comes in. B3 locks in most CPU-specific details at the moment of conversion to Air, and the Air code is irreversibly tied to some specific CPU.
Air is an instruction superset: it recognizes all of the instructions from all CPUs
that Air may target. In its lowest-level form, Air is simply a way of describing an assembly
instruction sequence, and this includes CPU concepts like registers and direct accesses to the
stack. Air also has a higher-level form in which the assembly has not yet undergone register
or stack allocation. Therefore, Air also supports abstract registers (called
Tmp
s)
and abstract
stack
slots. A Tmp
object can either hold an unallocated temporary or a
register.
Air has syntax to speak of all of the CPU instructions we know about. It is possible to speak of an x86-64 instruction while compiling for ARM64, for example. Clients of Air, such as the B3 to Air lowering phase, are allowed to pick any Air opcode and ask if that opcode would be valid on the current CPU. They are also allowed to check if specific forms of any given opcode are valid. This allows clients to optimize for multiple instruction sets by cascading through the possible opcodes that they know of, starting with the one they think is most efficient. Some of those opcodes may only be available on one CPU while others are available everywhere. Instruction selection does not need to know which instructions work on which CPUs; Air will tell you if some instruction happens to not be valid right now for whatever reason.
Air opcodes support overloading. For example, the Add32 opcode has both two-operand and
three-operand overloads, and those overloads have multiple forms: the first
operand may or may not be permitted to be an immediate and depending on the CPU and some of the
other operands may or may not be allowed to be memory addresses. We use opcode overload
to refer to all forms of an opcode that share the same number of arguments, and opcode
form to mean the number of arguments and their types. A fundamental Air operation is
Inst::isValidForm()
, which tells the client if the instruction's current form is
valid on the current CPU. This may return false either because the Inst is not well-formed for
any CPU or because it is not valid for the current CPU even though it may be valid on some
other CPU. There is also Air::isValidForm()
, which can answer if the form you are
intending to use will be valid even if you have not created an Inst
yet. This
allows clients to generate Air by experimenting with different forms before settling on the one
that the current CPU supports.
Air doesn't require the client to perform register or stack allocation. Anywhere that Air
accepts a register it will also accept a Tmp
. Anywhere that Air accepts an address
it will also accept a stack slot reference. Air code generation comprises a register allocator
and a stack allocator, which turn Tmp
s into Reg
s and stack slots into
addresses with the frame pointer (or stack pointer) as the base and some integer offset. Air
allows clients to speak of registers directly even while also using Tmp
s, and the
register allocator will ensure that it avoids clobbering registers that the client code already
relies upon. This is possible because Air has precise modeling of how instructions use
registers, so it's always possible to determine which registers are live at any point in the
Air code.
Air's philosophy allows B3 to use it for converting high-level, mostly-CPU-agnostic SSA
procedures into code for the current CPU. Air is an instruction superset that allows clients to
consider all available instructions on all possible CPUs and query which forms of those
instructions are available on the current CPU. Air also supports for high-level concepts like
Tmp
s and stack slots, which allows B3 to Air lowering to focus on which
instructions to use without worrying about register allocation or stack layout.
Air can be thought of as an
orthogonal instruction
set. It's possible to construct an Inst
with any combination of opcode and
arguments. The opcode determines what Air will do to the arguments - it may read from them or
write to them, for example. Orthognality implies that any argument that is read may be either
a register (or Tmp
), an address, or an immediate; while any argument that is
written may be either a register or an address. Air constrains orthognality where the target
CPU would. For example, none of Air's target CPUs would support an Add32
instruction that loads its sources from memory and stores its result into memory. Even
x86 doesn't go that far. Either before or after creating an Inst
, the client can
query if a particular combination of arguments (for example, three memory addresses) would be
valid for a given opcode (for example, Add32
).
Air arguments are represented using the Arg
object. Arg
can
represent any of the following assembly operands:
Tmp
represents either a register or a temporary.Tmp
for the base and an immediate for the offset. Stack and CallArg are abstract stack offsets.
Index is a base-index address, which has a pair of Tmp
s (base and index) as well
as an offset immediate and a scaling immediate for the index.Inst
s to point to additional meta-data. The Special
argument type is used for such meta-data. It holds a Arg::Special*
.Width8
, Width16
, Width32
, and
Width64
.The opcode of an Inst
combined with the overload - i.e. the number of arguments
- determines what the Inst
will do to each argument. The behavior of arguments
comes down to three dimensions that are determined by the opcode and overload:
GP
(general purpose) and FP
(floating
point). Arguments also have type. It's important to remember that there is both a type for
the argument as determined by the opcode and overload, and a type of the argument itself.
Some arguments are untyped, which means that they may be used regardless of the type desired
by the opcode/overload. For example, addresses are untyped. Other arguments have specific
type, like registers and Tmp
s. Except for BigImm
, immediates are
GP
.Width8
, Width16
, Width32
, and
Width64
.The timing of an argument role is important, and brings us to the order of execution of an
Inst
. Each Inst
can be thought of as executing three
steps:
The early actions of one instruction happen immediately after the late actions of the instruction before it. However, many Air analyses view them as happening at the same time. For example, any register usage in the early action of one instruction interferes with the register usage in the late action of the instruction that came before it. All of Air's liveness and interference analyses reason about the fence posts between instructions, where the late actions of the previous instruction and the early actions of the next form an interference clique.
Let's consider a simple example, like Add32
with two arguments. Let's say that
the first argument is a memory location and the second argument is a register. Air uses the
AT&T style of
placing the destination argument last for most instructions. This Add32 loads from memory and
adds the value to the register. Air writes this as:
Add32 42(%rax), %rcx
This instruction will proceed in three steps:
%rax
. The result is
stored in an internal, hidden CPU location for the remainder of execution. Even if the
instruction later stores to memory and overwrites this value, Add32
will still
use the original value it had loaded. We say that this is an early use. At the same
time, the CPU will load the value of %rcx
and store it in a hidden CPU
location. This is also an early use.%rcx
. This is a late
def with zero extension.Therefore, the two-argument overload of Add32
ascribes the following to its
arguments:
42(%rax)
) and
early warm use with late warm def with zero extension for the second argument
(%rcx
). Early warm use is written as Use
for short. Early warm
use with late warm def with zero extension is written as UseZDef
for
short.GP
. This matches 42(%rax)
because
addresses match any type. This matches %rcx
because it's a general-purpose
register.Width32
. Combined with UseZDef
,
this means that the instruction will read the low 32 bits of %rcx
in the early
actions and it will store to all bits of %rcx
in the late actions, but it
will ensure that all but the low 32 bits are zero.Air can tell you what role, type, and width is ascribed to each argument by using the
Inst::forEachArg(func)
operation. It takes a callback of type
void(Arg&, Arg::Role, Arg::Type, Arg::Width)
. For our Add32
example,
this callback would get called twice:
func(42(%rax), Use, GP, Width32)
func(%rcx, UseZDef, GP, Width32)
It's important to remember that Air's summaries of what instructions do to arguments are not meant to be exhaustive. For example, if an instruction claims to use an address, this tells you that the instruction will perform a load but it tells you nothing about how that load will be performed. This means that unless you know exactly what it means to use/def an argument, you cannot perform this transformation:
Foo32 (%rax)
Move32 (%rax), %tmp
Foo32 %tmp
Even if you know that Foo32 only uses its argument, you cannot do this because Move32 may not load from the address using exactly the same kind of load that Foo32 would have done. Memory accesses have many dimensions of options: alignment semantics (if you're lucky then misaligned accesses run fast but sometimes they ignore low bits, trap if unaligned, or run super slow when unaligned, and this behavior may depend on the opcode), temporality and memory ordering, determinism of trapping, etc. Just seeing that an instruction uses an address does not tell you what kind of load will happen, and currently Air does not have the ability to answer such questions. Fortunately, Air does not have much need to move memory accesses out of instructions. Uses and defs of temporaries, registers, immediates, and spill slots don't have these caveats and so those arguments can be moved from one instruction to another without worries.
Air's introspection of Inst
s tends to be quite fast thanks to the use of template
specialization and C++ lambdas. The forEachArg()
template method uses an efficient
arrangement of switch statements to determine the opcode and overload. If func
is
a C++ lambda, we expect forEachArg()
to be specialized for that lambda. Therefore,
this idiom avoids virtual dispatch or memory allocation.
Air supports exotic roles, such as late uses and early defs. There is even the
Scratch
role, which means early def and late use. Speaking of a Tmp
in the Scratch
role means that the Tmp
will be assigned a register
that is guaranteed to not interfere with any of the other registers that the instruction
speaks of. Late uses and early defs are crucial for patchpoints, which may require
that one of the incoming values be given a register that does not interfere with whatever
register is used for the result. This can be expressed either as giving the inputs a late use
role or by giving the outputs an early def role. The full list of possible roles is:
UseAddr
is interesting for the Lea
(load effective address)
instruction, which evaluates the address and places the result into a temporary or register.
The argument must be an address, but UseAddr
means that we don't actually read
from the address. Note that using an address in any of the other roles always implies that the
components of the address are used early and warm (i.e. Use
).
Air arguments are central to Air's execution model. The early and late actions of an
instruction have to do with arguments, and what happens to each argument during the early and
late actions is determined by the opcode and the number of arguments (i.e. the overload).
Clients of Air may create an Inst
with any combination of opcode and arguments
and then query, using isValidForm()
if the opcode, overload, and specific
arguments are valid for the current CPU.
Air has many opcodes and the opcodes have many different overloads and forms. Air makes it
easy to reason about all of them with helpers like isValidForm()
and
forEachArg()
. It also provides a Inst::generate()
function that will
generate code for the instruction, provided that it does not use any non-register
Tmp
s or any abstract stack slots. If we wrote the
code for validating, iterating, and generating each form by hand, we would have a bad time.
For this reason, Air comes with an
opcode code generator
that uses a
opcode definition file
as input. The opcode definition file use a simple and concise syntax that lets us define many
opcodes at once and constrain them to the CPU kinds that support them. Additionally, Air
supports custom opcodes, where the code generator emits calls to
hand-written
C++ code. This section describes the opcode definition language.
It's easiest to understand the opcode definitions with an example. Let's use the two-argument
overload of Add32
.
Add32 U:G:32, UZD:G:32
Tmp, Tmp
x86: Imm, Addr
x86: Imm, Index
Imm, Tmp
x86: Addr, Tmp
x86: Tmp, Addr
x86: Tmp, Index
The first line defines the overload. It has two arguments. The first argument serves the
Use
role, shorted as U
. It is general-purpose, shortened as
G
. It has a 32-bit width. Hence the string U:G:32
. Similarly,
UZD:G:32
means UseZDef
, GP
, Width32
.
The next lines list the available forms of the overload. A form is a list of possible kinds of
arguments. These use the same terminology for Arg
kinds from the previous section,
with the caveat that Addr
implies that Addr
, Stack
, or
CallArg
would be accepted.
Prefixing any line with x86:
means that this form is only available on x86
CPUs, such as x86 or x86-64.
Air opcodes are designed to work with JavaScriptCore's existing MacroAssembler. By default, an
opcode is automatically given a code generator that calls
MacroAssembler::opcodeName
, where opcodeName is derived by
lower-casing the first letter of the Air opcode name. Add32
becomes
MacroAssembler::add32
, for example.
See the header of AirOpcode.opcodes for a complete list of shorthand used by Air's opcode definition language.
Air is designed around JavaScriptCore's existing MacroAssembler. Air has Inst objects, which each describe some method call to the MacroAssembler: an Inst's opcode indicates which method name to use and its args indicate the arguments to pass to that method. We use code generation to create a massive switch statement that turns the reflective Insts into actual calls to MacroAssembler. Consequently, we can "add" new instructions to Air usually by just editing the AirOpcode.opcodes file.