Web High Level Shading Language

This article introduces a new graphics shading language for the Web named Web High Level Shading Language (WHLSL, pronounced “whistle”). The language is inspired by HLSL, the dominant shading language among graphics app developers. It extends HLSL for the Web platform to make it safe and secure. It’s easy to read and write, and it is well-specified using formal techniques.

Background

Over the past few decades, 3D graphics have changed significantly, and the APIs programmers use to write 3D applications have changed accordingly. Five years ago, state-of-the-art graphics applications would use OpenGL to perform their rendering. However, the past few years have seen a shift in the 3D graphics industry toward newer, lower-level graphics frameworks that better match the behavior of real hardware. In 2014, Apple created the Metal framework, which lets iOS and macOS apps use the full power of the GPU. In 2015, Microsoft created Direct3D 12, a major update to Direct3D that allows for console-level efficiency for rendering and compute. In 2016, the Khronos Group published the Vulkan API, which offers similar advantages and is primarily used on Android.

Just like how WebGL brought OpenGL to the Web, the Web community is pursuing bringing this type of new, low-level 3D graphics API to the platform. Last year, Apple established the WebGPU Community Group inside the W3C to standardize a new 3D graphics API which provides the benefits of these native APIs, but is also suitable for the Web environment. This new Web API is implementable on top of Metal, Direct3D, and Vulkan. All of the major browser vendors are participating and contributing to the standardization effort.

Each of these modern 3D graphics APIs uses shaders, and WebGPU is no different. Shaders are programs that take advantage of the specialized architecture of GPUs. In particular, GPUs are better than CPUs at heavy parallel numerical processing. To take advantage of both architectures, modern 3D apps use a hybrid design, using both the CPU and the GPU for different tasks. By leveraging the best traits of each, modern graphics APIs provide a powerful framework for developers to create complex, rich, and fast 3D apps. Apps designed for Metal use the Metal Shading Language, apps designed for Direct3D 12 use HLSL, and apps designed for Vulkan use SPIR-V or GLSL.

Language Requirements

Just like its native counterparts, WebGPU needs a shader language. This language needs to meet several requirements that make it well-tailored for the Web platform.

The language needs to be safe. No matter what the application does, the shader must only be able to read or write data from the Web page’s domain. Without this guarantee, a malicious website could run a shader that reads pixels out of other parts of your screen, even from native apps.

The language needs to be well-specified. The language specification has to be explicit about whether every single possible string of characters is a valid program or not. As with all other Web formats, a shading language for the Web must be precisely specified to guarantee interoperability between browsers.

The language also needs to be well-specified so that it can be used as a compilation target. Many rendering teams write shaders in their own custom in-house language, and then cross-compile to whichever language is necessary. For this reason, the language should have a reasonably small set of unambiguous grammar and type checking rules that compiler writers can reference when emitting this language.

This language needs to be translatable to Metal Shading Language, HLSL (or DXIL), and SPIR-V. Because WebGPU is designed to work on top of Metal, Direct3D 12, and Vulkan, shaders must be representable in a form that each of those APIs can accept.

The language needs to be performant. The entire reason developers want to use the GPU in the first place is for performance. The compiler itself needs to run quickly, and programs produced by the compiler need to run efficiently on real GPUs.

This language needs to evolve with the WebGPU API. WebGPU features such as the binding model and tessellation model interact deeply with the shading language. Though it is feasible to have the language developed independently of the API, having the WebGPU API and shading language in the same forum ensures the goals are shared, and makes development more streamlined.

The language needs to be easy for a developer to read and write. There are two pieces to this: firstly, the language should be familiar to both GPU programmers and CPU programmers. GPU programmers are important clients because they have experience writing shaders. CPU programmers are important because GPUs are increasingly being used for purposes beyond rendering, including machine learning, computer vision, and neural networks. For them, the language should be compatible with familiar programming language concepts and syntax.

The second piece to this is that the language should be human-readable. The culture of the Web is one where anyone can start writing a webpage with just a text editor and a browser. This democratization of content is one of the Web’s greatest strengths. This culture has created a rich ecosystem of tools and inspectors, where tinkerers can investigate how any webpage works simply by using View Source. Having a single canonical human-readable language will greatly aid community adoption of the WebGPU API.

All the major languages used on the Web today are human-readable, with one exception: WebAssembly. The WebAssembly Community Group expected that parsing a bytecode would be more performant than parsing a text language. However, that turned out not to be true; asm.js, which is JavaScript source, is still faster than WebAssembly for many use cases.

Similarly, using a bytecode format such as WebAssembly doesn’t spare the browser from running optimization passes. Every major browser runs optimization passes on the bytecode prior to execution. Unfortunately, the hoped-for simplification of compilers never materialized.

There is active debate in the Community Group about whether or not this human-readable language should be the one that’s natively accepted by the API, but the group agrees that the language that shaders are written in should be easily readable and writable.

A New Language? Really?

While there are a number of existing languages, none was designed with both the Web and modern graphics applications in mind, and none addresses the requirements listed above. Before we describe WHLSL, let’s look at some existing languages.

Metal Shading Language is very similar to C++, which means it has all the power of bit-casts and raw pointers. It’s extremely powerful; the same source code can even be compiled for CPUs and GPUs. It’s extremely easy to port existing CPU-side code to Metal Shading Language. Unfortunately, all this power has some downsides. In Metal Shading Language, you could, for example, write a shader that casts a pointer to an integer, adds 17, casts it back to a pointer, and dereferences it. This is a security problem because it means the shader could access any resource that happens to be in the address space of the application, which is contrary to the Web’s security model. Theoretically, it could be possible to specify a dialect of Metal Shading Language that doesn’t have raw pointers, but pointers are so fundamental to the C and C++ languages that the result would be completely unfamiliar. C++ also heavily relies on undefined behavior, so any effort to fully specify each of C++’s numerous features would be unlikely to be successful.

HLSL is the language in which Direct3D shaders are written. It’s currently the most popular realtime shading language in the world, and is therefore the most familiar language to graphics programmers. There are multiple implementations, but there is no formal specification, which makes it difficult to create consistent, interoperable implementations. Nonetheless, given HLSL’s ubiquity, it is valuable to adopt its syntax as much as possible in the design of WHLSL.

GLSL is the language WebGL adopted for the Web platform. However, reaching cross-browser interoperability was extremely difficult due to incompatibilities among GLSL compilers. A long tail of security and portability bugs in GLSL is still being investigated. GLSL is also showing its age: it doesn’t have pointer-like objects or variable-length arrays, and its inputs and outputs are global variables with hardcoded names.

SPIR-V was designed to be a low-level universal intermediate format for the actual shading languages that developers would use. People do not author SPIR-V; they use a human-readable language, and then convert it into SPIR-V bytecode using a tool.

There are a few challenges with adopting SPIR-V for the web. First, SPIR-V was not written with security as a first principle, and it’s unclear whether it can be modified to satisfy the security requirements of the web. Forking the SPIR-V language means developers would have to recompile their shaders, possibly being forced to rewrite their source code anyway. Additionally, browsers would still be unable to trust incoming bytecode, and would be required to validate programs to make sure they are not doing anything insecure. And since Windows and macOS/iOS don’t support Vulkan, the incoming SPIR-V would still need to be translated/compiled into another language. Weirdly, this would mean on those two platforms, the starting point and the ending point are both human readable, but the bit in between is obfuscated with no benefit.

Second, a significant amount of the SPIR-V specification exists inside separate documents known as “execution environments.” A SPIR-V execution environment currently doesn’t exist for the Web, and without one of these execution environments, many critical pieces of SPIR-V are undefined, such as which of the over 50 optional capabilities are supported.

Third, many graphics applications such as Babylon.js require dynamically modifying shaders at runtime. Using a bytecode format means that these applications would have to include a compiler written in JavaScript that runs in the browser to produce the bytecode from the dynamically created shader. This would significantly increase the bloat of these sites and would lead to worse performance.

Though JavaScript is the canonical language for the Web, its properties make it a poor candidate for a shading language. One of its strengths is its flexibility, but this dynamism leads to many conditionals and divergent control flow, which GPUs are not designed to execute efficiently. It is also garbage-collected, which is a procedure that definitely isn’t well-suited for GPU hardware.

WebAssembly is another familiar possibility, but it also doesn’t map well to the architecture of GPUs. For example, WebAssembly assumes a single dynamically-sized heap, but GPU programs operate with access to multiple dynamically-sized buffers. There isn’t a high-performance way to map between the two models without recompiling.

Therefore, after a fairly exhaustive search for a suitable language, we couldn’t find one which adequately meets the requirements of the project. So, the Community Group is making a new language. Creating a new language is a large task, but we feel there is an opportunity to produce something new that uses modern programming language design principles and fulfills our requirements.

WHLSL

WHLSL is a new shading language that fits the Web platform. It’s being developed by the WebGPU Community Group at the W3C, and the group is working on a specification, a compiler, and a CPU-side interpreter to show correctness.

The language is based on HLSL, but simplifies and extends it. We’d really like existing HLSL shaders to just work as WHLSL shaders. WHLSL is a well-specified, powerful, and expressive shading language, but some HLSL shaders will need tweaks to work in it; in exchange, WHLSL can guarantee safety and the other benefits outlined above.

For example, here is the body of a vertex shader from Microsoft’s DirectX-Graphics-Samples repository. It works as WHLSL without any changes:

VSParticleDrawOut output;
output.pos = g_bufPosVelo[input.id].pos.xyz;
float mag = g_bufPosVelo[input.id].velo.w / 9;
output.color = lerp(float4(1.0f, 0.1f, 0.1f, 1.0f), input.color, mag);
return output;

And here is the body of the associated pixel shader, which also works as WHLSL completely unmodified:

float intensity = 0.5f - length(float2(0.5f, 0.5f) - input.tex);
intensity = clamp(intensity, 0.0f, 0.5f) * 2.0f;
return float4(input.color.xyz, intensity);

Basics

Let’s talk about the language itself.

Just like in HLSL, the primitive data types are bool, int, uint, float, and half. Doubles are not supported because they don’t exist in Metal, and software emulation would be too slow. Bools don’t have a particular bit representation and thus cannot be present in shader inputs/outputs or resources. The same restriction exists in SPIR-V, and we’d like to be able to use OpTypeBool in the generated SPIR-V code. WHLSL also includes the smaller integral types char, uchar, short, and ushort, which are available directly in Metal Shading Language, can be expressed in SPIR-V with OpTypeInt and a width of 8 or 16 bits, and can be emulated in HLSL. Emulating these types is faster than emulating doubles because the types are smaller and their bit representation is less complicated.

WHLSL doesn’t provide C-style implicit conversions. We’ve found implicit conversions to be a common source of errors in shaders, and forcing the programmer to be explicit about where conversions occur eliminates this often frustrating and mysterious class of bugs. Languages such as Swift take a similar approach. Additionally, a lack of implicit conversions keeps the specification and the compiler simple.

Just like in HLSL, there are vector types and matrix types such as float4 and int3x4. Rather than add a bunch of “x1” single-element vectors and matrices, we opted to keep the standard library simple, since a single-element vector is already representable as a scalar and a single-element-matrix is already representable as a vector. This is consistent with the desire to eliminate implicit conversions, and requiring an explicit conversion between float1 and float is cumbersome and needlessly verbose.

So, the following is a valid snippet of a shader:

int a = 7;
a += 3;
float3 b = float3(float(a) * 5, 6, 7);
float3 c = b.xxy;
float3 d = b * c;

I mentioned earlier that no implicit conversions are allowed, but you may have noticed that in the above snippet, 5 is not written as 5.0. This is because literals are represented as a special type that can be unified with other numeric types. When the compiler sees the above code, it knows the multiplication operator requires both arguments to have the same type, and the first argument is clearly a float. So, when the compiler sees float(a) * 5, it reasons: “the first argument is a float, so I must be using the (float, float) overload; let’s unify the 5 with the second parameter, so the 5 becomes a float.” This works even when both arguments are literals, because literals have a preferred type. So, 5 * 5 gets the (int, int) overload, 5u * 5u gets the (uint, uint) overload, and 5.0 * 5.0 gets the (float, float) overload.
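To make the unification rule concrete, here is a tiny C++ model of overload resolution for the three operator* overloads mentioned above. It is a sketch, not the WHLSL compiler's algorithm: the `Arg`/`Type` enums and `resolveMultiply` are hypothetical names, and the assumption that an unsuffixed integer literal may unify with int, uint, or float (preferring int) is inferred from the description above.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class Type { Int, UInt, Float };

// Kinds of argument expressions: a concrete typed value or a literal.
enum class Arg { Int, UInt, Float, IntLiteral, UIntLiteral, FloatLiteral };

// Can an argument of kind `a` be passed where `param` is expected?
bool unifies(Arg a, Type param) {
    switch (a) {
        case Arg::Int:          return param == Type::Int;
        case Arg::UInt:         return param == Type::UInt;
        case Arg::Float:        return param == Type::Float;
        case Arg::IntLiteral:   return true;  // assumption: 5 may become int, uint, or float
        case Arg::UIntLiteral:  return param == Type::UInt;   // 5u
        case Arg::FloatLiteral: return param == Type::Float;  // 5.0
    }
    return false;
}

// The type an argument prefers when several overloads would fit.
Type preferred(Arg a) {
    switch (a) {
        case Arg::Int: case Arg::IntLiteral:   return Type::Int;
        case Arg::UInt: case Arg::UIntLiteral: return Type::UInt;
        default:                               return Type::Float;
    }
}

// Pick among the (int, int), (uint, uint), and (float, float) overloads of
// operator*. Returns nullopt when no overload fits (a compile error).
std::optional<Type> resolveMultiply(Arg lhs, Arg rhs) {
    std::vector<Type> candidates;
    for (Type t : {Type::Int, Type::UInt, Type::Float})
        if (unifies(lhs, t) && unifies(rhs, t)) candidates.push_back(t);
    if (candidates.empty()) return std::nullopt;
    if (candidates.size() == 1) return candidates[0];
    // Several overloads can fit only when both arguments are literals:
    // fall back to the literals' preferred type.
    Type p = preferred(lhs);
    for (Type t : candidates)
        if (t == p) return t;
    return std::nullopt;
}
```

With these rules, `float(a) * 5` resolves to the float overload and `5 * 5` to int, while multiplying a concrete int by a concrete float finds no overload at all, which is how the absence of implicit conversions shows up.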

One difference between WHLSL and C is that WHLSL zero-initializes all uninitialized variables at their declaration site. This prevents non-portable behavior across OSes and drivers, or even worse, reading whatever value happened to be there before your shader started executing. It also means that all constructible types in WHLSL have a zero-value.
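C and C++ leave a local `int x;` with an indeterminate value. As a rough illustration of WHLSL's rule, the closest C++ analog is value-initialization with `{}`, which zero-fills every scalar member and null-initializes pointers. The `Particle` struct and `declareParticle` function are made up for this sketch.

```cpp
#include <cassert>

// Hypothetical struct used only to show zero-initialization.
struct Particle {
    float pos[3];
    int id;
    int* next;  // pointers zero-initialize to null
};

Particle declareParticle() {
    // WHLSL zero-fills `Particle p;` implicitly; C++ needs `{}` to do the same.
    Particle p{};
    return p;
}
```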

Enums

Because they don’t incur any runtime cost and are extremely useful, WHLSL has native support for enums.

enum Weekday {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    PizzaDay
}

The underlying type for an enum defaults to int, but you can override the type, e.g. enum Weekday : uint. Similarly, enum values can have an underlying value like Tuesday = 72. Because enums have defined types and values, they can be used in buffers, and they can be cast between their underlying type and the enum type. When you want to refer to a value in code, you qualify it, as in Weekday.PizzaDay, similar to how enum classes work in C++. This means that enum values don’t pollute the global namespace, and values of independent enums won’t collide.
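Since the text compares WHLSL enums to C++ enum classes, here is the C++ counterpart of `enum Weekday : uint` with `Tuesday = 72`. Note that WHLSL qualifies values with `.` while C++ uses `::`, and the implicit numbering after 72 shown in the comments is C++'s rule; we assume, but have not confirmed, that WHLSL numbers the same way.

```cpp
#include <cassert>
#include <cstdint>

// C++ analog of `enum Weekday : uint { ... }` with an explicit value.
enum class Weekday : std::uint32_t {
    Monday,        // 0
    Tuesday = 72,
    Wednesday,     // 73
    Thursday,      // 74
    PizzaDay       // 75
};

// Values must be qualified (Weekday::PizzaDay), and converting to or from
// the underlying type requires an explicit cast.
std::uint32_t toUnderlying(Weekday d) { return static_cast<std::uint32_t>(d); }
Weekday fromUnderlying(std::uint32_t v) { return static_cast<Weekday>(v); }
```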

Structs

Structs in WHLSL work similarly to HLSL and C.

struct Foo {
    int x;
    float y;
}

Structs are deliberately simple: they have no inheritance, virtual methods, or access control. It’s impossible to have a “private” member of a struct. Because structs don’t have access control, there is no need for them to have member functions; free functions can see every member of every struct.

Arrays

As in other shading languages, arrays are value types that are passed to and returned from functions by value (aka “copy-in copy-out,” like regular scalars). You make one using the following syntax:

int[3] x;

Just like any variable declaration, this will zero-fill the contents of the array, and is therefore an O(n) operation. We wanted to put the brackets after the type instead of after the variable name for two reasons:

  1. Putting all type information in a single place makes the parser simpler (avoiding the clockwise/spiral rule)
  2. Avoiding ambiguity when multiple variables are declared in a single statement (e.g. int[10] x, y;)

One critical way we ensure the safety of the language is by performing bounds checking on every array access. There are a number of ways we make this potentially expensive operation efficient. Array indices are uint, which reduces the check to a single comparison. Arrays are not sparse, and their length is known at compile time, so bounds-checked accesses have near-zero cost.
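The single-comparison claim is easy to see in C-family code. This C++ sketch (the function names are illustrative) contrasts a signed index, which needs both a lower and an upper check, with an unsigned one, where a "negative" index wraps to a huge value and fails the upper check.

```cpp
#include <cassert>
#include <cstdint>

// Signed index: two comparisons are needed.
bool inBoundsSigned(std::int32_t index, std::int32_t length) {
    return index >= 0 && index < length;
}

// Unsigned index: a negative value cannot occur (it would wrap to a very
// large number), so one comparison covers both ends of the range.
bool inBoundsUnsigned(std::uint32_t index, std::uint32_t length) {
    return index < length;
}
```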

Whereas arrays are value types, WHLSL achieves reference semantics using two other types: safe pointers and array references.

Safe Pointers

The first is the safe pointer. Some form of reference semantics, which is the behavior pointers allow for, is used in almost every CPU-side programming language. Including pointers in WHLSL makes it easier for developers to migrate existing CPU-side code to the GPU, which eases porting of things like machine learning, computer vision, and signal processing applications.

To satisfy the safety requirement, WHLSL uses safe pointers, which are guaranteed to either point to something valid or be null. As in C, you can create a pointer to an lvalue with the & operator and dereference one with the * operator. Unlike C, you can’t index through a pointer as if it were an array, and you can’t cast it to or from a scalar value. A pointer doesn’t have a specific bit-pattern representation, so it can’t exist in a buffer or as a shader input/output.
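As a rough model of these guarantees, the C++ class below (a hypothetical `SafePtr`, not part of any library) can only be null or bound to an lvalue, supports dereference and nothing else, and traps on a null dereference. The text also mentions clamping as an alternative to trapping; this sketch models trapping only.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical safe pointer: null or pointing at a live lvalue. There is no
// pointer arithmetic, no indexing, and no conversion to or from integers.
template <typename T>
class SafePtr {
public:
    SafePtr() = default;                           // zero-initialized: null
    explicit SafePtr(T& target) : p_(&target) {}   // the `&lvalue` case

    bool isNull() const { return p_ == nullptr; }

    // Dereference traps (aborts) on null instead of reading arbitrary memory.
    T& operator*() const {
        if (p_ == nullptr) {
            std::fputs("trap: null pointer dereference\n", stderr);
            std::abort();
        }
        return *p_;
    }

private:
    T* p_ = nullptr;
};
```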

Just like in OpenCL and Metal Shading Language, the GPU has different heaps, or address spaces that values can exist within. WHLSL has 4 different heaps: device, constant, threadgroup, and thread. All reference types must be tagged with the address space they point into.

The device address space corresponds to the majority of memory on the device. This memory is readable and writable, and corresponds to Unordered Access Views in Direct3D and device memory in Metal Shading Language. The constant address space corresponds to a read-only region of memory, typically optimized for data being broadcast to every thread. As such, writing to an lvalue that exists in the constant address space is a compile error. Lastly, the threadgroup address space corresponds to a readable and writable region of memory that is shared among all the threads in a threadgroup. It can only be used in compute shaders.
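One way to picture address-space tagging is to make the address space a template parameter, so that a write through a constant reference fails at compile time. This is a hypothetical C++ sketch (`AddressSpace` and `Ref` are invented names), not how any WHLSL compiler represents references.

```cpp
#include <cassert>

enum class AddressSpace { Device, Constant, Threadgroup, Thread };

// The address space is part of the reference's type.
template <AddressSpace Space, typename T>
class Ref {
public:
    explicit Ref(T& target) : p_(&target) {}

    const T& read() const { return *p_; }

    // Member functions of a class template are instantiated only when used,
    // so calling write() on a Constant reference is a compile error.
    void write(const T& value) const {
        static_assert(Space != AddressSpace::Constant,
                      "cannot write through a constant reference");
        *p_ = value;
    }

private:
    T* p_;
};
```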

By default, values exist within the thread address space:

int i = 4;
thread int* j = &i;
*j = 7;
// i is now 7

Because all variables are zero-initialized, pointers are null-initialized. Therefore, the following is valid:

thread int* i;

Trying to dereference this pointer will cause either trapping or clamping, as described later.

Array References

Array references are similar to pointers, but they can be used with the subscript operator to access multiple elements in the array reference. Whereas arrays’ lengths are known at compile time and must be stated inside the type declaration, an array reference’s length is only known at runtime. Just like pointers, they must be associated with an address space, and they may be nullptr. Just like arrays, they are indexed using uints for single-comparison bounds checks, and they can’t be sparse.

They correspond to the OpTypeRuntimeArray type in SPIR-V and one of Buffer, RWBuffer, StructuredBuffer, or RWStructuredBuffer in HLSL. In Metal, it is represented as a tuple of a pointer and a length. Just like array accesses, all operations are checked against the array reference’s length. Buffers are passed into the entry points from the API via array references or pointers.

You can make an array reference from an lvalue by using the @ operator:

int i = 4;
thread int[] j = @i;
j[0] = 7;
// i is 7
// j.length is 1

Just as you might expect, using @ on pointer j creates an array reference that points to the same thing as j:

int i = 4;
thread int* j = &i;
thread int[] k = @j;
k[0] = 7;
// i is 7
// k.length is 1

Using @ on an array makes the array reference point to that array:

int[3] i = int[3](4, 5, 6);
thread int[] j = @i;
j[1] = 7;
// i[1] is 7
// j.length is 3
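Since the text describes Metal's representation of an array reference as a pointer plus a length, that pair can be sketched in C++ with a bounds check on every access. `ArrayRef`, `atScalar`, and `atArray` are hypothetical names; the two helpers stand in for applying @ to a scalar and to a fixed-size array, and the pointer case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical array reference: a (pointer, length) pair.
template <typename T>
struct ArrayRef {
    T* data = nullptr;          // zero-initialized: a null array reference
    std::uint32_t length = 0;

    // Every access is checked; one comparison suffices for a uint index.
    T& operator[](std::uint32_t index) const {
        if (index >= length) {
            std::fputs("trap: array reference index out of bounds\n", stderr);
            std::abort();
        }
        return data[index];
    }
};

// Like `@i` on a scalar: a reference of length 1.
template <typename T>
ArrayRef<T> atScalar(T& value) {
    return ArrayRef<T>{&value, 1};
}

// Like `@i` on a fixed-size array: a reference of the array's length.
template <typename T, std::size_t N>
ArrayRef<T> atArray(T (&array)[N]) {
    return ArrayRef<T>{array, static_cast<std::uint32_t>(N)};
}
```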

Functions

Functions look very similar to their C counterparts. For example, here is a function in the standard library:

float4 lit(float n_dot_l, float n_dot_h, float m) {
    float ambient = 1;
    float diffuse = max(0, n_dot_l);
    float specular = n_dot_l < 0 || n_dot_h < 0 ? 0 : n_dot_h * m;
    float4 result;
    result.x = ambient;
    result.y = diffuse;
    result.z = specular;
    result.w = 1;
    return result;
}

This example shows how similar WHLSL functions are to C: function declarations and calls (e.g. to max()) have similar syntax, arguments and parameters are matched up pairwise in order, and ternary expressions are supported.

Operators and Operator Overloading

However, something else is going on here, too. When the compiler sees n_dot_h * m, it doesn’t intrinsically know how to perform that multiplication. Instead, the compiler will turn that into a call to operator*(). Then, the specific operator*() is chosen via the standard function overload resolution algorithm. This is important because it means you can write your own operator*() function, and teach WHLSL how to multiply your own types.
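C++ resolves operators the same way, so it offers a familiar illustration of the idea: the compiler turns `a * b` into a call to whichever `operator*` overload matches. The `Complex` type here is a made-up example of a user type.

```cpp
#include <cassert>

// A user-defined type the core language knows nothing about.
struct Complex {
    float re;
    float im;
};

// `a * b` on two Complex values resolves to this free function.
Complex operator*(Complex a, Complex b) {
    return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}
```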

This even works for operations like ++. Though pre-increment and post-increment behave differently, both are implemented by the same overloaded function: operator++(). Here’s an example from the standard library:

int operator++(int value) {
    return value + 1;
}

This operator will be called for both pre-increment and post-increment, and the compiler is smart enough to do the right thing with the result. This solves the problem that C++ runs into where those operators are distinct, and are differentiated using an extra dummy int argument. For post-increment, the compiler will emit code to save the value to an anonymous variable, call operator++(), assign the result, and use the saved value for further processing.
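The rewrite for the two increment forms can be sketched in C++. Because C++ does not let you redefine `operator++` for `int`, the single user-visible function is called `increment` here; `preIncrement` and `postIncrement` are hypothetical names for the code a compiler would emit at each call site.

```cpp
#include <cassert>

// Stand-in for WHLSL's `int operator++(int value)`.
int increment(int value) { return value + 1; }

// Emitted for `++x`: call, assign, and the expression yields the new value.
int preIncrement(int& x) {
    x = increment(x);
    return x;
}

// Emitted for `x++`: save the old value in an anonymous temporary, call,
// assign, and the expression yields the saved value.
int postIncrement(int& x) {
    int saved = x;
    x = increment(x);
    return saved;
}
```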

Operator overloading is used all over the language. It’s how vector and matrix multiplication is implemented. It’s how arrays are indexed. It’s how swizzle operators work. Operator overloading provides power and simplicity; the core language doesn’t have to know about each of these operations directly because they are implemented by overloaded operators.

Generated Properties

WHLSL doesn’t just stop at operator overloading, though. An earlier example included b.xxy where b is a float3. This is an expression that means “make a 3-element vector where the first two elements have the same value as b.x and the third element has the same value as b.y.” So it’s sort of like a member of the vector, except that it isn’t associated with any storage; instead, it’s computed when it’s accessed. These “swizzle operators” are present in every realtime shading language, and WHLSL is no exception. WHLSL supports them by marking them as generated properties, like in Swift.

Getters

The standard library includes many functions of the following form:

float3 operator.xxy(float3 v) {
    float3 result;
    result.x = v.x;
    result.y = v.x;
    result.z = v.y;
    return result;
}

When the compiler sees an access to a property that doesn’t exist as a member, it calls the corresponding operator, passing the object as the first argument. Colloquially, we call this a getter.

Setters

The same approach even works for setters:

float4 operator.xyz=(float4 v, float3 c) {
    float4 result = v;
    result.x = c.x;
    result.y = c.y;
    result.z = c.z;
    return result;
}

Using setters is very natural:

float4 a = float4(1, 2, 3, 4);
a.xyz = float3(7, 8, 9);

The implementation of the setter creates a copy of the object with the new data. When the compiler sees an assignment to a generated property, it calls the setter and assigns the result to the original variable.
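The getter and setter rewrites can be modeled in C++ as plain functions. `get_xxy` and `set_xyz` are hypothetical stand-ins for operator.xxy and operator.xyz=, since C++ has no syntax for declaring those operators, and the `float3`/`float4` structs are minimal placeholders.

```cpp
#include <cassert>

struct float3 { float x, y, z; };
struct float4 { float x, y, z, w; };

// Getter for `v.xxy`: computes a new value; no storage is involved.
float3 get_xxy(float3 v) { return {v.x, v.x, v.y}; }

// Setter for `v.xyz = c`: returns a modified copy of the whole vector.
float4 set_xyz(float4 v, float3 c) {
    float4 result = v;
    result.x = c.x;
    result.y = c.y;
    result.z = c.z;
    return result;
}

// The assignment `a.xyz = float3(7, 8, 9);` is rewritten as
// `a = set_xyz(a, float3{7, 8, 9});`.
```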

Anders

A generalization of getters and setters is the ander, which works with pointers. It exists as a performance optimization, so setters don’t have to create a copy of the object. Here’s an example:

thread float* operator.r(thread Foo* value) {
    return &value->y; // y is Foo's float member, matching the return type
}

Anders are more powerful than either getters or setters, because the compiler can use anders to implement either reads or assignments. When reading from a generated property via an ander, the compiler invokes the ander and then dereferences the result. When writing to it, the compiler invokes the ander, dereferences the result, and assigns to the result of that. Any user-defined type can have any combination of getters, setters, anders, and indexers; if the same type has an ander and either a getter or a setter, the compiler will prefer using the ander.
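The ander rewrite can be sketched the same way in C++. This version reuses the `Foo` layout from the Structs section and exposes its float member `y`; `ander_r`, `readR`, and `writeR` are hypothetical names for the ander and for the code the compiler would emit for a read and a write.

```cpp
#include <cassert>

struct Foo {
    int x;
    float y;
};

// The ander: returns a pointer into the object, so no copy is made.
float* ander_r(Foo* value) { return &value->y; }

// Reading `f.r`: call the ander, then dereference.
float readR(Foo& f) { return *ander_r(&f); }

// Writing `f.r = v`: call the ander, dereference, and assign through it.
void writeR(Foo& f, float v) { *ander_r(&f) = v; }
```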

Indexers

But what about matrices? In most realtime shading languages, matrices aren’t accessed with members corresponding to their columns or rows. Instead, they are accessed using array syntax, e.g. myMatrix[3][1]. Vector types also usually have this kind of syntax. So how does this work? More operator overloading!

float operator[](float2 v, uint index) {
    switch (index) {
        case 0:
            return v.x;
        case 1:
            return v.y;
        default:
            /* trap or clamp, more on this below */
    }
}

float2 operator[]=(float2 v, uint index, float a) {
    switch (index) {
        case 0:
            v.x = a;
            break;
        case 1:
            v.y = a;
            break;
        default:
            /* trap or clamp, more on this below */
    }
    return v;
}

As you can see, indexing uses operators too, and can therefore be overloaded. Vectors get these “indexers” too, so myVector.x and myVector[0] are synonyms for each other.
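The `default:` branches above leave the out-of-bounds behavior open. This C++ sketch makes both options explicit for a `float2` indexer; the `OutOfBounds` policy parameter exists only for illustration, and the clamping rule used here (fall back to the last valid component) is an assumption, not WHLSL's specified behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct float2 { float x, y; };

enum class OutOfBounds { Trap, Clamp };

// Hypothetical indexer for float2 with an explicit out-of-bounds policy.
float readComponent(float2 v, std::uint32_t i, OutOfBounds policy) {
    if (i > 1) {
        if (policy == OutOfBounds::Trap) {
            std::fputs("trap: vector index out of bounds\n", stderr);
            std::abort();
        }
        i = 1;  // assumed clamp: use the last valid index
    }
    return i == 0 ? v.x : v.y;
}
```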

Standard Library

We designed the standard library based on the Microsoft Docs describing the HLSL standard library. The WHLSL standard library mostly includes math operations, which work both on scalar values and element-wise on vectors and matrices. All the standard operators you would expect are defined, including arithmetic, logical, and bitwise operators like operator*() and operator<<(). All the swizzle operators, getters, and setters are defined for vectors and matrices, where applicable.

One of the design principles of WHLSL was to keep the language itself small so that as much as possible could be defined in the standard library. Of course, not all the functions in the standard library can be expressed in WHLSL (like float operator*(float, float)), but almost everything else is implemented in WHLSL. For example, this function is part of the standard library:

float smoothstep(float edge0, float edge1, float x) {
    float t = clamp((x - edge0) / (edge1 - edge0), 0, 1);
    return t * t * (3 - 2 * t);
}

Because the standard library is designed to match HLSL as much as possible, most of its functions are already present in HLSL directly. So a compilation of WHLSL’s standard library to HLSL would omit these functions and use the built-in versions instead. This will happen, for instance, for all the vector and matrix indexers: the GPU should never actually see the indexer code shown earlier, because the compiler’s code generation step should use the intrinsic instead. However, different shading languages have different intrinsics, so every function is defined in WHLSL to allow for correctness testing. Similarly, WHLSL includes a CPU-side interpreter, which uses the WHLSL implementations of these functions when executing WHLSL programs. This is true for every WHLSL function, including the texture sampling functions.
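To illustrate the kind of CPU-side reference implementation described here, this is a direct C++ translation of the smoothstep shown above; it is the sort of function a test harness could compare GPU intrinsic results against. It is a sketch, not code from the WHLSL project.

```cpp
#include <algorithm>
#include <cassert>

// CPU-side translation of the standard-library smoothstep.
float smoothstep(float edge0, float edge1, float x) {
    // Clamp the normalized position to [0, 1].
    float t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0f), 1.0f);
    // Hermite polynomial 3t^2 - 2t^3, written as t * t * (3 - 2t).
    return t * t * (3.0f - 2.0f * t);
}
```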

Not every function in HLSL’s standard library is present in WHLSL. For example, HLSL supports printf(). However, implementing such a function in Metal Shading Language or SPIR-V would be quite difficult. We included as many functions from the HLSL standard library as is reasonable in the Web environment.

Variable Lifetime

But if there are pointers in the language, how should we deal with use-after-free problems? For example, consider the following snippet:

thread int* foo() {
    int a;
    return &a;
}
…
int b = *foo();

In languages like C, this code has undefined behavior. So, one solution is for WHLSL to just forbid this kind of structure and throw a compilation error when it sees something like this. However, this would require tracking the values that every pointer could possibly point to, which is a difficult analysis in the presence of loops and function calls. Instead, WHLSL makes every variable behave as if it has global lifetime.

This means that this WHLSL snippet is completely valid and well-defined, for two reasons:

  1. Declaring a without an initializer will zero-fill it. Therefore, the value of a is well-defined. This zero-filling will occur each time foo() is called.

  2. All variables have global lifetime (similar to C’s static keyword). Therefore, a never goes out of scope.

This global lifetime is only possible because recursion is disallowed (which is common for shading languages), which means there will not be any reentrancy problems. Similarly, shaders cannot allocate or free memory, so the compiler knows at compile-time every piece of memory that a shader could possibly access.

So, for example:

thread int* foo() {
    int a;
    return &a;
}
…
thread int* x = foo();
*x = 7;
thread int* y = foo();
// *x equals 0, because the variable got zero-filled again
*y = 8;
// *x equals 8, because x and y point to the same variable

Most variables won’t need to truly be made global, so there isn’t a big impact on performance. If the compiler can prove that it is unobservable whether or not a particular variable actually has global lifetime, the compiler is free to keep the variable local. Because the pattern of returning pointers to locals is discouraged in other languages (indeed, many other shading languages don’t even have pointers), examples like this one will be relatively rare.
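The foo() example above can be reproduced in C++ with a function-level static, which is the C analogy the text draws. The explicit `a = 0;` stands in for WHLSL's zero-fill at the declaration site, which C's static does not perform on each call.

```cpp
#include <cassert>

// C++ model of `thread int* foo() { int a; return &a; }` under WHLSL rules.
int* foo() {
    static int a;  // global lifetime: the pointer never dangles
    a = 0;         // WHLSL zero-fills `a` every time the declaration runs
    return &a;
}
```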

Stages of Compilation

WHLSL doesn’t make use of a preprocessor as other languages do. In other languages, the preprocessor’s primary purpose is to include multiple source files together. On the web, however, there is no direct file access, and usually the entire shader is presented in one downloaded resource. In many shading languages, the preprocessor is used to conditionally enable rendering features inside a large ubershader, but WHLSL allows for this use case by using specialization constants instead. Moreover, the many variants of preprocessors are incompatible in subtle ways, so the benefit of a preprocessor for WHLSL doesn’t outweigh the complexity of creating a specification for one.

WHLSL is designed for two-stage compilation. In our research, we’ve found that many 3D engines want to compile a large corpus of shaders, and each compilation includes large libraries of functions that are duplicated between the different compilations. Instead of compiling these support functions multiple times, a better solution is to compile the entire library once, and then allow a second stage to select which entry points from the library should be used together.

Because of this two-stage design, as much of the compilation as possible should happen in the first stage, so that it isn’t repeated for every member of a family of shaders. This is why entry points in WHLSL are marked as vertex, fragment, or compute. Letting the first stage know which functions are entry points, and of which type, lets more of the compilation happen in the first stage rather than the second.

This second compilation stage also provides a convenient place to specify specialization constants. Recall that WHLSL doesn’t have a preprocessor, which is the traditional way for features to be enabled and disabled in HLSL. Engines often tailor a single shader to a particular situation by enabling a rendering effect or switching out a BRDF with the flip of a switch. The technique of including every rendering option in a single shader, and specializing the single shader based on which effects to enable, is so common it has a name: ubershaders. Instead of preprocessor macros, WHLSL programmers can use specialization constants, which work the same way as SPIR-V’s specialization constants. From the language’s point of view, they are just scalar constants. However, the values for these constants are supplied during this second compilation stage, making it super easy to configure the program at runtime.
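The workflow can be modeled roughly as follows. This is a Python sketch under stated assumptions, not the WHLSL compiler’s API. The hypothetical `compile_library` stands for the expensive first stage that runs once over the whole library. The hypothetical `specialize` stands for the cheap second stage, which selects an entry point and supplies values for its specialization constants. `enableFog` is an invented ubershader switch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EntryPoint:
    stage: str       # "vertex", "fragment", or "compute"
    body: Callable   # body(inputs, constants) -> outputs


def compile_library(functions):
    """Stage 1 (hypothetical): run once over the whole shader library.

    A real compiler would parse, type-check, and optimize here; this model
    only records which stage each entry point belongs to.
    """
    return {name: EntryPoint(stage, fn) for name, (stage, fn) in functions.items()}


def specialize(library, name, stage, constants):
    """Stage 2 (hypothetical): pick an entry point and supply constant values."""
    entry = library[name]
    if entry.stage != stage:
        raise ValueError(f"{name} is a {entry.stage} entry point, not {stage}")
    values = dict(constants)  # copied so later changes by the caller don't leak in
    return lambda inputs: entry.body(inputs, values)


def fragment_main(inputs, constants):
    """An 'ubershader' with one optional effect behind a specialization constant."""
    color = inputs["albedo"]
    if constants["enableFog"]:
        color = 0.5 * color + 0.5 * inputs["fogColor"]
    return color


library = compile_library({"FragmentMain": ("fragment", fragment_main)})
plain = specialize(library, "FragmentMain", "fragment", {"enableFog": False})
foggy = specialize(library, "FragmentMain", "fragment", {"enableFog": True})
```

Both variants share the single stage-1 result. Only the constant values differ, and they are supplied at stage 2 rather than baked into the source by a preprocessor.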

Because a single WHLSL program can include multiple shaders, the inputs and outputs to the shader aren’t represented by global variables in the way that other shading languages do it. Instead, the inputs and the outputs for a particular shader are associated with that shader itself. Inputs are represented as arguments to the shader’s entry point and outputs are represented as the return value of the entry point.

The following shows how to describe a compute shader entry point:

compute void ComputeKernel(device uint[] b : register(u0)) {
   …
}

Safety

WHLSL is a safe language. This means that it is impossible to access information outside of a website’s origin. One of the ways WHLSL achieves this is by eliminating undefined behavior, as described above regarding uniformity.

Another way WHLSL achieves safety is by performing bounds checks of array/pointer accesses. There are three ways these bounds checks may occur:

  1. Trapping. When a trap occurs in a program, the shader stage immediately exits, filling in 0s for all of the shader stage’s outputs. The draw call continues, and the next stage of the graphics pipeline gets run.
    Because trapping introduces new control flow, it has an effect on the uniformity of the program. Traps are issued inside bounds checks, which means they are necessarily present in non-uniform control flow. It may be okay for some programs that don’t use uniformity for anything, but in general this makes traps difficult to use.
  2. Clamping. Array index operations can clamp the index to the size of the array. This doesn’t involve new control flow, so it doesn’t have any effect on uniformity. It is even possible to “clamp” a pointer access or a zero-length array access by disregarding writes and returning 0s for reads. This is possible because the set of things you can do with a pointer in WHLSL is limited, so we can simply make each of those operations do some well-defined thing with a “clamped” pointer.
  3. Hardware and Driver Support. Some hardware and drivers already include a mode where out-of-bounds accesses can’t happen. With this method, the mechanism by which the hardware forbids out-of-bounds accesses is implementation-defined. One example is the ARB_robustness OpenGL extension. Unfortunately, WHLSL should be runnable on almost all modern hardware, and not enough APIs and devices support these kinds of modes.

Whichever method the compiler uses, it should not affect the uniformity of the shader; in other words, it can’t turn an otherwise valid program into an invalid one.
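The first two strategies can be contrasted with a small Python model. This is only a sketch of the observable behavior. It assumes that clamping pins an out-of-range index to the last valid element, and it models a trap as an exception that ends the stage with zero-filled outputs. All function names are hypothetical, and real implementations operate on GPU code rather than Python lists.

```python
class Trap(Exception):
    """Signals a failed bounds check in trapping mode."""


def load_trapping(array, index):
    # The bounds check is a branch: this is the new control flow that
    # affects uniformity.
    if not 0 <= index < len(array):
        raise Trap()
    return array[index]


def clamp_index(array, index):
    return min(max(index, 0), len(array) - 1)


def load_clamping(array, index):
    if not array:
        return 0  # zero-length array: reads return 0
    return array[clamp_index(array, index)]


def store_clamping(array, index, value):
    if not array:
        return  # zero-length array: writes are discarded
    array[clamp_index(array, index)] = value


def run_stage(stage, output_count):
    """Run one shader stage; a trap exits it and zero-fills all outputs."""
    try:
        return stage()
    except Trap:
        return [0] * output_count
```

For example, with `data = [10, 20, 30]`, `run_stage(lambda: [load_trapping(data, 5), 1], 2)` yields `[0, 0]`, while `load_clamping(data, 5)` simply reads `30`.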

In order to determine the best behavior for bounds checks, we ran some performance experiments. We took some of the kernels used in the Metal Performance Shaders framework and made two new versions: one that uses clamping, and one that uses trapping. The kernels we picked were ones that do lots of array accesses: for example, multiplying large matrices. We ran this benchmark on a variety of devices at varying data sizes. We made sure that none of the traps were actually hit and none of the clamps actually had any effect, so we can be sure we were measuring the common case of a correctly-written program.

We expected trapping to be generally faster, because redundant traps can be eliminated by the downstream compiler. However, we discovered that there wasn’t one clear winner. On some devices, trapping was significantly faster than clamping, and on other devices, clamping was significantly faster than trapping. These results show that the compiler should be able to choose which method is best for the particular device it’s being run on, rather than being forced to always choose one method.

Figure: Chart of iPhone 6 vs. iPhone X runtime scores

Shader Signatures

WHLSL supports a language feature of HLSL called “semantics.” They are used to identify variables between shader stages and from the WebGPU API. There are four types of semantics:

  • Built-in variables, e.g. uint vertexID : SV_VertexID
  • Specialization constants, e.g. uint numlights : specialized
  • Stage in/out semantics, e.g. float2 coordinate : attribute(0)
  • Resource semantics, e.g. device float[] coordinate : register(u0)

As described above, WHLSL programs accept their inputs and outputs in the form of function parameters, not global variables.

However, shaders often have multiple outputs. The most common example of this is the vertex shader passing multiple output values to the interpolator to be fed as inputs into the fragment shader.

In order to accommodate this, the return value of a shader can be a struct, and the individual fields are treated independently. In fact, this works recursively — the struct can contain another struct, and its members are also treated independently. Nested structs are flattened, and all the fields which aren’t structs are gathered and treated as shader outputs.

Shader parameters work the same way. An individual parameter can be a shader input, or it can be a struct with a collection of shader inputs. Structs can also contain other structs. Variables inside these structs are treated independently, as if they were additional parameters to the shader.

After all these structs have been flattened into a set of inputs and a set of outputs, each item in the sets must have a semantic. Each built-in variable must have a particular type and must only be used in a particular shader stage. Specialization constants must only have simple scalar types.
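The flattening rule can be sketched in Python by modeling WHLSL structs as dataclasses and attaching each field’s semantic as metadata. The struct names, field names, and semantic strings below are chosen for illustration only. The sketch recurses into nested structs and reports an error for any leaf field that lacks a semantic.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class Lighting:
    normal: tuple = field(default=(0.0, 0.0, 1.0), metadata={"semantic": "attribute(1)"})
    specular: float = field(default=0.0, metadata={"semantic": "attribute(2)"})


@dataclass
class VertexOutput:
    position: tuple = field(default=(0.0, 0.0, 0.0, 1.0), metadata={"semantic": "SV_Position"})
    lighting: Lighting = field(default_factory=Lighting)  # nested struct: no semantic


def flatten(struct_type, prefix=""):
    """Recursively collect every non-struct field as a (path, semantic) pair."""
    result = []
    for f in fields(struct_type):
        path = prefix + f.name
        if is_dataclass(f.type):
            # Nested structs are flattened; their fields become shader I/O.
            result.extend(flatten(f.type, path + "."))
        else:
            semantic = f.metadata.get("semantic")
            if semantic is None:
                raise TypeError(f"{path} has no semantic")
            result.append((path, semantic))
    return result
```

Here `flatten(VertexOutput)` produces three independent outputs: `position`, `lighting.normal`, and `lighting.specular`, each with its own semantic.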

Stage in/out variables have the attribute semantic rather than the traditional HLSL semantics because many shaders pass around data that doesn’t match the canned semantics HLSL provides. In HLSL, it’s common to see generic data packed into the COLOR semantic, because COLOR is a float4 and the data fits inside a float4. Instead, WHLSL takes the approach used by both SPIR-V and Metal Shading Language (via [[user(n)]]): each stage in/out variable is assigned an identifier, and those identifiers are used to match variables between shader stages.
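Matching by identifier can be sketched in a few lines of Python. The function name `match_stage_io` is hypothetical. The input format, a map from variable name to the n in `attribute(n)`, is an assumption. So are the choices to reject duplicate indices and unmatched fragment inputs. Variable names are free to differ between stages; only the indices have to agree.

```python
def match_stage_io(vertex_outputs, fragment_inputs):
    """Pair fragment inputs with vertex outputs by their attribute(n) index.

    Both arguments map a variable name to its attribute index, as if taken
    from 'name : attribute(n)' declarations in each stage.
    """
    by_index = {}
    for name, index in vertex_outputs.items():
        if index in by_index:
            raise ValueError(f"attribute({index}) is declared twice")
        by_index[index] = name

    links = {}
    for name, index in fragment_inputs.items():
        if index not in by_index:
            raise ValueError(f"no vertex output for attribute({index})")
        links[name] = by_index[index]
    return links
```

For instance, a vertex output `uv : attribute(0)` feeds a fragment input `texCoord : attribute(0)` even though the two variables have different names.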

Resource semantics should be familiar to HLSL programmers. WHLSL includes both resource semantics and address spaces, and the two serve different purposes. The address space of the variable is used to determine which cache and memory hierarchy it should be accessed within. The address space is necessary because it persists even through pointer operations; a device pointer can’t be set to point to a thread variable. In WHLSL, the resource semantic is only used to identify a variable from the WebGPU API. However, for consistency with HLSL, the resource semantic must “match” the address space of the variable it’s being put on. For example, you can’t put register(s0) on a texture, and you can’t put register(u0) on a constant resource. Arrays in WHLSL don’t have an address space (because they are value types, not reference types), so if an array appears as a shader argument, it is treated as if it were a device resource for the purposes of matching semantics.

Just like Direct3D, WebGPU has a two-level binding model. Resource descriptors are aggregated into sets, and sets can be switched out in the WebGPU API. WHLSL matches HLSL by modeling this with an optional space parameter inside resource semantics: register(u0, space1).
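Below is a Python sketch of parsing and validating resource semantics. It assumes the usual HLSL register classes carry over: t for textures, s for samplers, u for device data, and b for constant data. It also assumes the space defaults to 0 when omitted. The exact WHLSL matching table may differ, and `parse_register` and `check_resource` are hypothetical helpers.

```python
import re

# Assumed mapping from resource kind to the HLSL register class it must use.
REGISTER_CLASS_FOR = {"texture": "t", "sampler": "s", "device": "u", "constant": "b"}

_REGISTER = re.compile(r"register\(\s*([tsub])(\d+)\s*(?:,\s*space(\d+)\s*)?\)")


def parse_register(semantic):
    """Parse 'register(u0)' or 'register(u0, space1)' into (class, slot, space)."""
    m = _REGISTER.fullmatch(semantic.strip())
    if m is None:
        raise ValueError(f"not a resource semantic: {semantic!r}")
    reg_class, slot, space = m.groups()
    return reg_class, int(slot), (int(space) if space is not None else 0)


def check_resource(kind, semantic):
    """Check that a semantic 'matches' the resource it is placed on.

    Returns (space, slot), i.e. which set and which binding within the set.
    """
    # Arrays have no address space; as shader arguments they are treated
    # like device resources for this check.
    if kind == "array":
        kind = "device"
    reg_class, slot, space = parse_register(semantic)
    if REGISTER_CLASS_FOR[kind] != reg_class:
        raise TypeError(f"register({reg_class}{slot}) cannot be placed on a {kind} resource")
    return space, slot
```

With these assumptions, `check_resource("device", "register(u3, space2)")` returns `(2, 3)`, while `check_resource("texture", "register(s0)")` is rejected.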

“Logical Mode” restrictions

WHLSL is designed with the requirement that it is compilable to Metal Shading Language, SPIR-V, and HLSL (or DXIL). SPIR-V has many different operating modes, targeted by different embedding APIs. Specifically, we’re interested in the flavor of SPIR-V that Vulkan targets.

This flavor is called Logical Addressing Mode. In SPIR-V Logical Mode, variables cannot have pointer type. Similarly, a pointer cannot be used in a Phi operation. As a result, each pointer must point to exactly one thing for all time; a pointer is simply a name for a value.

Because WHLSL needs to be compilable to SPIR-V, WHLSL must not be more expressive than SPIR-V. Therefore, WHLSL has some restrictions to make it expressible in SPIR-V Logical Mode. These restrictions aren’t surfaced as an optional mode for WHLSL; instead, they’re part of the language itself. We hope these restrictions can be lifted in a future version of the language, but until then, they apply to every WHLSL program.

These restrictions are:

  • Pointers and array references must not occur inside device, constant, or threadgroup memory
  • Pointers and array references must not occur inside arrays or array references
  • Pointers and array references must not be assigned outside of their initializer (in their declaration)
  • Functions that return pointers or array references must only have a single return point
  • Ternary expressions must not result in pointers

With these restrictions, the compiler knows exactly what every pointer points to.

But not so fast! Recall from above that thread variables have global lifetime, which means they behave as if they were declared at the beginning of the entry point. What if the runtime gathered all these local variables together, sorted by type, and aggregated all the variables with the same type into arrays? Then, a pointer could simply be an offset into the appropriate array. A pointer can’t be recast to point to a different type in WHLSL, which means the appropriate array is determined statically by the compiler. Therefore, thread pointers don’t need to abide by the restrictions above. However, this technique only works for thread pointers; it doesn’t work for pointers in the other address spaces.
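The lowering described above can be sketched in Python. `lay_out_thread_variables`, `load`, and `store` are hypothetical names, and the type names are just strings. Each thread variable is assigned a slot in a per-type, zero-filled array. A pointer becomes a (type, offset) pair. The type part is known statically, so only the integer offset varies at runtime, even when the pointer is chosen by a branch.

```python
from collections import defaultdict


def lay_out_thread_variables(variables):
    """Group thread variables by type and give each an offset in its type's array.

    variables: list of (name, type_name) for every thread variable in the program.
    Returns (arrays, pointers): one zero-filled array per type and, for each
    variable, a 'pointer' that is just (type_name, offset).
    """
    counts = defaultdict(int)
    pointers = {}
    for name, type_name in variables:
        pointers[name] = (type_name, counts[type_name])
        counts[type_name] += 1
    arrays = {type_name: [0] * n for type_name, n in counts.items()}
    return arrays, pointers


def load(arrays, ptr):
    type_name, offset = ptr
    return arrays[type_name][offset]


def store(arrays, ptr, value):
    type_name, offset = ptr
    arrays[type_name][offset] = value


arrays, ptrs = lay_out_thread_variables([("foo.a", "int"), ("bar.b", "float"), ("bar.c", "int")])
# A pointer picked at runtime still indexes the statically known "int" array.
condition = False
p = ptrs["foo.a"] if condition else ptrs["bar.c"]
store(arrays, p, 5)
```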

Resources

WHLSL supports textures, samplers, and array references for buffers. Just like in HLSL, texture types in WHLSL look like Texture2D<float4>. The presence of these angle brackets doesn’t imply templates or generics; the language doesn’t have facilities for those (for simplicity). The only types that are allowed to have them are a finite set of built-in types. This design is a middle ground: it allows these types, which are present in HLSL, while still leaving the Community Group free to use the angle bracket characters as the language develops further.

Depth textures are distinct from non-depth textures because they are different types in Metal Shading Language, so the compiler needs to know which one to emit when it’s emitting Metal Shading Language. Because WHLSL doesn’t support member functions, texture sampling isn’t done like texture.Sample(…); instead, it’s done with free functions like Sample(texture, …).

Samplers are not specialized; there is one sampler type for all use cases. You can use this sampler for both depth textures and non-depth textures. Depth textures support things like comparison operations in the sampler. If the sampler is configured to include a depth comparison and it’s used with a non-depth texture, the depth operation is ignored.
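The single-sampler rule can be illustrated with a Python model of a free sampling function. Everything here is a simplification for illustration. `Sampler`, `Texture`, and `sample` are hypothetical. The texture is a point-sampled 1D list. A comparison passes when `reference <op> texel` holds, which is an assumption about the convention. The point is the dispatch: the comparison applies to depth textures and is ignored otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sampler:
    compare: Optional[str] = None  # e.g. "less"; None means no depth comparison


@dataclass
class Texture:
    texels: list
    is_depth: bool = False


# Assumed convention: the comparison is "reference <op> stored texel".
COMPARE = {
    "less": lambda reference, texel: reference < texel,
    "greater": lambda reference, texel: reference > texel,
}


def sample(texture, sampler, index, reference=None):
    """A free function in the spirit of Sample(texture, ...)."""
    value = texture.texels[index]
    if sampler.compare is not None and texture.is_depth:
        return 1.0 if COMPARE[sampler.compare](reference, value) else 0.0
    # Non-depth texture (or no comparison configured): the comparison is ignored.
    return value
```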

The WebGPU API will automatically emit some resource barriers at particular places, which means the API needs to know which resources are used in a shader. Therefore, the “bindless” model of resources can’t be used. This means that all resources are listed as explicit inputs to a shader. Similarly, the API wants to know which resources are used for reading and which are used for writing; the compiler knows this statically by inspecting the program. There is no language-level support for “const” or a distinction between StructuredBuffer and RWStructuredBuffer because that information is already present in the program.
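Here is a Python sketch of the static read/write inference. It stands in for a pass over the compiled program, using a flat list of hypothetical `("load" | "store", resource)` operations in place of a real intermediate representation. Because every resource must be an explicit shader input, touching an undeclared resource is an error. That is the property that rules out the bindless model.

```python
def classify_resource_usage(operations, resources):
    """Infer, per declared resource, whether the shader reads it, writes it, or both.

    operations: list of (op, resource_name) pairs, op being "load" or "store".
    resources: names of the resources declared as explicit shader inputs.
    """
    usage = {name: set() for name in resources}
    for op, name in operations:
        if name not in usage:
            raise KeyError(f"{name} is not a declared shader input")
        usage[name].add("read" if op == "load" else "write")
    return usage
```

The API can then insert barriers based on the result; no `const` annotation or separate read-write buffer type is needed to obtain it.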

Current Work

The WebGPU community group is working on a formal language specification written with OTT that describes WHLSL with the same level of rigor that other Web languages employ. We’re also working on a compiler that can produce Metal Shading Language, SPIR-V, and HLSL. In addition, the compiler includes a CPU-side interpreter to show correctness of an implementation. Please try it out!

Future Directions

WHLSL is still nascent, and there is still a long way to go before the design of the language is complete. We would love to hear from you about your desires, concerns, and use cases! Please feel free to file issues in our GitHub repository about your ideas and thoughts!

For the first proposal, we wanted to satisfy the constraints outlined at the beginning of this post, yet provide ample opportunity for expanding the language. One natural evolution of the language could add facilities for abstractions of types, like protocols or interfaces. WHLSL includes simple structs with no access control or inheritance. Other shading languages like Slang model type abstractions as a set of methods that must be present inside the struct. However, Slang runs into a problem where it is impossible to make an existing type adhere to a new interface. Once the struct is defined, you can’t add new methods to it; the curly brace has closed the struct forever. Languages like Objective-C and Swift solve this problem with extensions, which can retroactively add methods to a struct after the struct has been defined. Java solved this problem by encouraging authors to add new classes, called adapters, which only exist to implement an interface, and plumb each call through to the implementing type.

The WHLSL approach is much simpler; by using free functions instead of struct methods, we can use a system like Haskell’s type classes. Here, a type class defines a set of arbitrary functions that must exist, and a type adheres to the type class by implementing them. A solution like this could potentially be added to the language in the future.
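Python’s `functools.singledispatch` gives a rough analogy for retroactive conformance through free functions. It dispatches dynamically on the first argument at runtime, whereas Haskell type classes are resolved statically, so this only illustrates the idea and is not a proposal for WHLSL. `Light`, `Material`, and `brightness` are hypothetical. Both structs are defined first, and the "type class" implementations are added afterward without reopening either struct.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Light:  # an existing struct, defined without any methods
    intensity: float


@dataclass
class Material:
    albedo: float


# The "type class": a free function that types opt into by implementing it.
@singledispatch
def brightness(value):
    raise TypeError(f"{type(value).__name__} does not implement brightness")


# Retroactive conformance, added after Light and Material were defined.
@brightness.register
def _(value: Light):
    return value.intensity


@brightness.register
def _(value: Material):
    return value.albedo * 0.5
```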

Wrapping Up

This post describes a new shading language named WHLSL that the W3C’s WebGPU Community Group owns. The goals of the language are satisfied by its familiar, HLSL-based syntax, safety guarantees, and simple, extensible design. As such, it represents the best-supported way to write shaders to be used in the WebGPU API. However, the WebGPU Community Group is unsure whether WHLSL programs should be supplied to the WebGPU API directly, or whether they should be compiled to an intermediate form before delivery to the API. Either way, WebGPU programmers should be writing in WHLSL because it fits best with the API.

Please get involved! We’re doing this work on the WebGPU GitHub project. We’ve been working on a formal specification for the language, a reference compiler to emit Metal Shading Language and SPIR-V, and a CPU-side interpreter to validate correctness. We welcome everyone to try it out, and let us know how it goes!

For more information, you can contact me at mmaxfield@apple.com or @Litherum, or you can contact our evangelist, Jonathan Davis.