Shipping C++ WebAssembly Packages on NPM

Sat Jun 22, 2024

In the previous post we saw how to build, ship, and debug a C++ WebAssembly (WASM) application, where the compute and rendering parts of our app were written in C++ and compiled to WASM, while the rest of the frontend was written in TypeScript and called into our C++ code. This app architecture is similar to Figma, (I think) Inigo Quilez’s Project Neo at Adobe, and what I work on at Luminary Cloud. By pushing the heavy rendering and compute work into WASM we gain serious performance improvements, while still leveraging all the great work on frontend frameworks and HTML for UI development, accessibility, and more.

Previously we just had CMake copy the compiled WASM into the web app’s source tree directly; however, that doesn’t make our WASM re-usable or portable across projects. Now we’re going to look at how to publish our WASM as an NPM package and use it from a TypeScript app simply by installing it! Shipping our WASM code through an NPM package makes it easy for us and others to integrate it into a frontend application. I hope this post will provide some insights and inspire you to write and ship your own WASM NPM packages to accelerate compute and graphics on the web! Let me know about the cool stuff you’re building by email, Twitter, Mastodon, or LinkedIn.
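As a rough sketch of what such a package might declare (all file and package names here are hypothetical, not taken from the post), an Emscripten-built package’s package.json can point at the generated JS glue, ship the .wasm binary alongside it, and expose TypeScript declarations:

```json
{
  "name": "my-wasm-module",
  "version": "0.1.0",
  "main": "dist/my_module.js",
  "types": "dist/my_module.d.ts",
  "files": [
    "dist/my_module.js",
    "dist/my_module.wasm",
    "dist/my_module.d.ts"
  ]
}
```

The key point is that the .wasm file must be listed in "files" so it actually ships with the package; the consuming app’s bundler then needs to know how to resolve it at runtime.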


Building, Shipping and Debugging a C++ WebAssembly App

Tue Jun 11, 2024

WebAssembly (WASM) is an exciting, and now practically ubiquitous, browser technology that allows for compiling code from a number of languages to run at near-native speed in the browser. Native languages, such as C, C++, Rust, and Zig can be compiled to WASM to accelerate computationally intensive tasks in web applications or to port entire native applications to the browser. Even garbage collected languages such as C# and Go can be compiled to WASM to run in the browser. Support for WASM can be assumed in all relatively modern browsers, and more applications have begun leveraging it, from early adopter Figma to Google Sheets.

In this post, we’ll look at how to build, ship, and debug a C++ WASM application that is integrated into a TypeScript or JavaScript frontend application. This is a similar model to Figma, where the UI components are written in React (or your framework of choice), while all heavy computation and non-UI rendering work is owned by a WASM module written in a native language for performance. Inigo Quilez’s Project Neo at Adobe appears to have a similar design, based on digging through a performance trace in Chrome, and what I’ve been working on at Luminary Cloud follows a similar architecture. To do this, we need to integrate our WASM module into frontend bundlers like Webpack, maybe through an npm module, and think about API designs for efficiently coupling TypeScript and C++ WASM modules (though we won’t have time for npm modules and API discussion in this post).
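The TypeScript-to-C++ coupling described above can be sketched as a copy-in/call/copy-out flow. This is a hand-simulated version: `heapF32` and `scaleInPlace` below are stand-ins for Emscripten’s `HEAPF32` view and an exported C++ function, so the pattern can be shown self-contained without a compiled module.

```typescript
// Stand-in for the module's WASM linear memory (Module.HEAPF32 in Emscripten)
const wasmMemory = new ArrayBuffer(1024);
const heapF32 = new Float32Array(wasmMemory);

// Stand-in for an exported C++ function that operates directly on linear memory
function scaleInPlace(ptr: number, count: number, factor: number): void {
  for (let i = 0; i < count; ++i) {
    heapF32[ptr / 4 + i] *= factor;
  }
}

const input = new Float32Array([1, 2, 3]);
const ptr = 0;                       // a real app would get this from Module._malloc
heapF32.set(input, ptr / 4);         // copy TypeScript data into the "heap"
scaleInPlace(ptr, input.length, 2);  // call into "C++"
const result = Array.from(heapF32.subarray(ptr / 4, ptr / 4 + input.length));
```

The design goal is to minimize these copies across the boundary, since marshalling large buffers back and forth can easily eat the performance won by moving the compute into WASM.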


From 0 to glTF with WebGPU: Basic Materials and Textures

Sun Apr 28, 2024

Figure 1: The happy duck we’ll be able to render at the end of this post

Now that we can load up complex scene hierarchies from glTF files and render them correctly, let’s start getting some more interesting colors on screen! glTF defines a physically based BRDF, with support for metallic and roughness properties, along with normal, emissive, and occlusion texture maps. There are a number of extensions on top of this basic material model (the KHR_materials_* extensions) that add even more advanced material definitions.

We’ll keep it simple to start. Today we’re going to take the first step of loading the base glTF material parameters and textures from the glB file and passing them to our shader. In the shader we’ll color the object by its base color properties, without applying any lighting or material model yet.
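Reading those base material parameters out of the glTF JSON might look like the sketch below. The `GltfMaterial` shape mirrors the relevant slice of the glTF 2.0 schema, and the spec’s default base color factor of [1, 1, 1, 1] is applied when the factor is omitted; the sample material is hand-written for illustration.

```typescript
interface GltfMaterial {
  pbrMetallicRoughness?: {
    baseColorFactor?: number[];
    baseColorTexture?: { index: number };
  };
}

function baseColor(material: GltfMaterial): { factor: number[]; textureIndex: number | null } {
  const pbr = material.pbrMetallicRoughness ?? {};
  return {
    factor: pbr.baseColorFactor ?? [1, 1, 1, 1],      // glTF spec default
    textureIndex: pbr.baseColorTexture?.index ?? null, // index into the textures array
  };
}

// A minimal material with a texture but no explicit color factor
const material: GltfMaterial = {
  pbrMetallicRoughness: { baseColorTexture: { index: 0 } },
};
const color = baseColor(material);
```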


GPU Compute in the Browser at the Speed of Native: WebGPU Marching Cubes

Mon Apr 22, 2024

WebGPU is a powerful GPU API for the web, providing support for advanced low-overhead rendering pipelines and GPU compute pipelines. WebGPU’s support for GPU compute shaders and storage buffers is a key distinction from WebGL, which lacks these features, and makes it possible to bring powerful GPU-parallel applications to run entirely in the browser. These applications range from GPGPU work (e.g., simulations, data processing/analysis, machine learning) to GPU compute driven rendering pipelines, and everything in between.

In this post, we’ll evaluate WebGPU compute performance against native Vulkan by implementing the classic Marching Cubes algorithm in WebGPU. Marching Cubes is a nearly embarrassingly parallel algorithm, with two global reduction steps that must take place to synchronize work items and thread output locations. This makes it a great first GPU-parallel algorithm to try out on a new platform, as it has enough complexity to stress the API in a few different ways beyond simple parallel kernel dispatches but isn’t so complex as to take a substantial amount of time to implement or be bottlenecked by CPU performance.
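The global reduction steps mentioned above amount to prefix sums: each voxel reports how many vertices it will output, and an exclusive scan over those counts gives each voxel its write offset into the shared output buffer. A sequential CPU reference of that scan (the GPU version would be a workgroup-parallel scan, omitted here) looks like:

```typescript
// Exclusive prefix sum over per-voxel vertex counts: offsets[i] is where
// voxel i writes its vertices, total is the size of the output buffer.
function exclusiveScan(counts: number[]): { offsets: number[]; total: number } {
  const offsets = new Array<number>(counts.length);
  let sum = 0;
  for (let i = 0; i < counts.length; ++i) {
    offsets[i] = sum;  // sum of all counts before voxel i
    sum += counts[i];
  }
  return { offsets, total: sum };
}

const { offsets, total } = exclusiveScan([3, 0, 6, 3]);
// offsets = [0, 3, 3, 9], total = 12
```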


From 0 to glTF with WebGPU: Rendering the Full glTF Scene

Sat Jun 24, 2023

Figure 1: In this post, we’ll look at how to fix our terribly broken 2 Cylinder Engine. Left: A buggy render of 2CylinderEngine.glb achieved when ignoring the glTF node transformations. Right: The correct rendering with meshes positioned based on the hierarchy of transforms specified in the glTF node tree.

Loading and drawing our first mesh from a glTF file was quite a bit of work in the previous post, but with this core piece in place we can start adding a lot more functionality to our renderer pretty quickly. If you tried loading up glTF files into the renderer from the previous post, you may have noticed that they didn’t look how you expected. This is because glTF files often contain many meshes that make up different parts of the scene geometry, most of which will be missing since we only loaded the first mesh last time. If we just add a simple loop through the meshes to load and draw them all we’ll frequently end up with a scene like the broken engine on the left in the image above. This is because the meshes are referenced and transformed by the glTF node hierarchy, and we need to load and handle these nested transformations to render the correct scene shown on the right. The test model we’ll be using for this post is the 2CylinderEngine from the Khronos glTF samples repo, which has nested transformations in its node hierarchy that make it a great test case. So grab 2CylinderEngine.glb and let’s get started!
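The hierarchy handling boils down to one rule: a node’s world matrix is its parent’s world matrix times its own local matrix. A minimal sketch of that depth-first flatten, using 4x4 column-major matrices as glTF stores them (the tiny two-node tree is hand-made for illustration):

```typescript
type Mat4 = number[]; // 16 numbers, column-major

// out = a * b for 4x4 column-major matrices
function mul(a: Mat4, b: Mat4): Mat4 {
  const out = new Array<number>(16).fill(0);
  for (let col = 0; col < 4; ++col)
    for (let row = 0; row < 4; ++row)
      for (let k = 0; k < 4; ++k)
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
  return out;
}

interface Node { matrix: Mat4; children: Node[]; }

// Depth-first walk accumulating the parent transform into each node
function worldMatrices(node: Node, parent: Mat4, out: Mat4[]): void {
  const world = mul(parent, node.matrix);
  out.push(world);
  for (const child of node.children) worldMatrices(child, world, out);
}

const translate = (x: number, y: number, z: number): Mat4 =>
  [1,0,0,0, 0,1,0,0, 0,0,1,0, x,y,z,1];

// Parent translated by (1,0,0), child by (0,2,0) relative to the parent
const tree: Node = {
  matrix: translate(1, 0, 0),
  children: [{ matrix: translate(0, 2, 0), children: [] }],
};
const mats: Mat4[] = [];
worldMatrices(tree, translate(0, 0, 0), mats);
// mats[1] translates by (1, 2, 0): the child inherits the parent's offset
```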


From 0 to glTF with WebGPU: Rendering the First glTF Mesh

Tue May 16, 2023

Now that we’ve seen how to draw a triangle in the first post and hook up camera controls so we can look around in the second post, we’re at the point where the avocado really hits the screen and we can start drawing our first glTF primitives! I say the avocado hits the screen because that’s the glTF test model we’ll be using. You can grab it from the Khronos glTF samples repo. glTF files come in two flavors (setting aside extension-specific variants): a standard “.gltf” version that stores the JSON header in one file and binary data and textures in separate files, and a “.glb” version that combines the JSON header and all binary or texture data into a single file. We’ll be loading .glb files in this series to simplify how many files we have to deal with to get a model into the renderer, so grab the glTF-Binary Avocado.glb and let’s get started!
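The .glb container layout described above is simple enough to parse by hand: a 12-byte header (the magic "glTF", a version, and the total length), followed by a JSON chunk and an optional binary chunk. A minimal sketch, with a tiny hand-built .glb buffer just to exercise the parser (chunk padding to 4-byte alignment is omitted for brevity):

```typescript
function parseGlbJson(buffer: ArrayBuffer): any {
  const view = new DataView(buffer);
  const magic = view.getUint32(0, true);
  if (magic !== 0x46546c67) throw new Error("not a glb file"); // "glTF"
  const version = view.getUint32(4, true);
  if (version !== 2) throw new Error("unsupported glb version");
  // First chunk starts at byte 12: length, type, then the payload
  const chunkLength = view.getUint32(12, true);
  const chunkType = view.getUint32(16, true);
  if (chunkType !== 0x4e4f534a) throw new Error("first chunk must be JSON");
  const jsonBytes = new Uint8Array(buffer, 20, chunkLength);
  return JSON.parse(new TextDecoder().decode(jsonBytes));
}

// Hand-built glb containing just the JSON chunk {"asset":{"version":"2.0"}}
const json = '{"asset":{"version":"2.0"}}';
const jsonBytes = new TextEncoder().encode(json);
const total = 12 + 8 + jsonBytes.length;
const buf = new ArrayBuffer(total);
const dv = new DataView(buf);
dv.setUint32(0, 0x46546c67, true);        // magic "glTF"
dv.setUint32(4, 2, true);                 // version 2
dv.setUint32(8, total, true);             // total file length
dv.setUint32(12, jsonBytes.length, true); // JSON chunk length
dv.setUint32(16, 0x4e4f534a, true);       // chunk type "JSON"
new Uint8Array(buf, 20).set(jsonBytes);
const header = parseGlbJson(buf);
```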

Figure 1: It takes quite a bit to get Avocado.glb on the screen, but this beautiful image of our expected final (and delicious) result should be enough motivation to keep us going!


From 0 to glTF with WebGPU: Bind Groups

Tue Apr 11, 2023

In this second post of the series we’ll learn about Bind Groups, which let us pass buffers and textures to our shaders. When writing a renderer, we typically have inputs which do not make sense as vertex attributes (e.g., transform matrices, material parameters), or simply cannot be passed as vertex attributes (e.g., textures). Such parameters are instead passed as uniforms in GLSL terms, or root parameters in HLSL terms. The application then associates the desired buffers and textures with the parameters in the shader. In WebGPU, the association of data to parameters is made using Bind Groups. In this post, we’ll use Bind Groups to pass a uniform buffer containing a view transform to our vertex shader, allowing us to add camera controls to our triangle from the previous post. If you haven’t read the first post in this series I recommend reading that first, as we’ll continue directly off the code written there.
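For the view-transform case described above, the bind group layout is plain descriptor data. A sketch of what we’d hand to device.createBindGroupLayout() (shown standalone here, with GPUShaderStage.VERTEX written as its numeric value 0x1 since this snippet runs outside the browser):

```typescript
const VERTEX_STAGE = 0x1; // GPUShaderStage.VERTEX

const viewParamsLayoutDescriptor = {
  entries: [
    {
      binding: 0,                  // matches the binding index in the shader
      visibility: VERTEX_STAGE,    // only the vertex shader reads the view matrix
      buffer: { type: "uniform" }, // a uniform buffer, e.g., one 4x4 float matrix
    },
  ],
};
```

The bind group itself then pairs this layout with an actual GPUBuffer holding the matrix, and is set on the render pass before drawing.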


From 0 to glTF with WebGPU: The First Triangle

Mon Apr 10, 2023

WebGPU is a modern graphics API for the web, in development by the major browser vendors. When compared to WebGL, WebGPU provides more direct control over the GPU to allow applications to leverage the hardware more efficiently, similar to Vulkan and DirectX 12. WebGPU also exposes additional GPU capabilities not available in WebGL, such as compute shaders and storage buffers, enabling powerful GPU compute applications to run on the web. As with the switch from OpenGL to Vulkan, WebGPU exposes more complexity to the user than WebGL, though the API strikes a good balance between complexity and usability, and overall is quite nice to work with. In this series, we’ll learn the key aspects of WebGPU from the ground up, with the goal of going from zero to a basic glTF model renderer. This post marks our initial step on this journey, where we’ll set up a WebGPU context and get a triangle on the screen.
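One concrete piece of getting a triangle up is laying out its vertex buffer. A sketch of the kind of interleaved data such a demo uploads, with a float4 position and float4 color per vertex giving a 32-byte stride (the specific positions and colors here are illustrative, not taken from the post):

```typescript
const vertexData = new Float32Array([
  // x,   y,   z,   w,    r, g, b, a
   1.0, -1.0, 0.0, 1.0,   1, 0, 0, 1,
  -1.0, -1.0, 0.0, 1.0,   0, 1, 0, 1,
   0.0,  1.0, 0.0, 1.0,   0, 0, 1, 1,
]);

const floatsPerVertex = 8;
const strideBytes = floatsPerVertex * Float32Array.BYTES_PER_ELEMENT; // 32
const vertexCount = vertexData.length / floatsPerVertex;              // 3
```

This stride and the two attribute offsets (0 and 16 bytes) are what the render pipeline’s vertex buffer layout must match.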


A Dive into Ray Tracing Performance on the Apple M1

Sun Dec 20, 2020

The Apple M1 available in the MacBook Air, MacBook Pro 13", and Mac Mini has been the focus of a ton of benchmarking writeups and blog posts about the new chip. The performance overall, and especially performance/watt, that Apple has achieved with the chip is very impressive. As a ray tracing person, what caught my eye the most was the performance AnandTech reported in their CineBench benchmarks. These scores were 1.6x higher than I got on my old Haswell desktop and 2x higher than my new Tiger Lake laptop! I had also been interested in trying out the new ray tracing API for Metal that was announced at WWDC this year, which bears some resemblance to the DirectX, Vulkan, and OptiX GPU ray tracing APIs. So, I decided to pick up a Mac Mini to do some testing on my own interactive path tracing project, ChameleonRT, and to get it running on the new Metal ray tracing API. In this post, we’ll take a look at the new Metal ray tracing API to see how it lines up with DirectX, Vulkan, OptiX and Embree, then we’ll make some fair (and some extremely unfair) ray tracing performance comparisons against the M1.


Hardware Accelerated Video Encoding on the Raspberry Pi 4 on Ubuntu 20.04 64-bit

Sun Nov 15, 2020

I recently picked up a Raspberry Pi 4 8GB model to use for some lightweight server tasks on my home network. After setting up Pi-Hole, OpenVPN, Plex, and Samba, I got curious about using it to re-encode some videos I had. The videos are on an external drive being monitored by Plex and shared on the network by Samba, and some are quite large since they’re at a (likely unnecessarily) high bitrate. Trimming them down would help save a bit of space, and gives me an excuse to play around with Python, FFmpeg, and the Pi’s hardware accelerated video encoder. In this post, I’ll cover how to get FFmpeg set up to use the Pi 4’s video encoding hardware on a 64-bit OS and the little encoding manager/dashboard, FBED, that I put together to monitor the progress of the encoding tasks.
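For a flavor of what such a re-encode looks like, FFmpeg exposes the Pi 4’s hardware H.264 encoder through its V4L2 memory-to-memory backend. An illustrative command (the file names and bitrate are placeholders, not values from the post):

```shell
# Re-encode video with the Pi 4's hardware H.264 encoder, copying audio as-is
ffmpeg -i input.mkv -c:v h264_v4l2m2m -b:v 4M -c:a copy output.mkv
```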