WebGPU: Browser GPU Computing Tutorial
WebGPU is the newest open standard that brings low‑level, high‑performance GPU access directly to the browser. Unlike WebGL, which is primarily a graphics API, WebGPU treats the GPU as a general‑purpose compute engine, letting you run data‑parallel workloads without ever leaving the comfort of JavaScript. In this tutorial you’ll learn how to set up a WebGPU context, write a simple compute shader, and explore real‑world scenarios such as image filtering and physics simulation.
Getting Started with WebGPU
Before you write any code, make sure your browser supports WebGPU. As of early 2026, Chrome and Edge (since version 113) ship native support, Safari added it in Safari 26, and Firefox shipped it on Windows starting with version 141, with other platforms following. On browsers or platforms that still gate the feature, enable the "WebGPU" flag (in Chromium builds, via chrome://flags).
WebGPU works through two main concepts: GPUDevice, which represents a connection to the GPU, and GPUQueue, which schedules commands. The API is promise‑based, so you’ll spend a lot of time awaiting asynchronous calls.
Creating a GPUDevice
async function initWebGPU() {
  if (!navigator.gpu) {
    throw new Error('WebGPU not supported on this browser');
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    throw new Error('No appropriate GPU adapter found');
  }
  const device = await adapter.requestDevice();
  return device;
}
Notice the use of await – the browser may need to negotiate with the OS to allocate a GPU adapter. Once you have a device, you can start creating buffers, pipelines, and command encoders.
Setting Up the Development Environment
For a smooth experience, use a modern code editor with JavaScript/TypeScript IntelliSense, such as VS Code. Install the npm package @webgpu/types to get type definitions if you prefer TypeScript.
- Initialize a new project: npm init -y
- Install a local server (e.g., npm i -D live-server) to serve files over http://localhost. WebGPU is only exposed in secure contexts, which includes HTTPS and localhost.
- Create an index.html that loads a main.js module.
Here’s a minimal HTML scaffold:
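Assuming the entry script sits next to the page as main.js and is loaded as an ES module, it could look like this (a sketch; the canvas element is only needed once you start rendering):

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>WebGPU Compute Demo</title>
  </head>
  <body>
    <canvas id="gpu-canvas" width="512" height="512"></canvas>
    <script type="module" src="main.js"></script>
  </body>
</html>
```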
Your First Compute Shader
Compute shaders in WebGPU are written in WGSL (WebGPU Shading Language). WGSL is deliberately simple: it looks like a blend of GLSL and Rust, and the browser translates it into the native shader format of whichever backend it runs on (Metal, Vulkan, or Direct3D 12).
We’ll start with a classic “vector addition” example. The shader reads two input buffers, adds each pair of floats, and writes the result to an output buffer.
WGSL Code
struct Buffer {
  data: array<f32>,
}

@group(0) @binding(0) var<storage, read> a: Buffer;
@group(0) @binding(1) var<storage, read> b: Buffer;
@group(0) @binding(2) var<storage, read_write> result: Buffer;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let i = gid.x;
  if (i < arrayLength(&a.data)) {
    result.data[i] = a.data[i] + b.data[i];
  }
}
The @workgroup_size(64) attribute tells the GPU to launch 64 invocations per workgroup. The global_invocation_id builtin provides a unique index for each invocation, which we use to address the buffers.
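Because each workgroup covers 64 elements, the number of workgroups to dispatch is the ceiling of N / 64. A tiny helper (dispatchCount is a hypothetical name, not part of the WebGPU API) makes the off-by-one risk explicit:

```javascript
// Number of workgroups needed to cover `n` elements when each
// workgroup runs `workgroupSize` invocations.
function dispatchCount(n, workgroupSize) {
  return Math.ceil(n / workgroupSize);
}

// 1024 elements fit exactly into 16 workgroups of 64...
console.log(dispatchCount(1024, 64)); // 16
// ...while 1025 elements need a 17th, partially filled workgroup,
// which is why the shader bounds-checks against arrayLength().
console.log(dispatchCount(1025, 64)); // 17
```

The partially filled last workgroup is exactly why the shader's `if (i < arrayLength(...))` guard matters: without it, the extra invocations would read and write out of bounds.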
JavaScript Boilerplate
async function vectorAddDemo() {
  const device = await initWebGPU();

  // Prepare data
  const length = 1024;
  const aData = new Float32Array(length).map((_, i) => i);
  const bData = new Float32Array(length).map((_, i) => length - i);

  // Create GPU buffers (mapped at creation so we can write the initial data)
  const aBuffer = device.createBuffer({
    size: aData.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    mappedAtCreation: true
  });
  new Float32Array(aBuffer.getMappedRange()).set(aData);
  aBuffer.unmap();

  const bBuffer = device.createBuffer({
    size: bData.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    mappedAtCreation: true
  });
  new Float32Array(bBuffer.getMappedRange()).set(bData);
  bBuffer.unmap();

  const resultBuffer = device.createBuffer({
    size: aData.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC
  });

  // Compile WGSL
  const shaderModule = device.createShaderModule({
    code: `...WGSL FROM ABOVE...`
  });

  // Pipeline layout
  const bindGroupLayout = device.createBindGroupLayout({
    entries: [
      { binding: 0, visibility: GPUShaderStage.COMPUTE, buffer: { type: "read-only-storage" } },
      { binding: 1, visibility: GPUShaderStage.COMPUTE, buffer: { type: "read-only-storage" } },
      { binding: 2, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } }
    ]
  });

  const pipeline = device.createComputePipeline({
    layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
    compute: { module: shaderModule, entryPoint: "main" }
  });

  const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [
      { binding: 0, resource: { buffer: aBuffer } },
      { binding: 1, resource: { buffer: bBuffer } },
      { binding: 2, resource: { buffer: resultBuffer } }
    ]
  });

  // Encode commands
  const commandEncoder = device.createCommandEncoder();
  const passEncoder = commandEncoder.beginComputePass();
  passEncoder.setPipeline(pipeline);
  passEncoder.setBindGroup(0, bindGroup);
  passEncoder.dispatchWorkgroups(Math.ceil(length / 64));
  passEncoder.end();

  // Copy the result into a mappable buffer for readback
  const readBuffer = device.createBuffer({
    size: aData.byteLength,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
  });
  commandEncoder.copyBufferToBuffer(resultBuffer, 0, readBuffer, 0, aData.byteLength);

  // Submit, then map the readback buffer
  device.queue.submit([commandEncoder.finish()]);
  await readBuffer.mapAsync(GPUMapMode.READ);
  // Copy the data out before unmapping; the mapped range is invalid afterwards
  const resultArray = new Float32Array(readBuffer.getMappedRange()).slice();
  readBuffer.unmap();
  console.log('First 10 results:', resultArray.slice(0, 10));
}

vectorAddDemo();
When you run this script, the console prints the first ten sums (each equal to 1024, since a[i] + b[i] = i + (1024 - i)), confirming that the GPU performed the addition in parallel. Even though the dataset is tiny, the pattern scales to millions of elements.
Real‑World Use Cases
Now that you’ve seen a basic compute pass, let’s explore two practical scenarios where WebGPU shines: image processing and particle‑based physics.
1. Real‑Time Image Filtering
WebGPU can replace traditional Canvas 2D filters with GPU‑accelerated kernels. Below is a Sobel edge detector that operates on an ImageBitmap. The shader reads pixel colors, computes gradient magnitudes, and writes the result back to a texture.
@group(0) @binding(0) var srcTex: texture_2d<f32>;
@group(0) @binding(1) var dstTex: texture_storage_2d<rgba8unorm, write>;

fn sobel(x: i32, y: i32) -> f32 {
  var kx = array<array<i32, 3>, 3>(
    array<i32, 3>(-1, 0, 1),
    array<i32, 3>(-2, 0, 2),
    array<i32, 3>(-1, 0, 1)
  );
  var ky = array<array<i32, 3>, 3>(
    array<i32, 3>(-1, -2, -1),
    array<i32, 3>( 0,  0,  0),
    array<i32, 3>( 1,  2,  1)
  );
  var gx: f32 = 0.0;
  var gy: f32 = 0.0;
  for (var i: i32 = -1; i <= 1; i = i + 1) {
    for (var j: i32 = -1; j <= 1; j = j + 1) {
      let sample = textureLoad(srcTex, vec2<i32>(x + i, y + j), 0).r;
      gx = gx + f32(kx[i + 1][j + 1]) * sample;
      gy = gy + f32(ky[i + 1][j + 1]) * sample;
    }
  }
  return sqrt(gx * gx + gy * gy);
}

@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let dims = textureDimensions(srcTex);
  if (gid.x >= dims.x || gid.y >= dims.y) { return; }
  let edge = sobel(i32(gid.x), i32(gid.y));
  textureStore(dstTex, vec2<i32>(i32(gid.x), i32(gid.y)), vec4<f32>(edge, edge, edge, 1.0));
}
In JavaScript you would upload the ImageBitmap into a GPUTexture with device.queue.copyExternalImageToTexture, bind both textures to the compute pipeline, and present the output texture through a canvas configured with a GPUCanvasContext. The result is an edge map that updates in real time on most integrated GPUs.
2. Particle Physics Simulation
GPU compute excels at simulating thousands of independent particles. Each particle stores position, velocity, and mass. A simple Euler integrator updates positions based on forces such as gravity and drag.
struct Particle {
  pos: vec2<f32>,
  vel: vec2<f32>,
  mass: f32,
  padding: f32, // keeps the per-particle stride an unambiguous 6 floats
}

@group(0) @binding(0) var<storage, read_write> particles: array<Particle>;

@compute @workgroup_size(128)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let i = gid.x;
  if (i >= arrayLength(&particles)) { return; }
  var p = particles[i];
  // Simple gravity toward (0, 0)
  let dir = -p.pos;
  let dist = length(dir) + 0.001;
  let force = (9.81 * p.mass) / (dist * dist);
  let accel = normalize(dir) * force / p.mass;
  // Apply drag
  let drag = -0.1 * p.vel;
  p.vel = p.vel + (accel + drag) * 0.016; // assume a fixed 60 Hz timestep
  p.pos = p.pos + p.vel * 0.016;
  particles[i] = p;
}
After each compute pass you render the particles with a vertex shader that reads the same buffer as a vertex attribute. This “compute‑then‑draw” loop produces fluid‑like motion without ever touching the CPU for per‑particle math.
Advanced Topics: Buffer Management & Pipeline Optimization
When you move beyond toy examples, efficient buffer usage becomes critical. Buffers are created with usage flags such as COPY_SRC/COPY_DST, MAP_READ/MAP_WRITE, and STORAGE, and the combinations are restricted: MAP_READ may only be paired with COPY_DST, and MAP_WRITE with COPY_SRC. Mixing usages carelessly causes validation errors or pipeline stalls.
- Staging Buffers: Use a small MAP_READ buffer as a "readback" target, and copy results into it from a larger STORAGE buffer. This avoids mapping the GPU-resident buffer directly.
- Double Buffering: Alternate between two storage buffers each frame. While one is being read by the GPU, the other can be prepared on the CPU, eliminating synchronization bottlenecks.
- Bind Group Caching: Bind groups are immutable; create them once per resource layout and reuse them across frames to reduce CPU overhead.
Pipeline creation also benefits from caching. device.createComputePipelineAsync() compiles in the background without blocking, but you should still request pipelines early (e.g., during an asset-loading phase) to avoid frame-time hitches.
Memory Alignment Tips
Pro tip: WGSL types carry alignment requirements: a vec3&lt;f32&gt; occupies 12 bytes but is aligned to 16, so a following f32 field lands in its padding. If your JavaScript-side layout ignores these rules, the shader still compiles, but it reads the wrong bytes and produces garbage that is hard to trace.
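When packing data from JavaScript, it helps to compute field offsets with the same size/align rules WGSL uses. A small sketch (structLayout is a hypothetical helper; the size and alignment table below follows the WGSL rules for f32 and its vector types):

```javascript
// Byte size and alignment of common WGSL host-shareable types.
const LAYOUT = {
  'f32':       { size: 4,  align: 4 },
  'vec2<f32>': { size: 8,  align: 8 },
  'vec3<f32>': { size: 12, align: 16 }, // note: aligns to 16, not 12
  'vec4<f32>': { size: 16, align: 16 },
};

const alignTo = (offset, align) => Math.ceil(offset / align) * align;

// Compute each field's byte offset and the struct's total size,
// rounded up to the struct's own alignment.
function structLayout(fields) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = fields.map((type) => {
    const { size, align } = LAYOUT[type];
    maxAlign = Math.max(maxAlign, align);
    offset = alignTo(offset, align);
    const at = offset;
    offset += size;
    return at;
  });
  return { offsets, size: alignTo(offset, maxAlign) };
}

// vec3<f32> followed by f32: the scalar fills the vec3's padding,
// so the struct is 16 bytes rather than a naive 12 + 4 at offset 12 + gap.
console.log(structLayout(['vec3<f32>', 'f32'])); // { offsets: [0, 12], size: 16 }
```

Reversing the field order shows why ordering matters: ['f32', 'vec3&lt;f32&gt;'] pushes the vector to offset 16 and grows the struct to 32 bytes.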
Debugging Shaders
WebGPU currently lacks a built-in shader debugger, but you can emulate printf-style debugging by writing intermediate values from the shader into a dedicated storage buffer. After the compute pass, copy that buffer to a MAP_READ staging buffer, map it, and inspect the data in the console.
Pro Tips for Production‑Ready WebGPU Apps
- Feature Detection: Always fall back to WebGL or CPU paths if navigator.gpu is missing. Users on older hardware still expect a functional site.
- Chunked Workloads: Break massive datasets into smaller dispatches to keep the GPU's command buffer size reasonable and to allow interleaved rendering.
- Precision Choices: Use f16 (when the shader-f16 feature is available) for large arrays where half precision is acceptable. This halves memory traffic and can double throughput.
- Resource Lifetime: Destroy buffers you no longer need with buffer.destroy() to free GPU memory promptly.
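Chunking a large workload is mostly integer bookkeeping: split N elements into dispatch-sized ranges and submit one range per pass or per frame. A sketch (chunkRanges is a hypothetical helper):

```javascript
// Split `n` elements into [start, count] ranges of at most `chunk`
// elements each, so every range can be dispatched separately and
// interleaved with rendering work.
function chunkRanges(n, chunk) {
  const ranges = [];
  for (let start = 0; start < n; start += chunk) {
    ranges.push([start, Math.min(chunk, n - start)]);
  }
  return ranges;
}

console.log(chunkRanges(10, 4)); // [[0, 4], [4, 4], [8, 2]]
```

Each [start, count] pair would then feed a uniform (or push-style) offset so the shader knows which slice of the buffer it owns for that dispatch.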
Pro tip: When targeting mobile browsers, keep workgroup sizes ≤ 64 and avoid texture formats that require hardware conversion (e.g., rgba16float) to stay within power budgets.
Conclusion
WebGPU turns the browser into a first‑class compute platform, unlocking performance that previously required native code or WebGL hacks. By mastering device initialization, WGSL shader authoring, and efficient buffer strategies, you can build everything from real‑time image filters to large‑scale particle simulations—all without leaving the web stack. Keep experimenting, profile regularly, and watch the ecosystem evolve—WebGPU is still young, but its potential is already reshaping what’s possible on the client side.