I recently did some work to enable client storage in Firefox’s OpenGL layers backend on Macs. That probably sounds like gobbledygook, so let me elaborate. When Firefox wants to turn the crisp, clean HTML and CSS of your web page into pixels on a screen, it has to do a lot of work. A good chunk of that work, it turns out, is simply writing a value into every pixel. You might think writing a value into a pixel should be fast, and it is, but when you realize that 1920 times 1080 (the width and height in pixels of many computer displays) is 2,073,600, then it becomes apparent that filling each and every pixel (60 times per second, in some cases) is a fairly daunting task.
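Here is that arithmetic as a few small C functions. The 1920 by 1080 resolution and the 60 frames per second come from the text above. The 4 bytes per pixel (8-bit RGBA) is an assumption added only to turn pixel counts into memory traffic.

```c
#include <stdint.h>

/* Pixels in one full frame. */
uint64_t pixels_per_frame(uint64_t width, uint64_t height) {
    return width * height;
}

/* Pixel writes per second if every pixel is refilled on every frame. */
uint64_t pixel_writes_per_second(uint64_t width, uint64_t height, uint64_t fps) {
    return pixels_per_frame(width, height) * fps;
}

/* Bytes written per second; bytes_per_pixel is an assumption (4 for 8-bit RGBA). */
uint64_t bytes_per_second(uint64_t width, uint64_t height, uint64_t fps,
                          uint64_t bytes_per_pixel) {
    return pixel_writes_per_second(width, height, fps) * bytes_per_pixel;
}
```

For a 1920 by 1080 display at 60 frames per second, `pixel_writes_per_second(1920, 1080, 60)` is 124,416,000, and `bytes_per_second(1920, 1080, 60, 4)` is 497,664,000, or roughly half a gigabyte of writes every second.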
Fortunately, we have something called a GPU. You see, while a CPU would have to write to each one of those pixels in sequence (this isn’t really true, for a number of reasons, but it’s close enough for our purposes), a GPU can compute and write to many values at once. Filling large numbers of pixels with values each computed in similar ways is what a GPU was made for, which is why we try very hard to let the GPU do as much of our pixel work as possible.
Enter “layers”. To avoid having to write to so many pixels all the time, Firefox breaks your web page into “layers” (think of layers in Photoshop, if you’ve ever used it or similar image-editing software). With layers, when one part of the web page moves or changes, ideally we can just move or change the layer that holds that part. Then we hand all of these layers off to the GPU, which puts them together to make the final picture (this is called “compositing”).
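To make the idea concrete, here is a toy compositor in C. It is a CPU-only illustration, not how Firefox or the GPU actually composites: the `struct layer` type and the `composite` function are hypothetical, layers are opaque 8-bit grayscale images, and later layers simply cover earlier ones with no blending. The point it shows is that moving a layer only changes its `x`/`y`. The layer's own pixels are painted once and reused, and only the final assembly (the job handed to the GPU in practice) is redone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layer: an opaque grayscale image placed at (x, y). */
struct layer {
    int x, y;              /* position on screen; moving the layer changes only these */
    int width, height;
    const uint8_t *pixels; /* width * height values, painted once and reused */
};

/* Composite layers back to front into a screen buffer. Later layers cover
   earlier ones; pixels that fall outside the screen are clipped. */
void composite(uint8_t *screen, int screen_w, int screen_h,
               const struct layer *layers, size_t count) {
    memset(screen, 0, (size_t)screen_w * (size_t)screen_h); /* background */
    for (size_t i = 0; i < count; i++) {
        const struct layer *l = &layers[i];
        for (int row = 0; row < l->height; row++) {
            int sy = l->y + row;
            if (sy < 0 || sy >= screen_h) continue;
            for (int col = 0; col < l->width; col++) {
                int sx = l->x + col;
                if (sx < 0 || sx >= screen_w) continue;
                screen[(size_t)sy * (size_t)screen_w + (size_t)sx] =
                    l->pixels[(size_t)row * (size_t)l->width + (size_t)col];
            }
        }
    }
}
```

Moving the second layer is just `layers[1].x = 2;` followed by another call to `composite`. Nothing inside either layer has to be repainted.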
This has worked pretty well for quite a while. There is a project called WebRender (coming soon) that will radically re-envision how we update the screen in an attempt to get the GPU to do even more work for us; it’s kind of a big deal. But this post isn’t about WebRender. GPUs don’t always behave, and many different people have many different GPUs. We have to work out all the kinks on all kinds of hardware, which will likely take a while, so in the meantime we want to make sure that the performance of the traditional layers system isn’t languishing.
Here’s where client storage comes in. When we first tell the GPU about a layer, or when we’ve updated a layer, we call OpenGL’s glTexImage2D (or glTexSubImage2D), which is essentially our way of saying “here’s the chunk of memory that we want you to put on the screen (we’ll tell you where to put it later).” If everything were simple, you’d think we’d truck that chunk of memory over to the GPU and be all set. Unfortunately, things are rarely simple. Talking to the GPU can take a while (well, a while as far as a computer is concerned), so instead of making us wait inside glTexImage2D until the GPU has our pixels, OpenGL puts our request into a queue and returns, happily pretending that everything is taken care of.
Here’s the rub: OpenGL’s hands are tied when it comes to that chunk of memory. After you’ve told it with glTexImage2D that these are the pixels you want to upload, you can do whatever you want with the memory (free it, zero it, fill it with pictures of cats, etc.), and the GPU is still required to display the pixels as they were when you called glTexImage2D. But the GPU doesn’t even have those pixels yet! They’re only queued to be delivered. So to fulfill its promise, OpenGL has no choice but to busily copy all of the pixels you gave it into another chunk of memory, just so they can wait to be delivered to the GPU. That means the CPU has to go through every pixel, read it, and write it somewhere else in memory. And all of this after we’ve gone through so much trouble to avoid touching pixels on the CPU more than we need to!
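The contract described above can be modeled in a few lines of C. This is a hypothetical sketch of a driver that defers uploads, not OpenGL’s real implementation: `struct pending_upload`, `queue_upload_copy`, and `flush_upload` are invented names, and a plain byte array stands in for GPU texture memory. Because the caller may reuse its buffer the moment the call returns, the queueing step has to make its own copy, and that `memcpy` is the extra CPU pass over every pixel.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical deferred upload, modeling the glTexImage2D contract. */
struct pending_upload {
    uint8_t *staging; /* driver-owned copy of the caller's pixels */
    size_t size;
    uint8_t *dest;    /* stands in for GPU texture memory */
};

/* Queue an upload. The caller may free or overwrite 'pixels' as soon as this
   returns, so the pixels must be copied now. Returns 0 on success, -1 if the
   staging allocation fails. */
int queue_upload_copy(struct pending_upload *up, const uint8_t *pixels,
                      size_t size, uint8_t *dest) {
    up->staging = malloc(size);
    if (up->staging == NULL) return -1;
    memcpy(up->staging, pixels, size); /* the extra CPU pass over every byte */
    up->size = size;
    up->dest = dest;
    return 0;
}

/* Later, when the GPU is ready: deliver the staged copy and release it. */
void flush_upload(struct pending_upload *up) {
    memcpy(up->dest, up->staging, up->size);
    free(up->staging);
    up->staging = NULL;
}
```

Even if the caller fills its buffer with cats right after `queue_upload_copy` returns, `flush_upload` still delivers the original pixels, because it reads from the staging copy.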
Client storage lets us avoid that CPU work by relaxing OpenGL’s restrictions. Through a little dance and a magic incantation, you tell OpenGL, “don’t bother copying this memory just to queue it up. I won’t touch it until the GPU is done using it. Promise.” Of course, making good on that promise is the tricky part. The memory holding these pixels is often shared between processes, so we need a mechanism for the process running your web page to check with the process that talks to the GPU, and that process in turn has to check with the GPU to make sure it’s done with the pixels. There was a lot of debugging, and a lot of head scratching over small performance regressions, but in the end, Firefox on Mac uses less memory and less CPU, and runs more smoothly. 🎉
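On macOS, OpenGL exposes this relaxation through the APPLE_client_storage extension, enabled with `glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE)`. The sketch below doesn’t call OpenGL and doesn’t model Firefox’s cross-process check. It is a hypothetical single-process model of the promise itself: the driver keeps a pointer instead of a copy, and an `in_flight` flag stands in for a GPU fence. All type and function names here are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical client-storage upload: the driver keeps a pointer to the
   caller's memory instead of copying it. 'in_flight' stands in for a GPU
   fence that signals when the pixels have been consumed. */
struct client_upload {
    const uint8_t *pixels; /* caller-owned; must stay untouched while in flight */
    size_t size;
    uint8_t *dest;         /* stands in for GPU texture memory */
    bool in_flight;
};

/* Queue without copying: no per-pixel CPU work happens here. */
void queue_upload_client_storage(struct client_upload *up, const uint8_t *pixels,
                                 size_t size, uint8_t *dest) {
    up->pixels = pixels;
    up->size = size;
    up->dest = dest;
    up->in_flight = true;
}

/* Models the GPU reading straight from the caller's memory, then signaling. */
void gpu_consume(struct client_upload *up) {
    memcpy(up->dest, up->pixels, up->size); /* the GPU's own read, not a CPU staging copy */
    up->in_flight = false;
}

/* The caller's side of the promise: the buffer is reusable only once the GPU is done. */
bool buffer_is_reusable(const struct client_upload *up) {
    return !up->in_flight;
}

/* Overwrite the buffer, asserting (in debug builds) that the promise was kept. */
void reuse_buffer(const struct client_upload *up, uint8_t *pixels, uint8_t value) {
    assert(buffer_is_reusable(up) && pixels == up->pixels);
    memset(pixels, value, up->size);
}
```

The cost moves from a copy of every pixel to a wait. Whoever owns the buffer must check `buffer_is_reusable` (in real code, a fence or similar completion signal) before writing to the memory again.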