Racing Karts On A Rust GPU Kernel Driver

A few months ago, we introduced Tyr, a Rust driver for Arm Mali GPUs that continues to see active development both upstream and downstream. As the upstream code awaits the readiness of the broader ecosystem, we focused on a downstream prototype that will serve as a baseline for community benchmarking and help guide our upstreaming efforts.

Today, we’re excited to share that the Tyr prototype has progressed from basic GPU job execution to running GNOME, Weston, and full-screen 3D games like SuperTuxKart, demonstrating a functional, high-performance Rust driver that matches the performance of the C driver and paves the way for eventual upstream integration!

GNOME on Tyr

Setting the stage

I previously discussed the relationship between user-mode drivers (UMDs) and kernel-mode drivers (KMDs) in one of my posts on how GPUs work. Here’s a quick recap to help get you up to speed:

One thing to understand from the previous section is that most of the complexity resides at the UMD level. This component is in charge of translating high-level API commands into lower-level commands that the GPU can understand. Yet the KMD is responsible for providing key operations such that its user-mode driver is actually implementable, and it must do so in a way that allows the underlying GPU hardware to be appropriately shared among the multiple tasks in the system.

While the UMD will take care of translating to GPU-specific commands from an API like Vulkan or OpenGL, the KMD must bring the GPU hardware to a state where it can accept requests before the device can be shared fairly among the UMDs in the system. This includes power management, parsing and loading firmware, as well as providing a way for the UMD to allocate GPU memory while ensuring isolation between different GPU contexts for security.
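To make the isolation requirement concrete, here is a minimal userspace sketch, not Tyr code, of one GPU virtual address space per context. The names GpuVm and BoHandle are hypothetical, and a BTreeMap stands in for the page tables that a real driver programs into the GPU’s MMU. The point it illustrates is that an address mapped in one context resolves to nothing in another.

```rust
use std::collections::BTreeMap;

/// Hypothetical handle naming a GPU buffer object (BO).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BoHandle(pub u32);

/// Hypothetical per-context GPU virtual address (VA) space. Each context
/// owns its own table, so a VA mapped in one context is meaningless in another.
#[derive(Default)]
pub struct GpuVm {
    /// Start VA -> (size in bytes, backing BO). Ranges never overlap.
    mappings: BTreeMap<u64, (u64, BoHandle)>,
}

impl GpuVm {
    /// Maps `size` bytes of `bo` at `va`, rejecting empty or overlapping ranges.
    pub fn map(&mut self, va: u64, size: u64, bo: BoHandle) -> Result<(), &'static str> {
        if size == 0 {
            return Err("empty mapping");
        }
        let end = va.checked_add(size).ok_or("address overflow")?;
        // The only candidate for overlap is the mapping with the highest start
        // below `end`; because existing ranges are disjoint, checking it suffices.
        if let Some((&start, &(len, _))) = self.mappings.range(..end).next_back() {
            if start + len > va {
                return Err("overlapping mapping");
            }
        }
        self.mappings.insert(va, (size, bo));
        Ok(())
    }

    /// Translates a VA into (BO, offset within BO), or None if unmapped.
    pub fn resolve(&self, va: u64) -> Option<(BoHandle, u64)> {
        let (&start, &(len, bo)) = self.mappings.range(..=va).next_back()?;
        if va < start + len {
            Some((bo, va - start))
        } else {
            None
        }
    }
}
```

For example, mapping BoHandle(1) at 0x1000 in context A makes 0x1800 resolve to offset 0x800 of that buffer in A, while the same address in a fresh context B resolves to None.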

This was our initial focus for several months while working on Tyr, and testing was done primarily through the IGT framework. These tests mainly consist of issuing simple ioctl() calls against the driver and then checking whether the results make sense.

By the way, those interested in further understanding the relationship between the UMD and the KMD on Linux should check out the talk my colleague Boris Brezillon gave on this topic at Kernel Recipes!

Submitting a single task

Once the GPU is ready to accept requests and user space can allocate GPU memory as needed, the UMD can place all the resources needed for a given workload into GPU buffers. These buffers can then be referenced by command buffers containing the instructions to execute, as we explain in the excerpt below:

Along with the data describing the model and the machine code describing the shaders, the UMD must ask the KMD to place them in GPU memory before execution. It also needs to tell the GPU that it wants to make a draw call and set up any state necessary to do so, which it does by creating VkCommandBuffers: structures containing instructions for the GPU to carry out to complete the workload. It also needs to set up a way to be notified when the workload is complete, and then allocate memory to hold the results.

In this sense, the KMD is the last link between the UMD and the GPU hardware, providing the APIs needed for job submission and synchronization. This ensures that all drawing operations created in userspace can actually reach the GPU for execution. It is the KMD’s responsibility to ensure that jobs are scheduled only once their dependencies have completed. It must also notify the UMD when the work is complete (in other words, signal it); otherwise, the UMD will not know when the results are valid.
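The scheduling rule above can be modeled with a toy, single-threaded queue. This is a sketch under stated assumptions: JobId, Scheduler, submit and run_ready are hypothetical names, jobs "execute" instantly, and completion is represented by a set of signaled IDs. It is not the DRM scheduler API, which tracks dependencies through dma-fences; it only shows the invariant that a job runs after everything it depends on has signaled.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical job identifier.
pub type JobId = u32;

/// Toy, single-threaded model of dependency-aware job scheduling.
pub struct Scheduler {
    /// For each submitted job, the jobs whose completion it waits on.
    deps: HashMap<JobId, Vec<JobId>>,
    /// Jobs whose completion has been signaled.
    signaled: HashSet<JobId>,
    /// Submitted jobs that have not run yet, in submission order.
    pending: VecDeque<JobId>,
}

impl Scheduler {
    pub fn new() -> Self {
        Scheduler {
            deps: HashMap::new(),
            signaled: HashSet::new(),
            pending: VecDeque::new(),
        }
    }

    /// Queues `job`, which must not run before every job in `deps` has signaled.
    pub fn submit(&mut self, job: JobId, deps: Vec<JobId>) {
        self.deps.insert(job, deps);
        self.pending.push_back(job);
    }

    /// Repeatedly runs the first pending job whose dependencies have all
    /// signaled, until no pending job is ready. Returns the execution order.
    pub fn run_ready(&mut self) -> Vec<JobId> {
        let mut order = Vec::new();
        loop {
            let ready = self
                .pending
                .iter()
                .position(|job| self.deps[job].iter().all(|d| self.signaled.contains(d)));
            let index = match ready {
                Some(i) => i,
                None => break,
            };
            let job = self.pending.remove(index).expect("index came from position()");
            // "Execute" the job, then signal its completion so that dependents
            // (and the waiting UMD) can observe it.
            order.push(job);
            self.signaled.insert(job);
        }
        order
    }

    /// Whether `job` has completed, i.e. its fence has been signaled.
    pub fn is_signaled(&self, job: JobId) -> bool {
        self.signaled.contains(&job)
    }
}
```

In this model, a job whose dependency never signals simply stays pending instead of running early, which is exactly the guarantee the UMD relies on to avoid reading incomplete results.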

Additionally, before Tyr can execute a complex workload involving a large number of concurrent jobs, it must be able to execute a single simple job correctly; otherwise, debugging becomes a nightmare. For this, we created the simplest job we could think of: one that places an integer in a given memory location using the GPU’s MOV instruction. Our IGT test then blocks until the KMD signals that the job is complete.

Reading that memory location back and checking that its contents match the constant we expected indicates that the test executed successfully. In other words, it shows that we were able to place the instructions in one of the GPU’s ring buffers and that the hardware iterator picked them up and executed them correctly, paving the way for more complex tests that actually try to accomplish something.
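The flow of that test can be mimicked in plain Rust, as a sketch rather than the actual IGT code (which is written in C and talks to the driver through ioctls). Here a spawned thread plays the role of the GPU executing the single MOV, a shared AtomicU32 stands in for the target GPU memory location, and a Mutex/Condvar pair models the fence the KMD signals. Fence, submit_mov_job and run_dummy_job_test are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal one-shot fence: starts unsignaled and can be waited on.
pub struct Fence {
    signaled: Mutex<bool>,
    cv: Condvar,
}

impl Fence {
    pub fn new() -> Self {
        Fence { signaled: Mutex::new(false), cv: Condvar::new() }
    }

    /// Marks the fence as signaled and wakes every waiter.
    pub fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Blocks until the fence is signaled (the loop guards against spurious wakeups).
    pub fn wait(&self) {
        let mut done = self.signaled.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Simulated "GPU": a thread performs the equivalent of one MOV of `value`
/// into `target`, then signals `fence`, as the KMD would on job completion.
pub fn submit_mov_job(target: Arc<AtomicU32>, value: u32, fence: Arc<Fence>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        target.store(value, Ordering::Release);
        fence.signal();
    })
}

/// Mirrors the test: submit, block on the fence, then read the memory back.
pub fn run_dummy_job_test(expected: u32) -> bool {
    let target = Arc::new(AtomicU32::new(0));
    let fence = Arc::new(Fence::new());
    let gpu = submit_mov_job(Arc::clone(&target), expected, Arc::clone(&fence));
    fence.wait();
    let ok = target.load(Ordering::Acquire) == expected;
    gpu.join().expect("simulated GPU thread panicked");
    ok
}
```

Calling run_dummy_job_test with a hypothetical constant such as 0xCAFE returns true once the simulated GPU has written it and signaled. The expected value should differ from the initial contents (0 here); otherwise the check would pass even if the job never ran.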

The test source code for this dummy task is here.

Making a rotating cube

With job submission and signaling working, it was time to try rendering a scene. We chose kmscube, which draws a rotating cube on the screen, as the next milestone.

It was a good candidate due to its simple geometry and the fact that it is completely self-contained: no compositor is required, and rendering occurs in a buffer that is handed directly to the display (KMS) driver.

Getting kmscube to run would also prove that we were actually enforcing the job dependencies set by the UMD; otherwise, we would get visual glitches. To do this, we relied on a slightly updated version of the Rust abstraction for the DRM scheduler posted by Asahi Lina a few years ago. The result was a rotating cube presented at the display’s refresh rate.

kmscube on Tyr

With offscreen rendering, we can go even faster, jumping from 30 or 60 fps to over 500 fps, matching the performance of the C driver. That’s a lot of frames being produced!
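Some quick arithmetic puts those numbers in perspective. At 60 fps, each frame has a budget of about 16.7 ms; at 500 fps, a new frame must complete on average every 2 ms, including every job submission and fence signal involved. This is throughput rather than per-frame latency, since several frames can be in flight at once. The helper names below are hypothetical.

```rust
/// Average time budget per frame, in milliseconds, at a given frame rate.
pub fn frame_budget_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Average frame rate over a run, as an offscreen benchmark might report it:
/// frames completed divided by elapsed wall-clock seconds.
pub fn average_fps(frames: u64, elapsed_secs: f64) -> f64 {
    frames as f64 / elapsed_secs
}
```

For instance, 5000 frames rendered offscreen in 10 seconds averages 500 fps, or a 2 ms budget per frame.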

Can it render the entire UI?

The natural next step was to launch Weston or GNOME. Since there is a lot going on when a desktop environment (DE) like GNOME is running, we were half expecting it not to work at first, so it was a big surprise when GNOME’s login screen appeared.

In fact, you can log into GNOME, open Firefox, and…watch a YouTube video:

YouTube on GNOME on Tyr

Running vkcube under Weston also just works!

vkcube on Weston on Tyr

Can it render a game?

The final 3D milestone is running a game or another 3D-intensive application. This not only subjects the GPU to tougher workloads but also lets us measure the KMD’s performance more accurately. Again, the game rendered correctly and is perfectly playable, with no noticeable hiccups or other performance issues, as long as it runs full screen. Unfortunately, there are still some glitches in windowed mode: it is a prototype, after all.

SuperTuxKart on Tyr

Why is this important?

It is important to clarify what this means and how it plays into the long-term vision of the project.

In fact, it’s easier to start with what we’re not claiming in this post: Tyr isn’t ready for use as a daily driver, and it will still take time to replicate this work upstream, although it’s now clear that we will get there. As a mere prototype, it also takes many shortcuts that the upstream version won’t, even though it runs on top of an unmodified (i.e., upstream) version of Mesa.

That said, this prototype can serve as an experimental driver and a testing ground for all the Rust abstraction work happening upstream. It will let us experiment with different design decisions and gather data on what actually serves the project’s goals. It is also a testament that Rust can be used to write GPU KMDs, and not only that, but that such drivers can perform on par with their C counterparts.

Needless to say, we can’t make any guarantees about the stability of the experimental driver: it may very well lock up and lose your work after some time, so be careful.

Finally, this was tested on a Rock 5B board, equipped with a Rockchip RK3588 system-on-chip, and it probably won’t work on any other device at this time. Those who have this hardware should feel free to test our branch and provide feedback. The source code can be found here. Make sure to enable CONFIG_TYR_DRM_DEPS and CONFIG_DRM_TYR. Feel free to contribute to Tyr by checking out our issue board!

Below is a video showing the Tyr prototype in action. Enjoy!

Racing karts on a Rust GPU kernel driver
