Towards interplanetary QUIC traffic | Adolfo Ochagavía

Have you ever asked yourself what protocols are used when downloading photos from the Perseverance Mars rover to Earth? I hadn’t given it any thought until I came across an intriguing message on the Internet in April 2024:

I’m looking for someone with QUIC/Quinn knowledge to help with our deep space IP project. It would be part-time consulting. If interested, please DM me.

The message itself is quite short and somewhat jargon-laden, so it took me a while to fully understand what the project was about:

  • Working with QUIC: an Internet protocol for reliable communications (i.e., what we typically use TCP for).
  • Working with Quinn: the most popular Rust implementation of the QUIC protocol.
  • Using QUIC to communicate between Earth and computers located very far away (for example, on other planets).

Business was going well on my end, and I didn’t have much time to devote to other consulting work, but… how could I say no to an interplanetary Internet project? I had contributed to Quinn in the past, so I felt well prepared to help, and I decided to jump in. This article is a record of the adventure so far.

What are we trying to solve?

Deep space is big and full of challenges. The technical achievement of running a network in such an environment is nothing short of a miracle. To some extent, the problem has been solved: we (humanity) regularly exchange messages with rovers on Mars, and even communicate with spacecraft outside the solar system. However, as more and more players enter the space exploration landscape, the limitations of the current architecture are becoming apparent.

Efforts to enhance deep space networking are ongoing, and one promising way to get there involves adopting the IP protocol suite. In that context, QUIC would become the protocol of choice for reliable communications. That’s where this project comes in: our goal is to show that QUIC can work reliably in deep space, and to provide guidance to anyone interested in deploying it.

QUIC in deep space

Why all the fuss about “demonstrating that QUIC can reliably work in deep space”? Can’t you use it right away?

It turns out that communication in deep space is… complicated. First, there is extreme latency: a message from Earth, for example, takes between 3 and 23 minutes to reach Mars. On top of that, connectivity is intermittent. For example, there are periods during which radio signals cannot be exchanged between Earth and a Mars rover, and connectivity is only restored after some time.
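Those latency figures follow directly from the speed of light. The sketch below (in Rust, Quinn’s language) computes one-way and round-trip light time for a given Earth-Mars distance. The two distances are approximate values assumed for illustration, and the calculation ignores processing and queuing delays along the way.

```rust
use std::time::Duration;

/// Speed of light in vacuum, in km/s (exact by definition of the metre).
const SPEED_OF_LIGHT_KM_PER_S: f64 = 299_792.458;

/// One-way light time over `distance_km` kilometres.
fn one_way_delay(distance_km: f64) -> Duration {
    Duration::from_secs_f64(distance_km / SPEED_OF_LIGHT_KM_PER_S)
}

/// Round-trip light time, ignoring processing and queuing delays.
fn round_trip(distance_km: f64) -> Duration {
    one_way_delay(distance_km) * 2
}

fn main() {
    // Approximate Earth-Mars distances in km (assumed values for illustration).
    let distances = [("closest", 54.6e6), ("farthest", 401.0e6)];
    for (label, km) in distances {
        println!(
            "{label}: one-way {:.1} min, round trip {:.1} min",
            one_way_delay(km).as_secs_f64() / 60.0,
            round_trip(km).as_secs_f64() / 60.0,
        );
    }
}
```

With these distances, the one-way delay comes out at roughly 3 and 22 minutes, in line with the range quoted above.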

These conditions prevent QUIC from operating under its default configuration. For starters, any attempt to establish a connection will time out before it even has a chance to succeed. But the problem runs deeper: even if you could magically establish a connection, other issues would arise and tear it down in no time.

How can QUIC be viable, then? The attentive reader may have already guessed the answer: the problem is not QUIC itself, but its default configuration, which was designed with the terrestrial Internet in mind. What we need is a custom configuration, this time targeting deep space, together with guidelines for tweaking it further if a space mission deems it necessary.

Yes, QUIC is highly configurable. This is an incredibly powerful feature: it allows a standards-compliant implementation to run unchanged in a deep-space setting, as long as it exposes the necessary QUIC configuration knobs. Neat!

What about venerable old TCP? People actually evaluated it in the early 2000s, and concluded that the protocol was unsuitable for deep space.

Conducting QUIC experiments

Well, we want to find configurations that will let QUIC run efficiently in deep space. How do we actually do this?

First, some essential context. By “QUIC configuration” I mean a specific set of parameters that control the inner workings of the protocol: What is the estimated round-trip time before any packets have been exchanged? How long does a peer wait before concluding that the connection has been lost due to inactivity? Which congestion control mechanism is used? You get the idea.
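To make these knobs concrete, here is a minimal sketch that bundles them into a hypothetical QuicParams type. This is not Quinn’s API (in Quinn, comparable knobs live on TransportConfig, such as initial_rtt and max_idle_timeout). The deep-space derivation and its 1.5x safety margin are assumptions chosen for illustration, not the project’s actual values.

```rust
use std::time::Duration;

/// Congestion controllers an experiment might compare (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CongestionControl {
    NewReno,
    Cubic,
}

/// Hypothetical bundle of the knobs discussed above; not Quinn's API.
#[derive(Debug, Clone, PartialEq)]
struct QuicParams {
    /// RTT estimate used before any packets have been exchanged.
    initial_rtt: Duration,
    /// Silence after which a peer considers the connection dead.
    max_idle_timeout: Duration,
    /// Which congestion controller to run.
    congestion_control: CongestionControl,
}

impl QuicParams {
    /// Terrestrial-style values: RFC 9002 suggests a 333 ms initial RTT;
    /// the 30 s idle timeout is an assumed, typical default.
    fn terrestrial() -> Self {
        Self {
            initial_rtt: Duration::from_millis(333),
            max_idle_timeout: Duration::from_secs(30),
            congestion_control: CongestionControl::Cubic,
        }
    }

    /// Deep-space values derived from the expected one-way delay and the longest
    /// outage to ride out. The 1.5x margin is an assumption for illustration.
    fn deep_space(one_way: Duration, longest_outage: Duration) -> Self {
        let rtt = one_way * 2;
        Self {
            initial_rtt: rtt,
            max_idle_timeout: (rtt + longest_outage) * 3 / 2,
            // Placeholder: picking the controller is what the experiments are for.
            congestion_control: CongestionControl::NewReno,
        }
    }

    /// Rough check: one round trip plus an outage must not trip the idle timer.
    fn tolerates(&self, rtt: Duration, outage: Duration) -> bool {
        rtt + outage < self.max_idle_timeout
    }
}

fn main() {
    let one_way = Duration::from_secs(20 * 60); // assumed Earth-Mars one-way delay
    let outage = Duration::from_secs(60 * 60); // assumed worst-case outage
    let terrestrial = QuicParams::terrestrial();
    let deep = QuicParams::deep_space(one_way, outage);
    println!(
        "terrestrial survives one Mars RTT: {}",
        terrestrial.tolerates(one_way * 2, Duration::ZERO)
    );
    println!("deep-space params: {:?}", deep);
}
```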

You could pick up pen, paper and a calculator to come up with a set of values that might work. However, as we all know, no plan survives contact with the enemy. We need to see the parameters in action and determine empirically that they actually work. Hence the idea of running experiments.

Running an experiment means configuring QUIC to use the desired parameters, then exchanging data over a network that simulates deep space conditions. With this setup you can gather relevant metrics, evaluate the choice of parameters, try others as you see fit, and slowly develop a solid understanding of what works and what doesn’t.
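At its core, this loop is a parameter sweep. The toy sketch below illustrates its shape: a hypothetical run function stands in for a full download and simply assumes that a connection survives only if its idle timeout outlasts one round trip plus the longest outage. A real run would exchange actual packets and collect far richer metrics; the scenario values are made up for illustration.

```rust
use std::time::Duration;

/// Simplified deep-space conditions for one run (hypothetical).
struct Scenario {
    rtt: Duration,
    outage: Duration,
    round_trips_needed: u32,
}

/// The metric collected per run.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(Duration),
    TimedOut,
}

/// Toy stand-in for a real download: the connection survives only if the idle
/// timeout outlasts one round trip plus the outage.
fn run(idle_timeout: Duration, s: &Scenario) -> Outcome {
    if idle_timeout <= s.rtt + s.outage {
        Outcome::TimedOut
    } else {
        Outcome::Completed(s.rtt * s.round_trips_needed + s.outage)
    }
}

/// Run every candidate value and keep the results side by side for comparison.
fn sweep(candidates: &[Duration], s: &Scenario) -> Vec<(Duration, Outcome)> {
    candidates.iter().map(|&c| (c, run(c, s))).collect()
}

fn main() {
    let scenario = Scenario {
        rtt: Duration::from_secs(40 * 60),
        outage: Duration::from_secs(30 * 60),
        round_trips_needed: 3,
    };
    let candidates = [30, 3_600, 7_200].map(Duration::from_secs);
    for (timeout, outcome) in sweep(&candidates, &scenario) {
        println!("idle timeout {:?}: {:?}", timeout, outcome);
    }
}
```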

Experimental setup, take one

Our experimental setup involves two programs: a server application that exposes files over a QUIC connection, and a client application that downloads those files. They are connected to each other through a test network.

When I joined this project, the test network consisted of a set of virtual machines, carefully wired to replicate a relevant subset of the deep-space network (for example, the nodes involved when communicating between a NASA researcher’s laptop and the Mars rover). The network not only mirrored real nodes, but also had artificial delays and intermittent connectivity built in to match deep-space conditions! It’s a clever setup, and it is still in use today.

However, there is one small problem that you may already be wondering about. Once you introduce real deep-space latency into your network, an experiment can take a very long time to run. Want to test downloading a file from the Mars rover? You’d better make yourself a coffee in the meantime, because a round trip to Mars can take up to 46 minutes. And did I mention that intermittent connectivity can make things take even longer? Yes, iteration speed is a nightmare.

Unlocking instant experiments

When I saw our limited iteration speed, I took it as a personal challenge. “Not on my watch!”, was my internal battle cry. After all, I believe that immediate feedback is a prerequisite for productive research, not just a nice-to-have.

My hypothesis was that we could make runs finish almost instantly by controlling two things:

  1. Time: our application’s clock should advance faster than usual. Ideally, the clock would jump forward whenever the process is blocked waiting for a timer to expire. If done right, the time a run takes from start to finish depends only on the speed of your computer.
  2. Packet IO: even with a time-jumping clock, the application would still have to wait when reading packets from the network. Progress would then be blocked not by timers (which trigger time jumps), but by IO (which requires actual waiting). The solution? Get rid of packet IO! Instead, run the client and server in the same process, and have them communicate over a simulated network that also runs in that process. Such an in-process network, programmed and controlled by us, can base its link delays on the application’s clock, so they get skipped just like any other delay in the program.
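The time-jumping clock from point 1 is essentially a discrete-event scheduler. Here is a simplified sketch of the idea (not Tokio’s implementation): pending timers sit in a min-heap, and whenever everything is blocked, the clock jumps straight to the earliest deadline instead of sleeping. The VirtualClock type and its methods are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

/// A virtual clock: time only moves when every task waits on a timer,
/// and then it jumps straight to the earliest deadline (no real sleeping).
struct VirtualClock {
    now: Duration,
    next_seq: u64,
    // Min-heap keyed by (deadline, insertion order), so ties resolve deterministically.
    timers: BinaryHeap<Reverse<(Duration, u64, &'static str)>>,
}

impl VirtualClock {
    fn new() -> Self {
        Self { now: Duration::ZERO, next_seq: 0, timers: BinaryHeap::new() }
    }

    /// Register a timer that fires `after` from the current virtual time.
    fn sleep(&mut self, after: Duration, label: &'static str) {
        self.timers.push(Reverse((self.now + after, self.next_seq, label)));
        self.next_seq += 1;
    }

    /// Called when everything is blocked: jump to the next deadline and fire it.
    fn advance(&mut self) -> Option<(Duration, &'static str)> {
        let Reverse((deadline, _, label)) = self.timers.pop()?;
        self.now = deadline;
        Some((deadline, label))
    }
}

fn main() {
    let started = Instant::now();
    let mut clock = VirtualClock::new();
    clock.sleep(Duration::from_secs(22 * 60), "packet reaches Mars");
    clock.sleep(Duration::from_secs(44 * 60), "reply reaches Earth");
    while let Some((t, label)) = clock.advance() {
        println!("t = {} s: {label}", t.as_secs());
    }
    // 44 virtual minutes pass, but the real elapsed time is negligible.
    println!("real time elapsed: {:?}", started.elapsed());
}
```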

You may be wondering: is there a QUIC implementation that lets you control both the clock and the underlying network? Well… Quinn does! The library’s design is incredibly modular and provides the necessary extension points.

For example, enabling clock time jumps was trivial. Quinn delegates timekeeping to the async runtime, and the runtime we use (Tokio) comes with a feature to automatically advance the clock exactly the way we need. We turned it on through Builder::start_paused and it just worked.

Switching to a simulated in-process network was more involved, as it required writing the network simulation from scratch. I kept chipping away at the problem until it was solved, then plugged the simulated network into Quinn through its AsyncUdpSocket and UdpPoller traits.
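The project’s actual simulation isn’t reproduced here, but the core idea fits in a few lines: packets never touch a socket, they go into a priority queue ordered by virtual delivery time. Everything below (SimNetwork, the per-link delays, and the rule that packets sent during a global outage window are lost) is an assumption for illustration. Plugging such a network into Quinn through the AsyncUdpSocket and UdpPoller traits is left out.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
use std::time::Duration;

type NodeId = &'static str;

/// A packet in flight; ordering by (deliver_at, seq) keeps delivery deterministic.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct InFlight {
    deliver_at: Duration,
    seq: u64,
    to: NodeId,
    payload: Vec<u8>,
}

/// Hypothetical in-process network: no sockets, just a queue keyed by virtual time.
struct SimNetwork {
    delays: HashMap<(NodeId, NodeId), Duration>,
    /// Virtual-time windows during which packets sent on any link are lost.
    outages: Vec<(Duration, Duration)>,
    in_flight: BinaryHeap<Reverse<InFlight>>,
    next_seq: u64,
}

impl SimNetwork {
    fn new() -> Self {
        Self { delays: HashMap::new(), outages: Vec::new(), in_flight: BinaryHeap::new(), next_seq: 0 }
    }

    /// Add a symmetric link with a fixed one-way delay.
    fn link(&mut self, a: NodeId, b: NodeId, delay: Duration) {
        self.delays.insert((a, b), delay);
        self.delays.insert((b, a), delay);
    }

    /// Queue a packet at virtual time `now`; returns false if it was dropped.
    fn send(&mut self, now: Duration, from: NodeId, to: NodeId, payload: Vec<u8>) -> bool {
        let delay = match self.delays.get(&(from, to)) {
            Some(&d) => d,
            None => return false, // no such link
        };
        if self.outages.iter().any(|&(start, end)| now >= start && now < end) {
            return false; // link down: packet lost
        }
        let seq = self.next_seq;
        self.next_seq += 1;
        self.in_flight.push(Reverse(InFlight { deliver_at: now + delay, seq, to, payload }));
        true
    }

    /// Pop the next packet to arrive; the caller jumps its virtual clock to `deliver_at`.
    fn next_delivery(&mut self) -> Option<InFlight> {
        self.in_flight.pop().map(|Reverse(p)| p)
    }
}

fn main() {
    let mut net = SimNetwork::new();
    net.link("earth", "mars", Duration::from_secs(20 * 60));
    net.outages.push((Duration::from_secs(60), Duration::from_secs(120)));
    net.send(Duration::ZERO, "earth", "mars", b"hello".to_vec());
    let lost = !net.send(Duration::from_secs(90), "earth", "mars", b"lost".to_vec());
    while let Some(p) = net.next_delivery() {
        println!("t = {} s: {} receives {:?}", p.deliver_at.as_secs(), p.to, String::from_utf8_lossy(&p.payload));
    }
    println!("second packet lost during outage: {lost}");
}
```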

Did the effort pay off? Oh yes! We can now run QUIC file downloads in no time… and besides being faster, we get some extra perks. By the way, we are keeping the old setup around for additional validation of important test cases.

Bonus: Determinism and Debuggability

With full control over the network, it became possible to make the setup completely deterministic. Unlike in the old setup, two runs with the same parameters now always produce the same output. This is essential for reproducible experiments and has been incredibly useful so far (e.g., no more “it works on my machine”).
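One common ingredient of this kind of determinism, assuming the simulation includes randomness such as packet loss, is drawing every random decision from a seeded PRNG instead of OS entropy. Here is a sketch using SplitMix64, a small, well-known generator; the loss_pattern helper is hypothetical.

```rust
/// SplitMix64: a tiny, well-known PRNG. Same seed => same sequence on every machine.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// For `n` packets, decide which ones a lossy simulated link drops (hypothetical helper).
fn loss_pattern(seed: u64, n: usize, loss_rate: f64) -> Vec<bool> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| rng.next_f64() < loss_rate).collect()
}

fn main() {
    let a = loss_pattern(42, 16, 0.25);
    let b = loss_pattern(42, 16, 0.25);
    println!("identical runs: {}", a == b);
    println!("drop pattern: {:?}", a);
}
```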

Debuggability also got some love. As packets travel through the in-process network, each peer records them in a synthetic .pcap file for later inspection. This way, you can use external tools like Wireshark to troubleshoot problems, or simply to see what is being transmitted over the simulated wire. This small investment has paid off handsomely: it gives you X-ray vision into what would otherwise be a black box. Debuggable systems rock!
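Writing a synthetic capture is simpler than it may sound: the classic libpcap format is a 24-byte global header followed by a 16-byte record header per packet. The sketch below stamps each record with virtual time. It assumes link type LINKTYPE_RAW (101), where every record must be a complete IP packet, so a real setup would need to wrap Quinn’s UDP payloads in IP and UDP headers first (omitted here). The function names and payloads are hypothetical.

```rust
use std::time::Duration;

/// LINKTYPE_RAW: each record starts directly with an IPv4 or IPv6 header.
const LINKTYPE_RAW: u32 = 101;

/// Classic libpcap global header (24 bytes, little-endian, microsecond timestamps).
fn pcap_header() -> Vec<u8> {
    let mut h = Vec::with_capacity(24);
    h.extend_from_slice(&0xa1b2_c3d4u32.to_le_bytes()); // magic number
    h.extend_from_slice(&2u16.to_le_bytes()); // version major
    h.extend_from_slice(&4u16.to_le_bytes()); // version minor
    h.extend_from_slice(&0i32.to_le_bytes()); // thiszone: GMT offset
    h.extend_from_slice(&0u32.to_le_bytes()); // sigfigs
    h.extend_from_slice(&65_535u32.to_le_bytes()); // snaplen
    h.extend_from_slice(&LINKTYPE_RAW.to_le_bytes()); // link type
    h
}

/// One packet record, timestamped with the simulation's virtual clock.
fn pcap_record(at: Duration, packet: &[u8]) -> Vec<u8> {
    let len = packet.len() as u32;
    let mut r = Vec::with_capacity(16 + packet.len());
    r.extend_from_slice(&(at.as_secs() as u32).to_le_bytes()); // ts_sec
    r.extend_from_slice(&at.subsec_micros().to_le_bytes()); // ts_usec
    r.extend_from_slice(&len.to_le_bytes()); // incl_len: bytes stored
    r.extend_from_slice(&len.to_le_bytes()); // orig_len: bytes on the wire
    r.extend_from_slice(packet);
    r
}

fn main() -> std::io::Result<()> {
    let mut file = pcap_header();
    // Placeholder payloads; a real capture would hold complete IP/UDP/QUIC packets.
    file.extend(pcap_record(Duration::ZERO, b"client hello"));
    file.extend(pcap_record(Duration::from_secs(1320), b"server reply"));
    let path = std::env::temp_dir().join("deep_space_sim.pcap");
    std::fs::write(&path, &file)?;
    println!("wrote {} bytes to {}", file.len(), path.display());
    Ok(())
}
```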

Wrapping up

So… which protocol is used when downloading photos from the Perseverance Mars rover to Earth? I’m told that, for now, it’s a lower-level protocol called CFDP. Maybe in a few years the answer will be QUIC!

Acknowledgements

My work would not have been possible without Marc Blanchet, a passionate advocate of IP in deep space. He generously funded the project, answered my questions with infinite patience, and even reviewed early drafts of this blog post. He was also keen to open-source the experimental setup I developed, so that anyone can run the experiments. You can find the repository here.

Another honorable mention goes to the Quinn community, especially the library’s creators, Benjamin and Dirkjan. They designed a great API and, together with other members of the community, helped us with useful advice whenever we ran into problems along the way. If you are looking for a QUIC library in the Rust ecosystem, I’d say Quinn is the best choice.


