In the Northern Hemisphere summer of 2007, I was experimenting with UI automation. xdotool was born out of those efforts when I spun it off from another project. The goal was modest – write some scripts that perform common keyboard, mouse, and window management tasks.
The first commit contained only a few commands – basic mouse and keyboard actions, plus some window management actions such as moving, focusing, and searching for windows. Over time, xdotool gained new features. Today, the project is 18 years old and still going!
The passing years also brought external changes: Wayland arrived in hopes of replacing X11, and later Ubuntu tried to take a bite of its own by launching Mir. The noise about Wayland, both good and bad, continued for years before major distros started shipping it. It wasn't until 2016 that Fedora became the first distribution to ship it by default, and even then, only for GNOME. It would take another five years before Fedora did the same for KDE. In 2017, nearly a decade after Wayland was introduced, Ubuntu defaulted to Wayland, but reverted to X11 in the next release because screen sharing and remote desktop were not available.
Screen sharing and remote desktop? Features that have existed for decades on other systems, and that we all knew we'd need? They weren't available, yet distros were shipping Wayland as the default experience without them. It took a long time before you could join a Zoom call and share your screen. Strange.
All this to say, Wayland has had a long, bumpy road.
Back to xdotool: xdotool relies on some X11 features that existed before I started using Linux:
- Standard X11 operations – searching for windows by their title, moving windows, resizing them, closing, etc.
- XTest – A means to “test” things on X11 without requiring user action. It provides a way to perform mouse and keyboard actions from your code. You can type, you can move the mouse, you can click.
- EWMH – “Extended Window Manager Hints” – A specification for how to communicate with the window manager. This allows xdotool to switch virtual desktops, move windows to other desktops, find the process that owns a window, etc.
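To make the EWMH bullet concrete: on X11, switching desktops or focusing a window is just a ClientMessage sent to the root window, which the window manager picks up. The sketch below builds the fields of two such messages as plain Python data, following the EWMH specification's description of `_NET_CURRENT_DESKTOP` and `_NET_ACTIVE_WINDOW`. It does not talk to an X server (a real client would intern the atom names and call `XSendEvent`), and the helper names `current_desktop_message` and `active_window_message` are hypothetical.

```python
from dataclasses import dataclass

# X11 event-mask bits (from X.h). EWMH asks clients to send root-window
# requests with both set so the window manager receives them.
SUBSTRUCTURE_NOTIFY_MASK = 1 << 19
SUBSTRUCTURE_REDIRECT_MASK = 1 << 20
CURRENT_TIME = 0  # X11's CurrentTime


@dataclass
class ClientMessage:
    """Fields of an X11 ClientMessage event; always sent to the root window."""
    window: int        # the window the message is "about"
    message_type: str  # atom name; a real client interns this first
    data: tuple        # data.l[0..4] (format 32 = five 32-bit values)
    event_mask: int = SUBSTRUCTURE_NOTIFY_MASK | SUBSTRUCTURE_REDIRECT_MASK
    format: int = 32


def current_desktop_message(root: int, index: int,
                            timestamp: int = CURRENT_TIME) -> ClientMessage:
    """Hypothetical helper: ask the WM to switch to desktop `index`."""
    return ClientMessage(root, "_NET_CURRENT_DESKTOP", (index, timestamp, 0, 0, 0))


def active_window_message(window: int,
                          timestamp: int = CURRENT_TIME) -> ClientMessage:
    """Hypothetical helper: ask the WM to activate (focus) `window`.

    data.l[0] = 2 marks the request as coming from a pager-like tool,
    i.e. a direct user action, per the EWMH source-indication field.
    """
    return ClientMessage(window, "_NET_ACTIVE_WINDOW", (2, timestamp, 0, 0, 0))
```

The point is how little there is to it: one message shape, understood by nearly every X11 window manager, which is why a single tool could work across all of them.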
All of the above are old, stable and well supported.
Wayland comes along and removes everything xdotool can do. Some of that removal is justified as being “for security”, with little acknowledgement of what is being removed and why. That's fine, I guess… but we ended up shipping Linux distros without important features, which has been addressed to some extent over time. For example, I can now share my screen on a video call.
So what happened to all the features removed in the name of security?
Fragmentation, mostly. Buckle up. Almost 10 years after Fedora's first Wayland release, those features are still missing or have multiple competing implementation proposals, with very few maintainers agreeing on what to support. I miss EWMH.
Do you want to send keystrokes and mouse actions?
- GNOME 48:
  - Xwayland can pass keystrokes on to the compositor when an X11 client uses XTEST. This is kind of nice, but every few minutes I get a popup with almost zero context saying “Allow remote interactions” with a toggle switch. This is confusing, because sending keystrokes from a local command doesn't feel like “remote interaction”.
  - You can write code that uses the XDG portal's RemoteDesktop session to request access and then use libei to send keystrokes and mouse actions. Documentation is sparse, as this is still quite new. And it still gives you the above prompt; there doesn't appear to be a way to allow this permanently, despite the portal API documenting such an option.
- KDE:
  - Xwayland behaves similarly when XTEST is used. This time, it pops up “Remote control has been requested. Momentarily requested access to the remote control: input device” – this is confusingly worded and has no context, especially since these popups are new.
- Some other compositors support Wayland protocol extensions that allow things like virtual keyboard input. The fragmentation continues here, as there are many protocol extension proposals that add virtual text input, keyboard actions, and mouse/pointer actions. Which one works depends entirely on which window manager/compositor you are using.
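For reference, the portal route mentioned under GNOME 48 is a sequence of calls on `org.freedesktop.portal.RemoteDesktop`: `CreateSession`, `SelectDevices`, `Start`, and finally `ConnectToEIS`, which hands back the file descriptor libei talks to. The “permanent” option is the `persist_mode` entry of `SelectDevices` (with a `restore_token` returned by `Start` and passed back next time); whether a given backend honours it is exactly what the complaint above is about. The sketch below only builds the options dictionary a D-Bus binding would marshal as `a{sv}`; it makes no D-Bus calls, the constant values are taken from the portal documentation as I understand it (assumption: RemoteDesktop interface version 2 or later), and `select_devices_options` is a hypothetical helper.

```python
# Device-type bitmask values for SelectDevices' "types" option.
KEYBOARD = 1
POINTER = 2
TOUCHSCREEN = 4

# "persist_mode" values (RemoteDesktop v2+): 0 = don't persist,
# 1 = persist while the application runs, 2 = persist until revoked.
PERSIST_NONE, PERSIST_TRANSIENT, PERSIST_PERMANENT = 0, 1, 2

# Order of calls a libei-based client makes on the portal.
CALL_SEQUENCE = ("CreateSession", "SelectDevices", "Start", "ConnectToEIS")


def select_devices_options(token, types=KEYBOARD | POINTER,
                           restore_token=None, persist_mode=PERSIST_PERMANENT):
    """Hypothetical helper: build SelectDevices' a{sv} options as a dict."""
    options = {"handle_token": token, "types": types, "persist_mode": persist_mode}
    if restore_token is not None:
        # A token saved from a previous Start response; in principle it
        # lets the portal skip the permission dialog on later sessions.
        options["restore_token"] = restore_token
    return options
```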
Outside of Wayland, Linux has uinput, which allows programs to create and use a virtual keyboard and mouse, but it requires root permissions. Furthermore, a keyboard device sends key codes, not symbols, which adds another layer of difficulty. To send the key symbol ‘A’, we need to know which key code (or key sequence) produces that symbol, and for that you need the keyboard mapping. There are several ways to get it, and it's not clear which one (Wayland's wl_keyboard, X11's XkbGetMap via
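To illustrate the key code versus symbol problem: a uinput keyboard speaks evdev key codes, so typing ‘A’ means pressing `KEY_LEFTSHIFT` and `KEY_A`. The sketch below hard-codes a US QWERTY layout (letters, digits, and space only), which is precisely the assumption a real tool cannot make without reading the keymap. It only builds the `struct input_event` records that would be written to a uinput file descriptor; it does not create the device, which needs the `UI_*` ioctls and usually root. The `"llHHi"` layout assumes 64-bit Linux, and `key_for`, `events_for`, and `pack` are hypothetical helpers.

```python
import struct

EV_SYN, EV_KEY = 0x00, 0x01
SYN_REPORT = 0
KEY_LEFTSHIFT = 42
KEY_SPACE = 57

# Assumption: US QWERTY. evdev codes follow physical key rows, so each
# row of symbols maps to a run of consecutive codes.
_KEYCODES = {" ": KEY_SPACE}
for _row, _first in (("1234567890", 2), ("qwertyuiop", 16),
                     ("asdfghjkl", 30), ("zxcvbnm", 44)):
    for _offset, _ch in enumerate(_row):
        _KEYCODES[_ch] = _first + _offset

# struct input_event on 64-bit Linux: struct timeval (two longs),
# __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "llHHi"


def key_for(symbol: str) -> tuple[int, bool]:
    """Return (evdev key code, needs_shift); KeyError if unmapped."""
    if symbol.isupper():
        return _KEYCODES[symbol.lower()], True
    return _KEYCODES[symbol], False


def events_for(symbol: str) -> list[tuple[int, int, int]]:
    """(type, code, value) events that type `symbol`: presses, then releases."""
    code, shift = key_for(symbol)
    keys = ([KEY_LEFTSHIFT] if shift else []) + [code]
    events = []
    for key in keys:  # press modifiers first, each followed by a sync
        events += [(EV_KEY, key, 1), (EV_SYN, SYN_REPORT, 0)]
    for key in reversed(keys):  # release in reverse order
        events += [(EV_KEY, key, 0), (EV_SYN, SYN_REPORT, 0)]
    return events


def pack(events) -> bytes:
    """Serialize events as they would be written to a uinput fd (time zeroed)."""
    return b"".join(struct.pack(EVENT_FORMAT, 0, 0, t, c, v) for t, c, v in events)
```

On an AZERTY or Dvorak layout, the same codes would produce different symbols, which is why the keymap question at the end of the paragraph matters.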
Window management is also quite strange. There appears to be no built-in protocol in Wayland for a program (like xdotool) to tell a window to do anything – be moved, resized, maximized, or closed.
- GNOME provides window management only through GNOME Shell extensions: JavaScript apps that you install into GNOME and that have access to GNOME-specific JavaScript APIs. Invoking any of these from a shell command is not possible without some wild maneuvers: GNOME's JavaScript can access DBus, so you can write code that moves a window and exposes that method on DBus. I'm not the first one to consider this; some published extensions already do it, like Focused Window D-Bus. GNOME also has a DBus method for executing JavaScript (org.gnome.Shell.Eval), but it is disabled by default.
- KDE has a concept similar to GNOME's, but it is completely incompatible. Fortunately, I think, KDE also has a DBus method for invoking JavaScript and, at the time of writing, it is enabled by default. kdotool, a KDE+Wayland-specific derivative of xdotool, does exactly this, providing a command-line tool for managing your windows.
- Outside of KDE and GNOME, you may have luck with some third-party Wayland protocol extensions. If your compositor is based on wlroots, it will probably work with wlrctl, a command-line tool similar to xdotool and wmctrl. Note that wlrctl only works if your compositor supports specific, non-default Wayland protocols, such as wlr-foreign-toplevel-management.
If we compare the above with xdotool on X11 today, perhaps my confusion and surprise become more obvious: xdotool works with almost any X11 window manager – typing, window movement, window search, etc. On Wayland, each compositor needs its own implementation, as with kdotool above, which only works on Wayland + KDE, not GNOME or anything else.
Fragmentation is probably a natural consequence of Wayland's promise to be the smallest possible replacement for X11, and that smallness eliminates a lot of functionality. Many of the missing features, like screen sharing, are still essential, and with no clear central leadership or community, the result feels predictable.
Among third-party Wayland protocols, there are simply so many input-related ones: Input Method v1, Input Method v2, Text Input v3, KDE Fake Input, and Virtual Keyboard. And those are just the Wayland protocols – the XDG RemoteDesktop portal used by KDE and GNOME isn't Wayland-related at all.
The weirdest thing I learned here is the new XTEST support in Xwayland. The chain of components is really wild:
- An X11 client sends a key event using XTEST (normal)
- Xwayland receives it and opens a RemoteDesktop XDG portal session to your own system (???)
- The XDG portal uses DBus in a strange way: many method calls receive their responses via signals, because DBus is not designed for long-running asynchronous methods.
- Once the RemoteDesktop portal session is set up, Xwayland asks for a file descriptor to talk to a libei server (emulated input server).
- After that, libei is used to send events, query the keyboard map, etc.
- When you ask libei for the keyboard mapping (keycodes to keysyms, etc.), you get yet another file descriptor, which you process with yet another library, libxkbcommon.
If Conway's law applies here, then I would really like to know more about the system (the people) that produces this kind of Rube Goldberg machine. Looking back, the Wayland folks said “Never, because security!” to sending virtual input. Is this convoluted path how it finally routes around those naysayers? Wild.
(To be fair, the libei documentation is very good, and I find the C code easy to read – I have no complaints there!)
I'm not alone in moving slowly toward Wayland. Synergy only added experimental Wayland support a year ago, 8 years after Fedora first defaulted to Wayland, and only after GNOME and friends implemented this weird
While learning about libei and the XDG portal recently, I wrote some test code to send some keyboard events. Even though it was my own software, running on my own machine, GNOME still asked me “Allow remote interaction?” There doesn't seem to be any way to permanently allow my own requests. I'm not the only one confused by the prompts GNOME and KDE show.
Meanwhile, I want to make this work, but I can't figure out how to proceed amid all the fragmentation. I don't mind what the protocol is, but I sure would love to have any protocol that does what I need. Is it worth continuing?