@jzb the problem is not Wayland, just like it is not systemd. The problem is trying to force people to adopt it or die.
I have use cases that are inimical to the fundamental assumptions Wayland has, and pretty much no 3D-accelerated hardware across my machine park, so it’s never going to be an option for me. So don’t make me "use it or die". Just keep X11 working and we’re good.
@mirabilos @jzb You can run most if not all Wayland compositors without 3D acceleration provided your Linux kernel is new enough to provide SimpleDRM. But, given your profile picture maybe you don't use Linux?
You've entirely missed the point, or built a strawman here. The problem isn't (lack of) 3D acceleration, but that the whole conceptual design of Wayland has been broken from day 0. People have been pointing out the shortcomings all along and were mostly ignored.
Wayland's design breaks things.
Certain applications, like KiCad, will not support Wayland anytime soon, because it breaks the way KiCad cooperates with window managers.
In Wayland-land there's no such cooperation.
@datenwolf @newbyte @mirabilos @jzb I have used KiCad and created projects in it under Wayland with no troubles whatsoever.
You know that Xwayland exists and isn't going anywhere, right? It's not some temporary glue to ease the migration, it's the main still maintained X11 implementation these days and it's here to stay.
Also Xwayland… yay, let's just glue some broken thing (the Xorg implementation of X11) to another broken thing (the Wayland protocol).
What the Phoenix guys are doing is much, much better. It's also more resource efficient, since the Wayland design is already running into memory bottlenecks at 4K and 8K display resolutions.
@datenwolf @newbyte @mirabilos @jzb Maybe in your world dma-buf passing weighs more at higher resolutions, but in the real world, when you want high performance, you end up with things like Gamescope.
That's not what I was hinting at.
Socratic question: what are the memory requirements for a two-image swap chain of a fullscreen window at 8K display resolution in the R10G10B10A2 pixel format?
How many clients running (and compositing) at that resolution do fit in your typical GPU's VRAM?
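To make the Socratic question concrete, here is the back-of-envelope arithmetic it is driving at (a sketch; the 8 GiB VRAM figure is purely an illustrative assumption, not from the thread):

```python
# 8K UHD fullscreen client, double-buffered, R10G10B10A2 pixel format
width, height = 7680, 4320
bytes_per_pixel = 4          # 10+10+10+2 bits packs into 32 bits
buffers = 2                  # two-image swap chain

per_client = width * height * bytes_per_pixel * buffers
print(per_client)            # 265420800 bytes, i.e. ~253 MiB per client

vram = 8 * 2**30             # assumed 8 GiB of VRAM, purely illustrative
print(vram // per_client)    # 32 such clients before VRAM alone is exhausted
```

So each fullscreen 8K client costs roughly a quarter gigabyte in swap-chain buffers alone, before any compositor-side copies are counted.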
@datenwolf @newbyte @mirabilos @jzb Ah, so you're not anti-Wayland, but anti-composition?
Then you'll be relieved to hear that Wayland is designed in a way that makes composition unnecessary in the case you described :)
Make the windows slightly smaller than fullscreen and slightly offset to each other, so that of each window you can see some pixels.
Or more devious: Make it a pyramid stacking where each lower window on the Z-stack sticks out by some pixels to the side of the window on top of it.
Wayland doesn't have the concept of pixel ownership based clipping of window contents. So every window by necessity will get a complete surface.
@datenwolf @newbyte @mirabilos @jzb So first it was supposedly about bandwidth (which is not an issue in this case either thanks to damage tracking), but now it's suddenly about RAM usage which you'd need to pay up one way or another anyway if you wanted to implement features commonly expected from desktop environments these days? 🤔
Read my post again. I didn't mention bandwidth at all. I wrote "bottleneck", which can also mean running out of RAM.
And which features of desktop environments would that be, that are expected?
Effects? You mean distractions.
Window content previews? Sure, let's take a whole, huge window and scale it down to a thumbnail, things will remain readable. Of course.
Client side decoration? eff those. I want my windows to have titles telling me what they are.
@datenwolf @newbyte @mirabilos @jzb Yeah, let's instead have the apps OOM once you drag their windows or view them in an expose-like arrangement 😂
OOM situations due to overloading with windows are not a problem if you do graphics the old-school way:
A single screen framebuffer; windows are cut-out areas of that framebuffer, and drawing operations undergo the pixel ownership test.
This isn't magic or rocket science; it was figured out back in the 1980s.
Every graphics system worth its salt did it that way. And it even plays nicely with 3D acceleration.
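As a toy illustration of that scheme (a deliberately simplified sketch; real systems use clip lists and spans rather than a per-pixel loop, and all names here are invented):

```python
# One shared framebuffer; an "owner" map records which window owns each pixel.
W, H = 8, 4
framebuffer = [[0] * W for _ in range(H)]
owner = [[0] * W for _ in range(H)]

def map_window(win_id, x0, y0, w, h):
    """Raising a window claims its visible pixels in the owner map."""
    for y in range(y0, min(y0 + h, H)):
        for x in range(x0, min(x0 + w, W)):
            owner[y][x] = win_id

def draw_rect(win_id, x0, y0, w, h, color):
    """Drawing goes straight into the shared framebuffer, but each pixel
    passes the ownership test first - that is the clipping."""
    for y in range(y0, min(y0 + h, H)):
        for x in range(x0, min(x0 + w, W)):
            if owner[y][x] == win_id:       # pixel ownership test
                framebuffer[y][x] = color

map_window(1, 0, 0, 6, 4)          # window 1
map_window(2, 4, 0, 4, 4)          # window 2 raised on top, overlapping
draw_rect(1, 0, 0, 6, 4, color=7)  # window 1 paints; occluded pixels skipped
print(framebuffer[0])              # [7, 7, 7, 7, 0, 0, 0, 0]
```

The occluded part of window 1 never gets any backing memory: its drawing simply doesn't land anywhere, which is exactly why window count doesn't drive memory use in this model.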
@datenwolf @newbyte @mirabilos @jzb Except it's not 1980 anymore. Buffers to be presented on screen come from all sorts of sources - CPUs, GPUs, VPUs, ISPs. Some can render directly to a shared framebuffer, some can't. Some could be cropped to save memory, some can't - and when they can, they'd need to reallocate.
Yes, you can design a system that will consume less resources if you make it special-purposed to the set of requirements you happen to care about. Go ahead. Wayland is not that thing.
@datenwolf @newbyte @mirabilos @jzb (that said, a non-composited shared-framebuffer system could be easily built on top of Wayland anyway)
@datenwolf @newbyte @mirabilos @jzb Buffer passing mechanics are extensible in Wayland. Even dma-bufs are in their own extension, the only one in the core is wl_shm (which, BTW, works by giving the client a buffer to mmap to...).
What's more - some toolkits, such as Qt, even have elaborate plugin support to handle custom buffer passing mechanisms that you could use.
I won't implement this for you as I'm not interested in it, but you can just sit down and do it!
@datenwolf @newbyte @mirabilos @jzb ...and outright don't work when presented with modern challenges, unless you're willing to put the extra effort or compromise on performance as well :)
Outright don't work?
Okay, what exactly doesn't "work" with X11? And please don't list shortcomings of Xorg that could have been addressed a long time ago and are perfectly fixable within X11.
@datenwolf @newbyte @mirabilos @jzb So how are you going to offload surfaces to hardware planes while preserving the ability to save memory from unused portions of the window? How are you going to utilize display's engine framebuffer compression to hit bandwidth targets on high res screens?
X11 can do lots of things, but only once you make it move away from these "old-school" ways and explode the complexity.
Display engine framebuffer compression: first I'd ask myself why I'd want to address only single windows with that, rather than the worst-case scenario of whole-screen content updates every frame. Going for that case, the whole shared screen framebuffer goes through the compression. Also – as I was so rightfully scolded for some 16 years ago by DarkShikari – don't even bother with explicit damage regions. It doesn't matter whether you compare pixels to pixels or clip.
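The "compare pixels instead of tracking explicit damage" idea can be sketched like this (a naive toy on a 1-D buffer; real implementations compare tiles or hash scanlines):

```python
def damaged_tiles(prev, cur, tile=4):
    """Compare the previous and current frame pixel by pixel; the set of
    tiles containing any differing pixel IS the damage - no client-supplied
    damage regions needed."""
    assert len(prev) == len(cur)
    return {i // tile for i, (a, b) in enumerate(zip(prev, cur)) if a != b}

prev = [0] * 16
cur = list(prev)
cur[5] = 9                               # one pixel changed in tile 1
cur[13] = 9                              # one pixel changed in tile 3
print(sorted(damaged_tiles(prev, cur)))  # [1, 3]
```

Only the changed tiles need to be re-uploaded or re-compressed; everything else is provably untouched.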
@datenwolf @newbyte @mirabilos @jzb Well, maybe because the compressed buffer is produced by the GPU and the display engine knows how to decompress it on the fly while it's fully opaque to the CPU?
If the shared framebuffer is wholly contained on the GPU side (and let's face it, that's the only configuration in which you're hitting the display engine), then every access by the CPU incurs a DMA transfer anyway. That means you're going through the hardware responsible for layout transitions (and oh, that's where de-/compression happens). You can make this transparent to the CPU.
Or you know, you could just have compressed format CPU side, too.
@datenwolf @newbyte @mirabilos @jzb > (and let's face it, that's the only configuration in which you're hitting the display engine)
Far from it! I have some hardware next to me that won't be able to smoothly output 4K video if it goes through the GPU (and not even speaking about CPU), and yet can do it. Display pipelines in modern SoCs are a bit more complex than in 1990s.
BTW: I want to see just one general-purpose Wayland compositor that's aware of these things, and then also a general-purpose GUI toolkit that makes use of these hardware features.
If your concern is low-latency, high-framerate applications (such as games), then you're better off using one of the direct-to-display APIs anyway. For any graphical overlays, inject those into the graphics stack – you know, like you can do with Vulkan and explicit layers.
@datenwolf @newbyte @mirabilos @jzb Well, these things Just Work™ in several commonly used Wayland compositors and toolkits out there today.
@newbyte @mirabilos @jzb @dos which ones? Link please.
@datenwolf @newbyte @mirabilos @jzb Pretty much all of them these days, as this is handled by dmabuf feedbacks and buffer modifiers, so it works even with split render/display pipelines or with multiple GPUs.
Offloading surfaces to hardware planes while saving memory for the regions beneath:
this is a little ambiguous to me, and there are at least three different ways I can read it. Please clarify.
@datenwolf @newbyte @mirabilos @jzb Modern display engines provide some in-hardware composition abilities. You're explicitly not interested in window composition, fine, but applications want to draw over buffers that are opaque to them, such as video content, which can then go straight from the VPU to the display engine without CPU or GPU involvement.
That's well-supported today with Wayland and necessary for reasonable performance on some hardware out there.
@newbyte @mirabilos @jzb @dos can you link me some code? Genuinely interested.
@datenwolf @newbyte @mirabilos @jzb That's just one implementation (https://gitlab.freedesktop.org/emersion/libliftoff); at least KWin and Mutter have their own.
GTK has GtkGraphicsOffload, which is used by several apps out there – and that's just off the top of my head.
I was hoping for something I wasn't familiar with already. But alright. Anyway, want to know how you could implement support for that in an X11 environment?
Put everything in a shared screen framebuffer on the GPU side, and for every surface that doesn't fit the pixel format of the screen framebuffer do:
1/
- if the device can support the offloaded surface format via strided pointers and it fits within the target rectangle: alias the screen framebuffer
- else: mask the unused region of the shared framebuffer using the sparse-image capabilities of the device (most everything built in the past 20 years supports that) and present a separate surface.
2/
Of course, every device behaves a little differently, so you might want device- and driver-dependent code paths to make optimal use of this stuff.
You know, something like a – uh –
device dependent … x?
DDX
Yeah, that's a good acronym.
3/3
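The decision logic from parts 1/ and 2/ can be sketched roughly like this (pseudologic for illustration; all type and attribute names are invented, not any real DDX API):

```python
from dataclasses import dataclass

@dataclass
class Surface:
    fmt: str          # pixel format of the offloaded surface
    width: int
    height: int

@dataclass
class Device:
    strided_formats: set   # formats addressable via strided pointers
    has_sparse_images: bool

def place_surface(dev, surf, screen_fmt, target_w, target_h):
    """Decide how to present a client surface on the shared framebuffer."""
    if surf.fmt == screen_fmt:
        return "draw directly into shared framebuffer"
    if (surf.fmt in dev.strided_formats
            and surf.width <= target_w and surf.height <= target_h):
        return "alias shared framebuffer via strided pointer"
    if dev.has_sparse_images:
        return "mask region via sparse image, present separate surface"
    return "fall back to separate surface plus copy"

dev = Device(strided_formats={"NV12"}, has_sparse_images=True)
print(place_surface(dev, Surface("NV12", 1920, 1080), "XRGB8888", 1920, 1080))
# -> alias shared framebuffer via strided pointer
```

The last branch is the device-dependent fallback; which of the three paths wins on a given chip is exactly the per-device knowledge a DDX would carry.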
@datenwolf @newbyte @mirabilos @jzb Yes, we even had Xvideo. But you know that offloaded surfaces can be efficiently overlaid, underlaid, alpha-composited, transformed? And it works today with device-independent APIs :)
I'm not even thinking about Xvideo here. Or X11 for that matter.
What I'm trying to point out here is, that giving each and every top-level window on screen its very own, fully fledged surface is wasteful.
For the most part that's simply not necessary. There's only a very tight set of applications that do benefit, or outright require wholly owned surfaces. Video players are among them. But you normally don't have a lot of them open at the same time.
@datenwolf @newbyte @mirabilos @jzb Meanwhile in the real world even X11 clients allocate their wholly owned surfaces to be presented with DRI.
Only X11 clients written by people who:
- didn't know what they were doing,
- use libraries whose relevant code paths were written by people who didn't care (the X11 backends of Qt 4 and later, and GTK+ 2 and later, fall into that category),
or
- use libraries whose relevant code paths were written by people who didn't know what they were doing.
1/
The circumstance that some toolkits decided that properly using X11 was too pedestrian for them is not a good excuse to make – in the words of Joel Spolsky – "the single worst strategic mistake" you can make: "They decided to rewrite the code from scratch." [1]
Properly written software that skillfully uses X11 performs better and looks just as good.
[1]: https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
2/
As far as practicality goes: the most important metric in software, to me, is how little it gets in my way – whether by being cumbersome to use, having poor UX, or performing so poorly that it causes serious mental context-switching and context-preservation issues.
For legacy hardware support reasons I'm keeping around a copy of EAGLE 4.16, which AFAIK uses Qt 3. Even on my dual-4K-monitor setup this thing flies. When EAGLE switched to Qt 5 it turned into molasses.
3/
Oh, and as a further fun fact: this EAGLE 4 theme switches instantly in response to setting new X resources.
Qt 6 and GTK 4 have become more responsive, but still: these – ah – so-modern toolkits that make use of all the fancy acceleration features of modern hardware still fail at something that software from 25 years ago has zero problems with, even when confronted with 16× the number of pixels.
This is the cost/benefit calculation I do with regard to Wayland vs. X11:
Wayland gives me very little, but piles a huge combinatorial explosion of extensions onto clients and compositors, vastly inflating the complexity of the whole system if it wants to be resource efficient in general-purpose use.
X11 (and Win32 GDI, for that matter) gives me a reasonably well-designed resource allocation scheme that everything just uses.
@datenwolf @newbyte @mirabilos @jzb I have debugged plenty of X11 and Wayland client code in my life and having to deal with "combinatorial explosion" of stuff definitely describes the experience of working with the former more than the latter.
@dos @newbyte @mirabilos @jzb
So I do it… and then? Then only programs that actually know about these extensions and actually use them will give me their benefits.
Every program whose developers didn't care about going the extra mile will waste memory simply by creating top-level windows on screen.
Memory that's not available for doing actual work (like visualizing data).
Old-school graphics systems give the benefits to all applications, without extra effort.