**Pavel Machek** @pavel@social.kernel.org · Jul 01, 2025, 19:04

Can you program GPUs and do you want to become a HERO? #linuxphone
community needs your help.

We are trying to record video, and have most pieces working, but one is
missing: fast enough debayering. That means about 23MB/sec on #librem5.

Debayering is not hard; camera images have subpixels split on two
lines, which need to be corrected. They also use different color
representation, but that's fixable by some table lookup and two matrix
multiplies.

Librem 5 has Vivante GPU, 4 in-order CPU cores and 3GB RAM. My feeling
is that it should be fast enough for that. If task is for some reason
impossible, that would be good to know, too.

Image data looks like this

RGRGRG...
xBxBxB...
.........
.........

Task is to turn that into usual rgbrgb.... format. rgb = RGB * color
matrix, with table lookups for better quality. I can fix that once I
get an example.

I'm looking for example code (#pinephone would work, too), reasons it
can not be done... and boosts if you have friends that can program
GPUs. #gpu #opensource

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 01, 2025, 19:20

@pavel No, it's

RGRGRG
GBGBGB

You lose meaningful data if you ignore half of green pixels.

I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.
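
For reference, here is a minimal CPU sketch of the task as described so far. It assumes 8-bit samples, the RGGB layout from the correction above, and half-resolution output (one RGB pixel per 2x2 tile, with both green samples averaged). The function name `debayer_rggb_binned`, the identity color matrix and the identity lookup table are illustrative placeholders, not values from this thread; a GPU version would do the same arithmetic in a fragment shader.

```python
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # placeholder color matrix


def debayer_rggb_binned(raw, width, height, matrix, lut=None):
    """Turn an 8-bit RGGB Bayer frame into half-resolution packed RGB.

    raw    -- bytes-like, one sample per pixel, row-major
    matrix -- 3x3 color matrix (sensor-specific in practice)
    lut    -- optional 256-entry table applied to every raw sample first
    """
    if lut is None:
        lut = list(range(256))  # identity table: no correction
    out = bytearray((width // 2) * (height // 2) * 3)
    o = 0
    for y in range(0, height - 1, 2):
        row0 = y * width        # even row: R G R G ...
        row1 = row0 + width     # odd row:  G B G B ...
        for x in range(0, width - 1, 2):
            r = lut[raw[row0 + x]]
            # Use both green samples instead of dropping one of them.
            g = (lut[raw[row0 + x + 1]] + lut[raw[row1 + x]]) / 2
            b = lut[raw[row1 + x + 1]]
            for m in matrix:
                v = m[0] * r + m[1] * g + m[2] * b
                out[o] = max(0, min(255, int(v + 0.5)))  # round, then clamp
                o += 1
    return bytes(out)


# One 2x2 tile: R=200, G=100 and 50, B=25.
rgb = debayer_rggb_binned(bytes([200, 100, 50, 25]), 2, 2, IDENTITY)
print(list(rgb))  # [200, 75, 25]
```

Binning each 2x2 tile halves the resolution in both directions; a full-resolution debayer would interpolate the missing colors per pixel instead.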

**Pavel Machek** @pavel@social.kernel.org · Jul 01, 2025, 21:37

@dos As for copies... Yes, I'm currently doing more copies than needed. I measured Librem 5 at about 2GB/sec memory bandwidth, and stream is about 30MB/sec. At 1Mpix/24fps resolution, gstreamer should be able to encode it in real time.

There's a huge problem with v4l, which gives uncached memory buffers to userspace. That means one whole CPU core is dedicated to copying that to "normal" memory. If that is ever solved, yes, other optimizations are possible. Currently, this means it is not even possible to copy anything bigger than 1Mpix out of the v4l.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 02, 2025, 09:48

@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?

**Pavel Machek** @pavel@social.kernel.org · Jul 02, 2025, 09:55

@dos If you have example of that, that would be welcome :-). That's not how megapixels work, at least.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 02, 2025, 10:34

@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model, it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory CPUs do.

See GL_OES_EGL_image_external and https://docs.kernel.org/userspace-api/media/v4l/dmabuf.html

**Pavel Machek** @pavel@social.kernel.org · Jul 02, 2025, 11:43

@dos Sorry, hero, that's dark magic beyond my understanding. I see the words but don't understand the sentences. :-(

I'd need working example here. I got surprisingly far vibecoding this, but even robots have their limits.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 02, 2025, 18:16

@pavel After eliminating glReadPixels and having the output buffer mmaped instead: "18.9 MB in 0.08s = 244.4 MB/s"

After putting glTexImage2D out of the loop to emulate zero-copy import from V4L as well:
"18.9 MB in 0.05s = 400.1 MB/s"

https://dosowisko.net/stuff/bwtest.patch

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 14:53

@dos Thanks for the patch. And yes, it makes the loop faster... if you don't actually use the data. When used for loading/saving 720 images from the ramdisk, the run time went from ~16 sec to ~21 sec.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 15:00

@pavel I left the memcpy line commented out for a reason - with it uncommented, the result is exactly the same as with glReadPixels (which is effectively a memcpy on steroids). The point is to pass that buffer to the encoder directly, so it can read the data straight from the output buffer without waiting for memcpy to conclude.

I've also verified that the approach is sound by having the shader output different values each frame and accessing it via hexdump_pixels inside the loop. Still fast ;)

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 15:09

@dos But you only hexdumped first few pixels, right?

Is that buffer uncached or something?

I pushed current code to https://gitlab.com/tui/debayer-gpu .

Yes, with memcpy(), I'm getting same results as before. If I get rid of the memcpy(), and attempt to fwrite() the buffer directly, things actually slow down.

I can't easily connect gstreamer to that, I'm going through ramdisk for now. I'm using time ./ocam.py debayer for testing -- https://gitlab.com/tui/tui/-/blob/master/ucam/ocam.py?ref_type=heads

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 19:06

@pavel > I can't easily connect gstreamer to that

Why not? I quickly hacked up passing dma-bufs to GStreamer and even though I'm glFinishing and busy-waiting on a frame to get encoded sequentially it still manages to encode a 526x390 h264 stream in real time on L5.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 20:15

@pavel Plugged it into V4L2 - with a caveat that for now I fed the GPU full-res 13MP frames to meet stride alignment requirement (the shader output is still 526x390). It says it does 240 frames in 10.55s. I wonder if it's really slightly too slow, or just bad timing from our camera stack :)

[Attachment: d74ce9e196831394.mp4]

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 20:26

@dos Camera is 23.5 FPS, IIRC. Do you have it under version control somewhere? This is a bit of achievement :-).
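
The two numbers quoted so far (240 frames in 10.55 s, and a camera rate of about 23.5 FPS) can be compared with a few lines of arithmetic. This is only a sketch: the helper name `fps_report` is made up for illustration, and the 23.5 FPS figure is taken as stated, from memory.

```python
def fps_report(frames, seconds, nominal_fps):
    """Compare a measured capture run against the camera's nominal frame rate."""
    measured = frames / seconds                    # frames per second actually achieved
    expected_seconds = frames / nominal_fps        # how long the run should have taken
    shortfall_pct = (1 - measured / nominal_fps) * 100
    return measured, expected_seconds, shortfall_pct


measured, expected, shortfall = fps_report(240, 10.55, 23.5)
print(f"{measured:.2f} fps, expected {expected:.2f} s, {shortfall:.1f}% short")
# 22.75 fps, expected 10.21 s, 3.2% short
```

A gap of a few percent is small enough that it could come from either processing time or timestamping, so the arithmetic alone does not say which.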

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:04

@pavel https://paste.debian.net/1384224/

It's ugly, hardcodes everything, lies about frame timing, occasionally segfaults. Most of it is copied straight from an LLM, I just massaged the pieces to work together. Not the kind of code I'd like to sign off on :) But it's a working example, so have fun with it.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:12

@pavel The first thing to do to improve it (after cleaning it up) would be to actually make use of the buffer pool. Dequeue the buffer, attach it as a texture, kick off rendering, get a fence and pass it with the output buffer to GStreamer without waiting on rendering to finish, then queue it back asynchronously once rendering is done. This should allow for much more complex shaders than this sequential code does.
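
The flow described above can be imitated on the CPU to show its shape. This is an analogy only, not V4L2, EGL or GStreamer code: a `Queue` stands in for the capture buffer pool, a `concurrent.futures.Future` stands in for the GPU fence, and `render` and `run_pipeline` are hypothetical names with a placeholder workload.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def render(frame):
    # Placeholder for the GPU debayering pass (hypothetical workload).
    return f"rgb({frame})"


def run_pipeline(frames, pool_size=4):
    """Process frames without blocking on each render.

    Returns (encoded results in order, number of buffers back in the pool).
    """
    pool = Queue()
    for buf in range(pool_size):
        pool.put(buf)

    encoded = []
    gpu = ThreadPoolExecutor(max_workers=1)       # stands in for the GPU queue
    encoder = ThreadPoolExecutor(max_workers=1)   # stands in for the encoder
    jobs = []
    for frame in frames:
        buf = pool.get()                   # dequeue a buffer (blocks if none is free)
        fence = gpu.submit(render, frame)  # kick off rendering; the Future plays the fence
        # Queue the capture buffer back once rendering is done, without waiting here.
        fence.add_done_callback(lambda _f, b=buf: pool.put(b))
        # Hand the fence to the consumer, which waits on it only when it needs the data.
        jobs.append(encoder.submit(lambda f=fence: encoded.append(f.result())))
    for job in jobs:
        job.result()
    encoder.shutdown()
    gpu.shutdown()
    return encoded, pool.qsize()


results, buffers_back = run_pipeline(range(6), pool_size=3)
print(results[0], buffers_back)
```

The main loop never waits for a render to finish; it only blocks when every buffer in the pool is still in flight, which is the point of using the pool.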

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 21:35

@dos Fences; that must be some kind of dark magic.

This code seems too good to be true. So, just to be sure, and in case you disappear tomorrow, can I add /* Copyright 2025 Sebastian Krzyszkowiak, GPLv2 */ and act according to that?

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:44

@pavel Good question. Not sure what license would be appropriate to put on something that's mostly an output of a model trained on code on all sorts of licenses anyway...

But given that it's just a bit of glue code between three APIs put together as an example, consider it to be under MIT-0 😜

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:45

@pavel (the parts that I added at least, there are parts of your code in there still)

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 21:58

@dos Thank you! I'll take closer look tomorrow or over the weekend. In the meantime, would you have Makefile or build command that goes with it?

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:58

@pavel LDLIBS = -lEGL -lGLESv2 -lm -ldrm -I/usr/include/libdrm -lgbm -lgstvideo-1.0 -lgstapp-1.0 -lgstallocators-1.0 -lgstreamer-1.0 -lgobject-2.0 -lglib-2.0 -I/usr/include/gstreamer-1.0 -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:36

@pavel BTW. The fact that I could stream full-res frames and bin them down in the shader in real time is interesting news, as this may open up the possibility of using phase detection autofocus.

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 21:40

@dos Exactly. That's a bit of a big deal. That's why I'm trying to make sure this code does not go away. I had phase-detection auto-focus working at one point, but decided it was unusable as I did not see a way to scale down images quickly enough.

Plus it also adds the possibility of zooming.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 21:57

@pavel There's plenty of low-hanging fruit in there. Higher frame rates and 10-bit output are also likely a debugging session or two away 😜

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 22:05

@dos :-) Hopefully. I'll believe things when I see them running locally.

BTW there's one more important thing this can probably do: take full-resolution photos while recording video.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 22:19

@pavel There's a question whether it will be worth elevated power consumption though. I've also stumbled upon csi erroring out with "Rx fifo overflow" requiring a reboot to recover that I haven't seen at lower resolutions, but haven't looked closer.

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 22:36

@dos Yes, there's more work to be done in the kernel; sometimes camera does not work after reboot, bayer-10 modes are not supported, ... :-(. And yes, it will take more power, but with phase-detection AF, it should be significantly better camera.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 22:57

@pavel Toggling the killswitch makes it appear though.

IIRC PDAF was also usable at half-res.

RAW10 is just a matter of setting up clocks for higher bandwidth and more lanes. Switching data format is then just a single register away.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 23:01

@pavel When I lie to GStreamer and tell it that its input is in YUY2, it gets faster - perhaps even fast enough to encode at 1052x780. That's another opportunity for improvement.

(and there's nothing magic about fences, it's just a simple synchronization primitive 😛)

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 07:36

@dos Thanks, I got it to work. I'm putting it into tui repository... and will probably need to reindent it.

For me, there's about 50% CPU usage, so there's still some room.

Yes, YUY2 will be faster; it will also have lower color resolution.

And agreed, there's nothing magic about fences. There's nothing magic about riding horse w/o reins and nothing magic about flying 737, either :-).
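
The size side of the YUY2 trade-off is easy to put in numbers. The sketch below assumes the alternative is packed 3-byte RGB (if the real buffer is 4-byte RGBA/RGBx, the gap is larger) and reuses the 1052x780 size and the roughly 23.5 FPS rate mentioned in the thread; the helper names are illustrative. YUY2 stores 2 bytes per pixel because each horizontal pixel pair shares one U and one V sample, which is also where the lower color resolution comes from.

```python
# Average bytes per pixel for the two packed formats compared here.
BYTES_PER_PIXEL = {"RGB24": 3, "YUY2": 2}


def frame_bytes(width, height, fmt):
    """Size of one packed frame in bytes (no row padding assumed)."""
    return width * height * BYTES_PER_PIXEL[fmt]


def stream_mb_per_s(width, height, fmt, fps):
    """Data rate the encoder has to read, in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(width, height, fmt) * fps / 1e6


for fmt in ("RGB24", "YUY2"):
    size = frame_bytes(1052, 780, fmt)
    rate = stream_mb_per_s(1052, 780, fmt, 23.5)
    print(f"{fmt}: {size} bytes/frame, {rate:.2f} MB/s")
# RGB24: 2461680 bytes/frame, 57.85 MB/s
# YUY2: 1641120 bytes/frame, 38.57 MB/s
```

The encoder has a third less data to read per frame with YUY2, but the numbers say nothing about how much of the observed speedup comes from that versus a cheaper color conversion inside the encoder.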

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 03, 2025, 20:26

@pavel Seems it's the latter, as the result's exactly the same with 1052x780 camera frames and 263x195 video 😁

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 20:29

@dos If you want to make sure, just point camera at the clock :-). gstreamer should get timing information at the input, so I'd expect dropped frames (not wrong speed) if things go wrong.

**Pavel Machek** @pavel@social.kernel.org · Jul 03, 2025, 20:28

@dos I am not brave enough to debug gstreamer + openGL problems in the same process. You are either lucky or WIZARD :-).

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 08:57

@dos Yeah, I played a bit. Nice. But it segfaults occasionally, and may segfault more when I switch to matroskamux. So I guess the crash may be gstreamer-related? :-). There's also some kind of noise in the bottom right corner, maybe that's related, too.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 04, 2025, 09:01

@pavel Pretty sure it will just work fine once it's rewritten cleanly and does such arcane magic as releasing the buffers at the right time etc. :)

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 09:22

@dos Okay, I pushed code to https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads . Debugging this may be a bit "fun".

Do I guess correctly that shaders can do arbitrary resolutions, such as 800x600?

I like the v4l+shaders integration. I'm not sure if I like the v4l+shaders+gstreamer integration.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 04, 2025, 13:45

@pavel Yes, of course.

BTW. Turns out that streaming to YouTube instead of a local file is just a matter of using rtmpsink instead of filesink 😁

[Attachment: eff4dcd93e28ca96.png]

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 04, 2025, 16:04

@pavel I'm playing with GStreamer now (which is new for me) and it seems like most of this code could be replaced with GStreamer elements, and the rest should neatly plug in as custom elements 😂

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 16:24

@dos I don't believe gstreamer can handle complex cameras. But yes, eventually this code should disappear into libraries somewhere.

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 19:28

@dos Do you have some ideas how to do viewfinder easily?

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 05, 2025, 12:22

@pavel You've got a dma-buf handle, already mapped buffer and even GStreamer with all its sinks available, so... however you want? Pretty much anything will be able to consume it easily.

**Pavel Machek** @pavel@social.kernel.org · Jul 05, 2025, 13:30

@dos I don't have much experience with GUI programming, so I was looking for suggestions. I've got a dma-buf handle but would not know what to do with it in gtk, and maybe SDL is a better match. Or perhaps stick to the original plan and do the user interface ("take picture" button etc) in another process.

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 05, 2025, 13:44

@pavel For GTK: either https://docs.gtk.org/gdk4/class.DmabufTextureBuilder.html or https://gstreamer.freedesktop.org/documentation/gtk4/index.html

For SDL with GL: just import it the same way V4L buffers are imported.

Frankly, it's flexible enough that your choice of toolkit should only depend on other factors.

**Pavel Machek** @pavel@social.kernel.org · Jul 04, 2025, 20:11

@dos That gstreamer code in C is scary. Multiple threads, no locking, what could go wrong?

**Sebastian Krzyszkowiak** @dos@librem.one · Jul 05, 2025, 12:28

@pavel Not sure what you mean. GStreamer is internally multi-threaded, but its API is thread-safe and there's only one thread in this code. Of course any kind of production-quality code will use some mainloop and enqueue buffers based on callbacks rather than while(!processed){} loop, but it's not exactly rocket science.

**Pavel Machek** @pavel@social.kernel.org · Jul 05, 2025, 13:20

@dos There are at least two threads in this code: main one, and whatever runs "on_buffer_released". I don't yet know what causes the segfaults, but I suspect gstreamer.