Can you program GPUs and do you want to become a HERO? The #linuxphone
community needs your help.

We are trying to record video and have most of the pieces working, but one is
missing: fast enough debayering. That means about 23 MB/sec on the #librem5.

Debayering is not hard; camera images have subpixels split across two
lines, which need to be recombined. They also use a different color
representation, but that's fixable with a table lookup and two matrix
multiplies.

The Librem 5 has a Vivante GPU, 4 in-order CPU cores and 3GB RAM. My feeling
is that it should be fast enough for this. If the task is for some reason
impossible, that would be good to know, too.

The image data looks like this:

RGRGRG...
xBxBxB...
.........
.........

The task is to turn that into the usual rgbrgb... format: rgb = RGB * color
matrix, with table lookups for better quality. I can fix that up once I
get an example.

I'm looking for example code (#pinephone would work, too), reasons it
cannot be done... and boosts if you have friends who can program
GPUs. #gpu #opensource

@pavel No, it's

RGRGRG
GBGBGB

You lose meaningful data if you ignore half of the green pixels.

I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.
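For the corrected RGGB layout above, here is a minimal CPU-side reference sketch (hypothetical names, not taken from any existing code): each 2x2 block becomes one half-resolution RGB pixel, the two greens are averaged, and a row-major 3x3 color matrix is applied as in "rgb = RGB * color matrix".

#include <stdint.h>
#include <stddef.h>

/* Clamp a float to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)
        return 0;
    if (v > 255.0f)
        return 255;
    return (uint8_t)v;
}

/* Half-resolution debayer of an 8-bit RGGB mosaic: every 2x2 block
 * (R G / G B) becomes one RGB pixel. color_matrix is row-major 3x3. */
void debayer_rggb_half(const uint8_t *bayer, size_t width, size_t height,
                       size_t stride, const float color_matrix[3][3],
                       uint8_t *rgb_out)
{
    for (size_t y = 0; y + 1 < height; y += 2) {
        const uint8_t *row0 = bayer + y * stride;
        const uint8_t *row1 = row0 + stride;
        for (size_t x = 0; x + 1 < width; x += 2) {
            float r = row0[x];
            float g = (row0[x + 1] + row1[x]) * 0.5f; /* use both greens */
            float b = row1[x + 1];

            for (int c = 0; c < 3; c++)
                *rgb_out++ = clamp_u8(color_matrix[c][0] * r +
                                      color_matrix[c][1] * g +
                                      color_matrix[c][2] * b);
        }
    }
}

A GPU version would do the same per-pixel arithmetic in a fragment shader; this is only for checking the math.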

@dos As for copies... Yes, I'm currently doing more copies than needed. I measured the Librem 5 at about 2GB/sec of memory bandwidth, and the stream is about 30MB/sec. At 1Mpix/24fps, gstreamer should be able to encode it in real time.
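As a quick sanity check of those numbers (assuming 8-bit Bayer input and 24-bit RGB output; only the ~2 GB/sec figure is measured, the rest is arithmetic):

#include <stdio.h>

int main(void)
{
    const double pixels  = 1.0e6;  /* ~1 Mpix per frame */
    const double fps     = 24.0;
    const double in_bpp  = 1.0;    /* bytes per pixel, raw 8-bit Bayer */
    const double out_bpp = 3.0;    /* bytes per pixel, RGB */

    printf("raw Bayer stream: %.0f MB/s\n", pixels * fps * in_bpp / 1e6);
    printf("RGB output:       %.0f MB/s\n", pixels * fps * out_bpp / 1e6);
    printf("measured memory bandwidth: ~2000 MB/s\n");
    return 0;
}

That comes out to roughly 24 MB/s in and 72 MB/s out, together only a few percent of the measured bandwidth.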

Here's a huge problem with v4l, though: it gives uncached memory buffers to userspace. That means one whole CPU core is dedicated to copying them to "normal" memory. If that is ever solved, yes, other optimizations are possible. Currently, this means it is not even possible to copy anything bigger than 1Mpix out of v4l.

@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?
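One way to get such a dmabuf fd out of V4L2 is VIDIOC_EXPBUF: request and queue the capture buffers as V4L2_MEMORY_MMAP as usual, then export each one. A sketch (hypothetical helper, error handling trimmed):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Export an already-requested V4L2 capture buffer as a dmabuf fd.
 * Returns the fd on success, -1 on failure. */
int export_v4l2_dmabuf(int video_fd, unsigned int index)
{
    struct v4l2_exportbuffer expbuf;

    memset(&expbuf, 0, sizeof(expbuf));
    expbuf.type  = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    expbuf.index = index;
    expbuf.flags = O_CLOEXEC | O_RDWR;

    if (ioctl(video_fd, VIDIOC_EXPBUF, &expbuf) < 0)
        return -1;

    return expbuf.fd;   /* hand this fd to EGL for import */
}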

@dos If you have an example of that, it would be welcome :-). That's not how Megapixels works, at least.

@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model, so it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory the CPUs do.

See GL_OES_EGL_image_external and docs.kernel.org/userspace-api/
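The import side looks roughly like this; a sketch assuming the raw 8-bit Bayer frame can be described as a single-plane DRM_FORMAT_R8 buffer (extension and error checks omitted, names hypothetical):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

/* Wrap a dmabuf fd (e.g. from VIDIOC_EXPBUF) in an EGLImage and bind it
 * as a GL_TEXTURE_EXTERNAL_OES texture - no copy, the GPU samples the
 * very buffer the capture driver wrote into. */
GLuint import_dmabuf_texture(EGLDisplay dpy, int dmabuf_fd,
                             int width, int height, int stride)
{
    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

    const EGLint attribs[] = {
        EGL_WIDTH,                     width,
        EGL_HEIGHT,                    height,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_R8, /* raw Bayer as one channel */
        EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE
    };

    EGLImageKHR image = create_image(dpy, EGL_NO_CONTEXT,
                                     EGL_LINUX_DMA_BUF_EXT, NULL, attribs);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    image_target_texture(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)image);

    return tex;
}

The debayer fragment shader then samples this texture through a samplerExternalOES uniform (with GL_OES_EGL_image_external required in the shader), so the GPU reads straight from the capture buffer.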

@dos Sorry, hero, that's dark magic beyond my understanding. I see the words but don't understand the sentences. :-(

I'd need a working example here. I got surprisingly far vibecoding this, but even robots have their limits.

@pavel After eliminating glReadPixels and having the output buffer mmaped instead: "18.9 MB in 0.08s = 244.4 MB/s"

After putting glTexImage2D out of the loop to emulate zero-copy import from V4L as well:
"18.9 MB in 0.05s = 400.1 MB/s"

dosowisko.net/stuff/bwtest.pat
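"Output buffer mmaped" presumably means rendering straight into a buffer the CPU or encoder can map, instead of reading back with glReadPixels. A sketch of that side, assuming out_image is an EGLImage wrapping an output dmabuf created the same way as on the input side (whether a given format is renderable depends on the driver):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Attach a dmabuf-backed texture as the color target of an FBO, so the
 * debayer pass renders directly into memory that can be mmap()ed or
 * passed on to the encoder - no glReadPixels copy. */
GLuint make_output_fbo(EGLImageKHR out_image, GLuint *tex_out)
{
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

    GLuint tex, fbo;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    image_target_texture(GL_TEXTURE_2D, (GLeglImageOES)out_image);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    /* check glCheckFramebufferStatus(GL_FRAMEBUFFER) in real code */

    *tex_out = tex;
    return fbo;
}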

@pavel Not only did you have copies into and out of the GLES context there, but those copies were sequential - and your benchmark waited until things were copied before proceeding with the next frame, so it was pretty much useless for assessing GPU performance. In practice, GStreamer can happily encode the previous frame while the GPU is busy with the current one, all while the CSI controller is already receiving the next one.


@pavel Also, it gets faster when you increase the buffer size, because rendering is so fast you're mostly measuring API overhead 😁

With full 13MP frames: 315.1 MB in 0.62s = 511.3 MB/s
