Can you program GPUs and do you want to become a HERO? The #linuxphone
community needs your help.

We are trying to record video and have most pieces working, but one is
missing: fast enough debayering. That means about 23MB/sec on the #librem5.

Debayering is not hard; camera images have their subpixels split across
two lines, which needs to be corrected. They also use a different color
representation, but that's fixable by a table lookup and two matrix
multiplies.

The Librem 5 has a Vivante GPU, 4 in-order CPU cores and 3GB RAM. My feeling
is that it should be fast enough for this. If the task is for some reason
impossible, that would be good to know, too.

Image data looks like this

RGRGRG...
xBxBxB...
.........
.........

The task is to turn that into the usual rgbrgb... format: rgb = RGB * color
matrix, with table lookups for better quality. I can fix that up once I
get an example.

I'm looking for example code (#pinephone would work, too), reasons it
cannot be done... and boosts if you have friends who can program
GPUs. #gpu #opensource

@pavel No, it's

RGRGRG
GBGBGB

You lose meaningful data if you ignore half of the green pixels.

I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.
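For anyone wanting to see what the per-pixel work actually is, here is a naive CPU reference for a 2x2 RGGB cell, including the colour-matrix step mentioned above. It is only an illustrative sketch: a proper debayer interpolates to full resolution instead of halving it, the matrix here is an identity placeholder, and the lookup tables are left out.

#include <stddef.h>
#include <stdint.h>

/* Illustrative 3x3 colour matrix (identity placeholder); a real one comes
 * from sensor calibration. The lookup tables are left out for brevity. */
static const float cmat[3][3] = {
    { 1.0f, 0.0f, 0.0f },
    { 0.0f, 1.0f, 0.0f },
    { 0.0f, 0.0f, 1.0f },
};

static uint8_t clamp8(float v)
{
    return v < 0.0f ? 0 : v > 255.0f ? 255 : (uint8_t)v;
}

/* Naive debayer: each 2x2 RGGB cell
 *   R G
 *   G B
 * becomes one rgb pixel, averaging the two greens. Output is (w/2)x(h/2);
 * a real debayer interpolates neighbouring cells to keep full resolution. */
void debayer_rggb(const uint8_t *raw, size_t w, size_t h, uint8_t *rgb)
{
    for (size_t y = 0; y < h; y += 2) {
        for (size_t x = 0; x < w; x += 2) {
            float r = raw[y * w + x];
            float g = (raw[y * w + x + 1] + raw[(y + 1) * w + x]) / 2.0f;
            float b = raw[(y + 1) * w + x + 1];

            for (int c = 0; c < 3; c++)
                *rgb++ = clamp8(cmat[c][0] * r + cmat[c][1] * g +
                                cmat[c][2] * b);
        }
    }
}

On the GPU the same per-cell arithmetic moves into a fragment shader that samples the raw frame as a texture, which is what the rest of the thread is about.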

@dos As for copies... yes, I'm currently doing more copies than needed. I measured the Librem 5 at about 2GB/sec memory bandwidth, and the stream is about 30MB/sec. At 1Mpix and 24fps, gstreamer should be able to encode it in real time.

Here's the huge problem with v4l: it gives uncached memory buffers to userspace. That means one whole CPU core is dedicated to copying them to "normal" memory. If that is ever solved, yes, other optimizations are possible. Currently this means it is not even possible to copy anything bigger than 1Mpix out of v4l.

@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?

@dos If you have an example of that, that would be welcome :-). That's not how Megapixels works, at least.

@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model; it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory CPUs do.

See GL_OES_EGL_image_external and docs.kernel.org/userspace-api/
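To make that path a bit more concrete, a condensed sketch of the two halves could look roughly like the code below: export a V4L2 capture buffer as a dma-buf fd, then import that fd into GLES as an external texture. Error handling, format negotiation and the usual VIDIOC_QBUF/DQBUF streaming loop are omitted; the fixed buffer count and treating the raw Bayer data as DRM_FORMAT_R8 are assumptions.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <libdrm/drm_fourcc.h>

/* (a) V4L2 side: request mmap-able buffers, then export one as a dma-buf fd. */
static int export_v4l2_dmabuf(int video_fd)
{
    struct v4l2_requestbuffers req = {
        .count = 4,
        .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .memory = V4L2_MEMORY_MMAP,
    };
    ioctl(video_fd, VIDIOC_REQBUFS, &req);

    struct v4l2_exportbuffer exp = {
        .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .index = 0,
        .flags = O_RDONLY,
    };
    ioctl(video_fd, VIDIOC_EXPBUF, &exp);
    return exp.fd;
}

/* (b) EGL/GLES side: wrap the dma-buf in an EGLImage and bind it as a
 * GL_TEXTURE_EXTERNAL_OES texture, so the debayer shader can sample the
 * camera buffer directly. Treating the raw data as 8-bit single-channel
 * (DRM_FORMAT_R8) is an assumption; the right fourcc depends on the sensor. */
static GLuint import_dmabuf_texture(EGLDisplay dpy, int dmabuf_fd,
                                    int width, int height, int stride)
{
    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

    EGLint attrs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_R8,
        EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
        EGL_NONE,
    };
    EGLImageKHR img = create_image(dpy, EGL_NO_CONTEXT,
                                   EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    image_target_texture(GL_TEXTURE_EXTERNAL_OES, img);
    return tex;
}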

@dos Sorry, hero, that's dark magic beyond my understanding. I see the words but don't understand the sentences. :-(

I'd need working example here. I got surprisingly far vibecoding this, but even robots have their limits.

@pavel After eliminating glReadPixels and having the output buffer mmaped instead: "18.9 MB in 0.08s = 244.4 MB/s"

After putting glTexImage2D out of the loop to emulate zero-copy import from V4L as well:
"18.9 MB in 0.05s = 400.1 MB/s"

dosowisko.net/stuff/bwtest.pat
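The real change is in the patch above; as a rough outline of the idea only (assuming the output buffer is allocated through GBM, which the patch may or may not do), rendering into a dma-buf-backed FBO and mapping it for the CPU could look like this:

#include <sys/mman.h>
#include <gbm.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <libdrm/drm_fourcc.h>

/* Allocate a linear buffer through GBM, attach it to an FBO via an EGLImage,
 * and mmap its dma-buf so rendered frames can be read (or handed to an
 * encoder) without glReadPixels. drm_fd is e.g. open("/dev/dri/renderD128").
 * Error handling omitted; glFinish() before touching the mapping. */
void *mmapped_render_target(EGLDisplay dpy, int drm_fd, int width, int height)
{
    struct gbm_device *gbm = gbm_create_device(drm_fd);
    struct gbm_bo *bo = gbm_bo_create(gbm, width, height, GBM_FORMAT_ARGB8888,
                                      GBM_BO_USE_RENDERING | GBM_BO_USE_LINEAR);
    int fd = gbm_bo_get_fd(bo);
    int stride = gbm_bo_get_stride(bo);

    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETRENDERBUFFERSTORAGEOESPROC image_target_rbo =
        (PFNGLEGLIMAGETARGETRENDERBUFFERSTORAGEOESPROC)
            eglGetProcAddress("glEGLImageTargetRenderbufferStorageOES");

    EGLint attrs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_ARGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT, fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
        EGL_NONE,
    };
    EGLImageKHR img = create_image(dpy, EGL_NO_CONTEXT,
                                   EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    /* Render the debayered frame into this FBO instead of the default one. */
    GLuint rbo, fbo;
    glGenRenderbuffers(1, &rbo);
    glBindRenderbuffer(GL_RENDERBUFFER, rbo);
    image_target_rbo(GL_RENDERBUFFER, img);
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              GL_RENDERBUFFER, rbo);

    /* The same fd can also be passed on as a dma-buf instead of being read here. */
    return mmap(NULL, (size_t)stride * height, PROT_READ, MAP_SHARED, fd, 0);
}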

@dos Thanks for the patch. And yes, it makes the loop faster... if you don't actually use the data. When used for loading/saving 720 images from the ramdisk, the time went from ~16 sec to ~21 sec.

@pavel I left the memcpy line commented out for a reason - with it uncommented, the result is exactly the same as with glReadPixels (which is effectively a memcpy on steroids). The point is to pass that buffer to the encoder directly, so it can read the data straight from the output buffer without waiting for memcpy to conclude.

I've also verified that the approach is sound by having the shader output different values each frame and accessing it via hexdump_pixels inside the loop. Still fast ;)

@dos But you only hexdumped the first few pixels, right?

Is that buffer uncached or something?

I pushed current code to https://gitlab.com/tui/debayer-gpu .

Yes, with memcpy(), I'm getting the same results as before. If I get rid of the memcpy() and attempt to fwrite() the buffer directly, things actually slow down.

I can't easily connect gstreamer to that, so I'm going through a ramdisk for now. I'm using time ./ocam.py debayer for testing -- https://gitlab.com/tui/tui/-/blob/master/ucam/ocam.py?ref_type=heads

@pavel > I can't easily connect gstreamer to that

Why not? I quickly hacked up passing dma-bufs to GStreamer, and even though I'm glFinishing and busy-waiting on a frame to get encoded sequentially, it still manages to encode a 526x390 h264 stream in real time on the L5.
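A hedged sketch of what "passing dma-bufs to GStreamer" can look like: wrap the frame's dma-buf fd in a GstMemory from the dmabuf allocator and push it into an appsrc. The pipeline string, caps and the software x264 encoder below are illustrative choices, not necessarily what was used here.

#include <unistd.h>
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <gst/allocators/gstdmabuf.h>

static GstElement *pipeline, *appsrc;
static GstAllocator *dmabuf_alloc;

/* Encoding pipeline fed from appsrc; caps and elements are illustrative
 * (software x264 at 526x390, 24 fps). */
static void setup_pipeline(void)
{
    gst_init(NULL, NULL);
    pipeline = gst_parse_launch(
        "appsrc name=src format=time is-live=true "
        "caps=video/x-raw,format=BGRA,width=526,height=390,framerate=24/1 "
        "! videoconvert ! x264enc tune=zerolatency "
        "! matroskamux ! filesink location=out.mkv", NULL);
    appsrc = gst_bin_get_by_name(GST_BIN(pipeline), "src");
    dmabuf_alloc = gst_dmabuf_allocator_new();
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
}

/* Wrap a rendered frame's dma-buf fd in a GstBuffer and hand it to the
 * encoder; no pixel copy happens here. The allocator closes the fd when the
 * buffer is released, hence the dup(). */
static void push_frame(int dmabuf_fd, gsize size, GstClockTime pts)
{
    GstMemory *mem = gst_dmabuf_allocator_alloc(dmabuf_alloc,
                                                dup(dmabuf_fd), size);
    GstBuffer *buf = gst_buffer_new();
    gst_buffer_append_memory(buf, mem);
    GST_BUFFER_PTS(buf) = pts;
    gst_app_src_push_buffer(GST_APP_SRC(appsrc), buf);
}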

@dos Yeah, I played with it a bit. Nice. But it segfaults occasionally, and may segfault more when I switch to matroskamux. So I guess the crash may be gstreamer-related? :-) There's also some kind of noise in the bottom right corner; maybe that's related, too.

@pavel Pretty sure it will just work fine once it's rewritten cleanly and does such arcane magic as releasing the buffers at the right time etc. :)

@dos Okay, I pushed code to https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads . Debugging this may be a bit "fun".

Do I guess correctly that shaders can do arbitrary resolutions, such as 800x600?

I like the v4l+shaders integration. I'm not sure if I like the v4l+shaders+gstreamer integration.

@pavel Yes, of course.

BTW. Turns out that streaming to YouTube instead of a local file is just a matter of using rtmpsink instead of filesink 😁
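A concrete version of that swap, hedged: the exact elements depend on the rest of the pipeline, the muxer typically changes along with the sink since RTMP carries FLV, and the ingest URL and stream key below are placeholders.

/* Local recording vs. live streaming: only the tail of the pipeline differs. */
const char *record_tail =
    "x264enc tune=zerolatency ! h264parse ! matroskamux "
    "! filesink location=out.mkv";
const char *stream_tail =
    "x264enc tune=zerolatency ! h264parse ! flvmux streamable=true "
    "! rtmpsink location=rtmp://a.rtmp.youtube.com/live2/STREAM-KEY";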

@pavel I'm playing with GStreamer now (which is new for me) and it seems like most of this code could be replaced with GStreamer elements, and the rest should neatly plug in as custom elements 😂

@dos Do you have some ideas on how to do a viewfinder easily?

@pavel You've got a dma-buf handle, already mapped buffer and even GStreamer with all its sinks available, so... however you want? Pretty much anything will be able to consume it easily.

@dos I don't have much experience with GUI programming, so I was looking for suggestions. I've got the dma-buf handle but wouldn't know what to do with it in GTK, and maybe SDL is a better match. Or perhaps stick to the original plan and do the user interface ("take picture" button etc.) in another process.

@pavel For GTK: either docs.gtk.org/gdk4/class.Dmabuf or gstreamer.freedesktop.org/docu

For SDL with GL: just import it the same way V4L buffers are imported.

Frankly, it's flexible enough that your choice of toolkit should only depend on other factors.
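One low-effort way to follow that advice while staying entirely inside GStreamer: tee the frames pushed into appsrc so one branch encodes to a file while another goes to an on-screen sink. Element names and caps below are illustrative, and any of the display sinks would do.

#include <gst/gst.h>

/* Tee the frames pushed into appsrc: one branch encodes to a file, the other
 * shows a live preview as the viewfinder. */
static GstElement *build_viewfinder_pipeline(void)
{
    return gst_parse_launch(
        "appsrc name=src format=time is-live=true "
        "caps=video/x-raw,format=BGRA,width=526,height=390,framerate=24/1 "
        "! tee name=t "
        "t. ! queue ! videoconvert ! x264enc tune=zerolatency "
        "! matroskamux ! filesink location=out.mkv "
        "t. ! queue ! videoconvert ! autovideosink sync=false", NULL);
}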
