@pavel No, it's
RGRGRG
GBGBGB
You lose meaningful data if you ignore half of the green pixels.
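To illustrate the point about green pixels, here is a minimal sketch in plain Python of collapsing each 2x2 RGGB cell into one RGB pixel while keeping both green samples. The function name `bin_rggb` and the sample values are made up for illustration; a real pipeline would do this in a shader, not on the CPU.

```python
def bin_rggb(raw):
    """Collapse each 2x2 RGGB cell of a Bayer mosaic into one (R, G, B) pixel.

    `raw` is a list of rows with even width and height, laid out as
        R G R G ...
        G B G B ...
    """
    out = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[y]), 2):
            r = raw[y][x]
            # Average BOTH green samples instead of dropping one of them.
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out


# Hypothetical 4x2 mosaic (two RGGB cells side by side).
mosaic = [
    [10, 20, 12, 22],
    [40, 30, 44, 32],
]
pixels = bin_rggb(mosaic)
print(pixels)
```

Averaging the two greens halves the noise variance of the green channel compared to picking just one of them, which is the "meaningful data" that would otherwise be thrown away.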
I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.
@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?
@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model, so it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory the CPUs do.
See GL_OES_EGL_image_external and https://docs.kernel.org/userspace-api/media/v4l/dmabuf.html
@pavel After eliminating glReadPixels and having the output buffer mmaped instead: "18.9 MB in 0.08s = 244.4 MB/s"
After moving glTexImage2D out of the loop to emulate zero-copy import from V4L as well:
"18.9 MB in 0.05s = 400.1 MB/s"
@pavel I left the memcpy line commented out for a reason - with it uncommented, the result is exactly the same as with glReadPixels (which is effectively a memcpy on steroids). The point is to pass that buffer to the encoder directly, so it can read the data straight from the output buffer without waiting for memcpy to conclude.
I've also verified that the approach is sound by having the shader output different values each frame and accessing it via hexdump_pixels inside the loop. Still fast ;)
@pavel > I can't easily connect gstreamer to that
Why not? I quickly hacked up passing dma-bufs to GStreamer, and even though I'm glFinishing and busy-waiting on each frame to get encoded sequentially, it still manages to encode a 526x390 h264 stream in real time on the L5.
@pavel Plugged it into V4L2, with the caveat that for now I feed the GPU full-res 13MP frames to meet the stride alignment requirement (the shader output is still 526x390). It says it does 240 frames in 10.55s. I wonder if it's really slightly too slow, or just bad timing from our camera stack :)
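For a sense of how close "240 frames in 10.55s" is, the arithmetic can be sketched as below. The 24 fps target is an assumption (the thread does not state the nominal rate); everything else comes from the numbers quoted above.

```python
def achieved_fps(frames, seconds):
    """Average frame rate over a measured run."""
    return frames / seconds


TARGET_FPS = 24  # assumption: nominal rate is not stated in the thread

fps = achieved_fps(240, 10.55)
frame_time_ms = 1000 / fps
shortfall = 1 - fps / TARGET_FPS

print(f"{fps:.2f} fps, {frame_time_ms:.1f} ms per frame, "
      f"{shortfall * 100:.1f}% below {TARGET_FPS} fps")
```

Under that assumption the run is only about 5% short, which is small enough that timing jitter in the capture side is a plausible explanation.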
@pavel https://paste.debian.net/1384224/
It's ugly, hardcodes everything, lies about frame timing, and occasionally segfaults. Most of it is copied straight from an LLM, I just massaged the pieces to work together. Not the kind of code I'd like to sign off on :) But it's a working example, so have fun with it.
@pavel The first thing to do to improve it (after cleaning it up) would be to actually make use of the buffer pool. Dequeue the buffer, attach it as a texture, kick off rendering, get a fence and pass it with the output buffer to GStreamer without waiting on rendering to finish, then queue it back asynchronously once rendering is done. This should allow for much more complex shaders than this sequential code does.
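The buffer-pool idea can be sketched as a toy simulation. This is not V4L2, GL or GStreamer code: `simulate`, the tick-based timing and the buffer counts are all hypothetical, and the "fence" is modeled simply as the tick at which rendering finishes. It only shows why handing buffers off without waiting, and requeueing them later, keeps frames flowing.

```python
from collections import deque


def simulate(num_buffers, num_frames, render_ticks):
    """Toy model: one frame arrives per tick, rendering takes `render_ticks`.

    A buffer is dequeued, handed off together with its (simulated) fence
    without waiting, and requeued only once that fence has signaled.
    Returns (list of (tick, buffer) hand-offs, number of dropped frames).
    """
    free = deque(range(num_buffers))  # buffers queued at the capture side
    in_flight = deque()               # (buffer, tick at which its fence signals)
    handed_off = []
    dropped = 0
    for tick in range(num_frames):
        # Asynchronously requeue every buffer whose rendering has finished.
        while in_flight and in_flight[0][1] <= tick:
            free.append(in_flight.popleft()[0])
        if not free:
            dropped += 1              # pool exhausted, frame is lost
            continue
        buf = free.popleft()          # dequeue a filled buffer
        in_flight.append((buf, tick + render_ticks))  # kick off rendering
        handed_off.append((tick, buf))                # pass on, do not wait
    return handed_off, dropped


pooled, pooled_drops = simulate(num_buffers=3, num_frames=6, render_ticks=2)
single, single_drops = simulate(num_buffers=1, num_frames=6, render_ticks=2)
print(pooled_drops, single_drops)
```

With a single buffer every other frame is lost as soon as rendering takes longer than one frame period, while a pool of three absorbs the same latency without drops.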
@pavel BTW, the fact that I could stream full-res frames and bin them down in the shader in real time is interesting news, as it may open up the possibility of using phase detection autofocus.
@pavel There's plenty of low-hanging fruit in there. Higher frame rates and 10-bit output are also likely a debugging session or two away 😜
@pavel Toggling the killswitch makes it appear though.
IIRC PDAF was also usable at half-res.
RAW10 is just a matter of setting up clocks for higher bandwidth and more lanes. Switching data format is then just a single register away.
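As a rough illustration of why RAW10 needs more link bandwidth, here is the payload arithmetic. The resolution, frame rate and lane counts are round illustrative numbers, not values from the thread, and blanking and protocol overhead are ignored.

```python
def payload_bits_per_second(width, height, fps, bits_per_pixel):
    """Raw pixel payload rate, ignoring blanking and protocol overhead."""
    return width * height * fps * bits_per_pixel


# Illustrative mode only (assumption, not taken from the thread).
WIDTH, HEIGHT, FPS = 4000, 3000, 30

raw8 = payload_bits_per_second(WIDTH, HEIGHT, FPS, 8)
raw10 = payload_bits_per_second(WIDTH, HEIGHT, FPS, 10)
ratio = raw10 / raw8

# The same RAW10 payload spread over 2 versus 4 data lanes.
per_lane_2 = raw10 / 2
per_lane_4 = raw10 / 4

print(f"RAW8:  {raw8 / 1e9:.2f} Gbit/s")
print(f"RAW10: {raw10 / 1e9:.2f} Gbit/s ({ratio:.2f}x)")
print(f"RAW10 per lane: {per_lane_2 / 1e9:.2f} (2 lanes), "
      f"{per_lane_4 / 1e9:.2f} (4 lanes) Gbit/s")
```

Going from 8 to 10 bits per pixel always costs 25% more payload at the same resolution and frame rate, which is why it goes hand in hand with faster clocks or more lanes.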