
@pavel Seems it's the latter, as the result's exactly the same with 1052x780 camera frames and 263x195 video 😄

@pavel Plugged it into V4L2 - with the caveat that for now I fed the GPU full-res 13MP frames to meet the stride alignment requirement (the shader output is still 526x390). It says it does 240 frames in 10.55s. I wonder whether it's really slightly too slow, or just bad timing from our camera stack :)

@pavel > I can't easily connect gstreamer to that

Why not? I quickly hacked up passing dma-bufs to GStreamer, and even though I'm calling glFinish and busy-waiting for each frame to get encoded sequentially, it still manages to encode a 526x390 h264 stream in real time on the L5.
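
Roughly what such a hack can boil down to (a sketch, not my actual code - the pipeline string, caps, frame size and the x264enc choice are all illustrative assumptions): wrap the dmabuf fd in a GstMemory and push it into an appsrc.

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <gst/allocators/gstdmabuf.h>

static void push_dmabuf_frame(GstAppSrc *src, GstAllocator *alloc,
                              int dmabuf_fd, gsize size)
{
    /* The dmabuf allocator takes ownership of the fd - dup() it first
     * if you still need it on your side. */
    GstMemory *mem = gst_dmabuf_allocator_alloc(alloc, dmabuf_fd, size);
    GstBuffer *buf = gst_buffer_new();
    gst_buffer_append_memory(buf, mem);
    gst_app_src_push_buffer(src, buf); /* consumes the buffer */
}

int main(void)
{
    gst_init(NULL, NULL);
    GstElement *pipeline = gst_parse_launch(
        "appsrc name=src is-live=true do-timestamp=true format=time "
        "! videoconvert ! x264enc tune=zerolatency "
        "! matroskamux ! filesink location=out.mkv", NULL);
    GstElement *src = gst_bin_get_by_name(GST_BIN(pipeline), "src");
    /* Tell appsrc what the raw frames look like. */
    g_object_set(src, "caps",
                 gst_caps_new_simple("video/x-raw",
                                     "format", G_TYPE_STRING, "RGBA",
                                     "width", G_TYPE_INT, 526,
                                     "height", G_TYPE_INT, 390,
                                     "framerate", GST_TYPE_FRACTION, 30, 1,
                                     NULL),
                 NULL);
    GstAllocator *alloc = gst_dmabuf_allocator_new();
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    /* Render a frame on the GPU, glFinish(), then:
     * push_dmabuf_frame(GST_APP_SRC(src), alloc, exported_fd, frame_size); */
    return 0;
}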

@pavel That said, rendering to a linear buffer can be slower - that's expected. The question is whether the gains from passing buffers around for free outweigh that cost, which for an actual "record video from a camera" use case will almost certainly be true (and which has very different performance characteristics from reading images from files - you can't directly attach a file as a texture).

@pavel I left the memcpy line commented out for a reason - with it uncommented, the result is exactly the same as with glReadPixels (which is effectively a memcpy on steroids). The point is to pass that buffer to the encoder directly, so it can read the data straight from the output buffer without waiting for a memcpy to conclude.

I've also verified that the approach is sound by having the shader output different values each frame and accessing it via hexdump_pixels inside the loop. Still fast ;)

This European Citizens' Initiative is about to reach its millionth signature as we speak, a month before the deadline. A tiny but very necessary step in the right direction.

If you hurry, it may be yours 😄 eci.ec.europa.eu/045/public/#/

@janvlug @ronnylam About 4h of active use, 12h idling without suspend, 24h suspended - assuming the modem is always on with reasonable signal strength.

@pavel @datenwolf Current Mesa can do a bunch of GLES3 stuff already, including texelFetch, once you force it with MESA_GLES_VERSION_OVERRIDE.
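
A hedged sketch of forcing it from inside the app - equivalent to just launching the binary with the variable set in the environment:

#include <stdlib.h>

int main(void)
{
    /* Must happen before the first EGL/GLES call so Mesa picks it up;
     * same as running with MESA_GLES_VERSION_OVERRIDE=3.0 set. */
    setenv("MESA_GLES_VERSION_OVERRIDE", "3.0", 1);
    /* ... eglInitialize() and GLES3 context creation go here ... */
    return 0;
}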

@pavel Also, it gets faster when you increase the buffer size, because rendering is so fast you're mostly measuring API overhead 😄

With full 13MP frames: 315.1 MB in 0.62s = 511.3 MB/s

@pavel Not only did you have copies into and out of the GLES context there, but those copies were sequential - and your benchmark waited until things were copied before proceeding with the next frame, so it was pretty much useless for assessing GPU performance. In practice, GStreamer can happily encode the previous frame while the GPU is busy with the current one, all while the CSI controller is already receiving the next one.

@pavel After eliminating glReadPixels and having the output buffer mmapped instead: "18.9 MB in 0.08s = 244.4 MB/s"

After moving glTexImage2D out of the loop to emulate zero-copy import from V4L as well:
"18.9 MB in 0.05s = 400.1 MB/s"

dosowisko.net/stuff/bwtest.pat

@brie Unless there's a ton of dependencies to take care of as well, in my experience simply building a fresher deb by myself is usually the best option.

@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model; it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory the CPUs do.

See GL_OES_EGL_image_external and docs.kernel.org/userspace-api/
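
A minimal sketch of that import path, assuming the driver accepts a single-plane raw buffer (the R8 fourcc and the helper name are illustrative):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

static GLuint texture_from_dmabuf(EGLDisplay dpy, int fd,
                                  int width, int height, int stride)
{
    PFNEGLCREATEIMAGEKHRPROC eglCreateImageKHR =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    PFNGLEGLIMAGETARGETTEXTURE2DOESPROC glEGLImageTargetTexture2DOES =
        (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

    EGLint attrs[] = {
        EGL_WIDTH, width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_R8, /* 8-bit raw Bayer plane */
        EGL_DMA_BUF_PLANE0_FD_EXT, fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
        EGL_NONE,
    };
    EGLImageKHR img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                        EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_EXTERNAL_OES, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    /* No copy: the texture aliases the dmabuf contents. */
    glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES)img);
    return tex; /* sampled in GLSL through a samplerExternalOES */
}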

@pavel On 9f076a5, I'm getting 88MB/s with one green channel, 82MB/s with two and 105MB/s with nothing but static gl_FragColor. The three copies it does could be eliminated and I believe texelFetch could make it slightly faster on the GPU side too.

@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?
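
And the V4L2 side, as a hedged sketch: exporting a capture buffer that was set up with VIDIOC_REQBUFS (memory=V4L2_MEMORY_MMAP) as a dmabuf fd via VIDIOC_EXPBUF - exactly the fd the EGL import above consumes:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int export_capture_buffer(int video_fd, unsigned int index)
{
    struct v4l2_exportbuffer exp;
    memset(&exp, 0, sizeof(exp));
    exp.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    exp.index = index;     /* which of the REQBUFS buffers to export */
    exp.flags = O_CLOEXEC; /* O_RDWR also possible, depending on use */
    if (ioctl(video_fd, VIDIOC_EXPBUF, &exp) < 0)
        return -1;
    return exp.fd;         /* dmabuf fd, ready for EGL/GStreamer import */
}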

@tizilogic @pavel It's either 8-bit int, or 10-bit int stored as 16-bit.

GC7000L supports compute shaders, but etnaviv isn't there yet.

Naive debayering is easy, but for good picture quality you need much more than that.

@pavel Since I assume you're going to want to pass the rendered image into some kind of video encoder, you may want to make sure that you match stride and alignment requirements with your target buffer so etnaviv will be able to perform linear rendering rather than de-tile it afterwards (though IIRC it's currently gated behind ETNA_MESA_DEBUG).
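
The arithmetic, as a sketch - the 64-byte alignment is only an assumed example here; the real constraint comes from the driver and the encoder:

#include <stdint.h>

static uint32_t aligned_stride(uint32_t width, uint32_t bytes_per_pixel,
                               uint32_t alignment)
{
    uint32_t stride = width * bytes_per_pixel;
    /* Round up to the next multiple of `alignment` (a power of two). */
    return (stride + alignment - 1) & ~(alignment - 1);
}

/* e.g. aligned_stride(526, 4, 64) == 2112 rather than the natural 2104 */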

@pavel No, it's

RGRGRG
GBGBGB

You lose meaningful data if you ignore half of the green pixels.
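
To make "naive" concrete, a hedged GLES3-style fragment shader sketch that collapses each 2x2 RGGB cell into one output pixel, halving the resolution (e.g. 1052x780 -> 526x390) and averaging both greens (uniform and output names are illustrative):

static const char *debayer_frag_src =
    "#version 300 es\n"
    "precision mediump float;\n"
    "uniform sampler2D raw; // single-channel Bayer frame\n"
    "out vec4 color;\n"
    "void main() {\n"
    "    ivec2 cell = ivec2(gl_FragCoord.xy) * 2;\n"
    "    float r  = texelFetch(raw, cell,               0).r;\n"
    "    float g1 = texelFetch(raw, cell + ivec2(1, 0), 0).r;\n"
    "    float g2 = texelFetch(raw, cell + ivec2(0, 1), 0).r;\n"
    "    float b  = texelFetch(raw, cell + ivec2(1, 1), 0).r;\n"
    "    color = vec4(r, (g1 + g2) * 0.5, b, 1.0);\n"
    "}\n";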

I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.

@edavies These were Polish eliminations. This year it goes worldwide btw 😄
