You are familiar with 64-bit floating point and 32-bit floating point, and may have heard about 16-bit floating point (present in some GPUs), but there is actually work on 8-BIT floating-point!

arxiv.org/abs/2209.05433
developer.nvidia.com/blog/nvid

There is the "E5M2" variant, a "truncated IEEE FP16 format" (handy if your hardware lacks native FP8, since you can convert by just truncating FP16). At the minuscule 8-bit level, though, you don't necessarily need multiple NaNs or infinities, so there is the "E4M3" variant as well.
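To make the truncation relationship concrete, here is a minimal Python sketch (my own illustration using NumPy, not code from the paper) that converts between FP16 and E5M2 by bit manipulation. Truncation rounds toward zero; real hardware converters typically round to nearest instead.

```python
import numpy as np

def fp16_to_e5m2(x):
    """Convert a value to an E5M2 byte by truncating its FP16 bit pattern.

    FP16 is 1 sign + 5 exponent + 10 mantissa bits; E5M2 keeps the same
    sign and exponent fields plus the top 2 mantissa bits, so dropping
    the low 8 bits of the FP16 pattern yields the E5M2 encoding.
    """
    bits = np.float16(x).view(np.uint16)
    return np.uint8(bits >> 8)

def e5m2_to_fp16(b):
    """Widen an E5M2 byte back to FP16 by zero-padding the mantissa."""
    return (np.uint16(b) << 8).view(np.float16)
```

For example, np.float16(0.1) is about 0.09998; round-tripping it through fp16_to_e5m2 and e5m2_to_fp16 gives 0.09375, the price of keeping only 2 mantissa bits.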

Screenshot from the PDF.

It appears that E4M3 (the one without multiple NaNs etc.) has less range, and can't represent values as close to 0, but has smaller steps between representable numbers on average.
E4M3 does still have NaN, just one kind of NaN (nice).
I'm not thinking of any particular application here, but I think I prefer E4M3 personally (for what it's worth).
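To illustrate that trade-off, here is a small hand-rolled E4M3 decoder (my own sketch, assuming the conventions described in the arXiv paper above: exponent bias 7, no infinities, and the all-ones pattern reserved for NaN):

```python
def decode_e4m3(byte):
    """Decode an 8-bit E4M3 value (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        # The single NaN encoding (one bit pattern per sign); there are
        # no infinities, which frees the other all-ones-exponent patterns
        # to encode finite values.
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)
```

decode_e4m3(0x7E) gives the maximum finite value, 448.0, and decode_e4m3(0x7F) is the lone NaN. The smallest subnormal, decode_e4m3(0x01), is 2**-9, which is why E4M3 can't get as close to 0 as E5M2 (whose smallest subnormal is 2**-16).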

Interesting development. I never thought float8s could have any use, and here we are in 2025 with them seeing potential use.

Note: the data I cited is from 2022; I had said 2025. (Can't edit posts.)
