You are familiar with 64-bit floating point and 32-bit floating point, and may have heard about 16-bit floating point (present in some GPUs), but there is actually work on 8-BIT floating-point!

arxiv.org/abs/2209.05433
developer.nvidia.com/blog/nvid

There is the "E5M2" variant, a "truncated IEEE FP16 format" (handy if your hardware lacks native FP8, since you can convert by just truncating FP16). At the minuscule 8-bit level, though, you don't necessarily need multiple NaNs or infinities, so there is the "E4M3" variant as well.
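To make the truncation relationship concrete, here is a minimal Python sketch (my own illustration using NumPy, not code from the paper) that converts between FP16 and E5M2 by bit manipulation. Truncation rounds toward zero; real hardware converters typically round to nearest instead.

```python
import numpy as np

def fp16_to_e5m2(x):
    """Convert a value to an E5M2 byte by truncating its FP16 bit pattern.

    FP16 is 1 sign + 5 exponent + 10 mantissa bits; E5M2 keeps the same
    sign and exponent fields plus the top 2 mantissa bits, so dropping
    the low 8 bits of the FP16 pattern yields the E5M2 encoding.
    """
    bits = np.float16(x).view(np.uint16)
    return np.uint8(bits >> 8)

def e5m2_to_fp16(b):
    """Widen an E5M2 byte back to FP16 by zero-padding the mantissa."""
    return (np.uint16(b) << 8).view(np.float16)
```

For example, np.float16(0.1) is about 0.09998; round-tripping it through fp16_to_e5m2 and e5m2_to_fp16 gives 0.09375, the price of keeping only 2 mantissa bits.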

Screenshot from the PDF.

It appears that E4M3 (the one without multiple NaNs etc.) has less range, and can't represent values as close to 0, but has smaller steps between representable numbers on average.
E4M3 does still have NaN, just one kind of NaN (nice).
I'm not thinking of any particular application here, but I think I prefer E4M3 personally (for what it's worth).
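To illustrate that trade-off, here is a small hand-rolled E4M3 decoder (my own sketch, assuming the conventions described in the arXiv paper above: exponent bias 7, no infinities, and the all-ones pattern reserved for NaN):

```python
def decode_e4m3(byte):
    """Decode an 8-bit E4M3 value (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        # The single NaN encoding (one bit pattern per sign); there are
        # no infinities, which frees the other all-ones-exponent patterns
        # to encode finite values.
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)
```

decode_e4m3(0x7E) gives the maximum finite value, 448.0, and decode_e4m3(0x7F) is the lone NaN. The smallest subnormal, decode_e4m3(0x01), is 2**-9, which is why E4M3 can't get as close to 0 as E5M2 (whose smallest subnormal is 2**-16).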

Interesting development. I never thought float8s could have any use, and here we are in 2025 with them seeing potential use.

Note: the data I cited is from 2022; I had said 2025. (Can't edit posts.)
