**Ethan Black** @golemwire@librem.one · Jan 05, 2025, 20:02

For #programmers:
You are familiar with 64-bit floating point and 32-bit floating point, and may have heard about 16-bit floating point (present in some GPUs), but there is actually work on 8-BIT floating-point!

https://arxiv.org/abs/2209.05433
https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/

There is the "E5M2" variant, a "truncated IEEE FP16 format" (nice if your hardware lacks FP8 support). At the minuscule 8-bit level, though, you don't necessarily need multiple NaNs or any infinities, so there is the "E4M3" variant as well.
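To make the two layouts concrete, here is a rough Python sketch of decoding one FP8 byte under each variant, based on my reading of the linked paper. The function names (`decode_fp8`, `decode_e5m2`, `decode_e4m3`, `fp16_to_e5m2_truncate`) are made up for illustration and are not from any library. The truncation helper simply drops the low byte of an FP16 value (round toward zero), which is an assumption for simplicity; real hardware would typically round to nearest.

```python
import math
import struct


def decode_fp8(byte: int, exp_bits: int, man_bits: int, ieee_specials: bool) -> float:
    """Decode one FP8 byte laid out as sign | exponent | mantissa (hypothetical helper)."""
    assert 0 <= byte <= 0xFF and 1 + exp_bits + man_bits == 8
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1          # 15 for E5M2, 7 for E4M3
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_specials and exp == exp_max:
        # IEEE-style: all-ones exponent means infinity or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == man_max:
        # E4M3-style: a single NaN bit pattern per sign, no infinities.
        return math.nan
    if exp == 0:
        # Zero and subnormals: no implicit leading 1.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, ieee_specials=True)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, ieee_specials=False)


def fp16_to_e5m2_truncate(x: float) -> int:
    """Keep only the high byte of the FP16 encoding of x (round toward zero).

    Raises OverflowError if x does not fit in FP16.
    """
    return struct.pack(">e", x)[0]


if __name__ == "__main__":
    print(decode_e5m2(0x7B))   # largest finite E5M2 value
    print(decode_e4m3(0x7E))   # largest finite E4M3 value
    print(decode_e5m2(0x7C))   # infinity exists in E5M2
    print(decode_e4m3(0x7F))   # the only positive NaN pattern in E4M3
    print(hex(fp16_to_e5m2_truncate(1.0)))
```

Because E4M3 gives up infinities and all but one NaN pattern per sign, the bit patterns it frees become ordinary numbers, which is how it reaches a larger maximum than a strictly IEEE-style 4-bit exponent would allow.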

#IEEE754 #AI

**Ethan Black** @golemwire@librem.one · 2025-01-05T20:02:41Z

Mastodon character count for this post: 499. (1 off, man 😁)
