**Eniko Fox** @eniko@peoplemaking.games · Oct 12, 2024, 07:40

Utf8 byte order mark is great. We're just gonna put invisible binary data in this plain text file. Nothing could ever go wrong with that!

*concatenates two text files*

Oh........ oh no 😨

**Eniko Fox** @eniko@peoplemaking.games · Oct 12, 2024, 07:46

The workflow for the Kitsune Tails script was that I would download it from Google Docs as text files, and then run it through our script tool to generate code from that

But the script quickly became large enough it got split over multiple docs. So I just used "type" (that's like "cat" for you linux folks) in the batch file to concatenate the text files first

Guess who then found out that Google Docs inserts a utf8 bom when downloading as plaintext? So then I had to add additional code to look for the byte order mark at the start of lines to explicitly remove it or it would mess things up. Very cool and fun and a good use of my limited time
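
A minimal sketch of the failure and two possible fixes, in Python rather than the batch file and script tool mentioned above (neither is shown in the thread). The function names and the sample file contents are made up for illustration; only `codecs.BOM_UTF8` and the `utf-8-sig` codec are real standard-library features.

```python
import codecs

def concat_without_boms(chunks):
    """Join UTF-8 byte strings, dropping a leading BOM from each one."""
    parts = []
    for data in chunks:
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        parts.append(data)
    return b"".join(parts)

def strip_line_bom(line):
    """Remove one stray U+FEFF from the start of an already-decoded line."""
    return line[1:] if line.startswith("\ufeff") else line

# Two exports, each starting with a BOM (hypothetical contents).
doc_a = codecs.BOM_UTF8 + "Scene 1\n".encode("utf-8")
doc_b = codecs.BOM_UTF8 + "Scene 2\n".encode("utf-8")

# Plain concatenation, like `type` or `cat`: the decoder strips only the
# first BOM, so the second one survives as U+FEFF in the middle of the text.
naive = (doc_a + doc_b).decode("utf-8-sig")

# Fix 1: strip the BOM from every file before joining.
clean = concat_without_boms([doc_a, doc_b]).decode("utf-8")

# Fix 2: clean up after the fact, line by line, as described in the post.
patched = [strip_line_bom(line) for line in naive.splitlines()]
```

Stripping per file before joining avoids the per-line check, but it only works if the concatenation step is under your control.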

**Eniko Fox** @eniko@peoplemaking.games · Oct 12, 2024, 08:31

tbh given the unicode standard says that while UTF8 BOMs are technically allowed their use is discouraged (see https://hachyderm.io/@danderson/113290842836069180 for why it's allowed) we should really have some kind of wall of shame for applications that add a UTF8 BOM *on purpose*

**azul&** @typeswitch@gamedev.lgbt · Oct 12, 2024, 08:33

@eniko but this really hampers my esoteric programming language composed entirely of byte order marks 😢

**mort** @mort@fosstodon.org · Oct 12, 2024, 09:05

@typeswitch @eniko This could work: require UTF-16 encoding, encode a BOM using little-endian UTF-16 to represent a 0, a BOM using big-endian UTF-16 to represent a 1. The fundamental building block of the language is a sequence of 3 little- or big-endian BOMs. This gives you 3 bits or 8 symbols, enough to encode the symbols of brainfuck. Adopt brainfuck semantics.
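
The encoding half of this idea can be sketched in a few lines of Python. The post does not say which 3-bit pattern maps to which brainfuck symbol, so the `SYMBOLS` order below is an assumption, as are the function names; the interpreter itself is left out.

```python
import codecs

ZERO = codecs.BOM_UTF16_LE  # bytes FF FE stand for a 0 bit
ONE = codecs.BOM_UTF16_BE   # bytes FE FF stand for a 1 bit
SYMBOLS = "><+-.,[]"        # assumed mapping: position in this string = 3-bit value

def encode(program):
    """Turn brainfuck source into a byte string made only of UTF-16 BOMs."""
    out = bytearray()
    for ch in program:
        index = SYMBOLS.index(ch)
        for shift in (2, 1, 0):  # most significant bit first
            out += ONE if (index >> shift) & 1 else ZERO
    return bytes(out)

def decode(data):
    """Recover brainfuck source from a BOM-only byte string."""
    if len(data) % 6 != 0:
        raise ValueError("expected groups of three 2-byte BOMs")
    bits = []
    for i in range(0, len(data), 2):
        unit = data[i:i + 2]
        if unit == ZERO:
            bits.append(0)
        elif unit == ONE:
            bits.append(1)
        else:
            raise ValueError("not a UTF-16 BOM: %r" % unit)
    return "".join(
        SYMBOLS[bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]]
        for i in range(0, len(bits), 3)
    )
```

Each brainfuck symbol costs six bytes, so `encode("+[->+<]")` is 42 bytes long and `decode` turns it back into the original seven characters.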

**mort** @mort@fosstodon.org · Oct 12, 2024, 09:16

@typeswitch @eniko Tho this would not really be an appropriate use of the BOM I feel. The BOM represents having to deal with endianness differences, which my idea doesn't properly reflect. I think 0 should be encoded as the machine's native endianness UTF-16 BOM, while a 1 should be encoded as the non-native endianness UTF-16 BOM. A program may optionally start with a byte order mark to indicate native endianness.

**Sebastian Krzyszkowiak** @dos@librem.one · Oct 12, 2024, 12:32

@mort @typeswitch @eniko You only need three symbols to encode Whitespace semantics. Use native BOM for one symbol, and non-native BOM as a prefix to either of them for two more symbols.
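
A rough sketch of that three-symbol prefix code in Python. Which Whitespace character gets which code is not stated in the post, so the assignment in `CODES` is an assumption, and the function names are hypothetical. `codecs.BOM_UTF16` is the standard-library constant for the BOM in the running machine's native byte order.

```python
import codecs

NATIVE = codecs.BOM_UTF16  # BOM in this machine's native byte order
SWAPPED = NATIVE[::-1]     # the same BOM with its two bytes reversed

# Assumed assignment of Whitespace's three characters to the three codes.
CODES = {
    " ": NATIVE,             # native BOM on its own
    "\t": SWAPPED + NATIVE,  # non-native prefix, then native
    "\n": SWAPPED + SWAPPED, # non-native prefix, then non-native
}

def encode(source):
    """Encode a Whitespace program (spaces, tabs, linefeeds) as BOMs."""
    return b"".join(CODES[ch] for ch in source)

def decode(data):
    """Decode a BOM stream back into spaces, tabs, and linefeeds."""
    units = [data[i:i + 2] for i in range(0, len(data), 2)]
    out = []
    i = 0
    while i < len(units):
        if units[i] == NATIVE:
            out.append(" ")
            i += 1
        elif units[i] == SWAPPED and i + 1 < len(units):
            # A non-native BOM is a prefix: the next unit picks the symbol.
            if units[i + 1] == NATIVE:
                out.append("\t")
            elif units[i + 1] == SWAPPED:
                out.append("\n")
            else:
                raise ValueError("not a UTF-16 BOM: %r" % units[i + 1])
            i += 2
        else:
            raise ValueError("invalid or truncated BOM sequence")
    return "".join(out)
```

Because the codes depend on the native byte order, the same file decodes to a different program on a machine with the opposite endianness, which fits the spirit of the proposal.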
