The workflow for the Kitsune Tails script was that I would download it from Google Docs as text files, and then run it through our script tool to generate code from that
But the script quickly became large enough that it got split over multiple docs. So I just used "type" (that's like "cat" for you Linux folks) in the batch file to concatenate the text files first
Guess who then found out that Google Docs inserts a UTF-8 BOM when downloading as plaintext? So then I had to add code to look for the byte order mark at the start of lines and explicitly remove it, or it would mess things up. Very cool and fun and a good use of my limited time
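For anyone hitting the same thing, here is a rough Python sketch of the two obvious fixes: drop the BOM from each file before concatenating, or scrub U+FEFF from the start of lines afterwards. This is not the actual Kitsune Tails tooling; the function names `concat_without_boms` and `strip_line_boms` are made up for illustration.

```python
import codecs


def concat_without_boms(chunks: list[bytes]) -> bytes:
    """Join raw file contents, dropping a leading UTF-8 BOM from each one."""
    out = []
    for data in chunks:
        # codecs.BOM_UTF8 is the three bytes EF BB BF.
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        out.append(data)
    return b"".join(out)


def strip_line_boms(text: str) -> str:
    """Remove U+FEFF from the start of every line of already-decoded text."""
    return "\n".join(line.lstrip("\ufeff") for line in text.split("\n"))
```

The first function is for when you control the concatenation step; the second is the after-the-fact cleanup for text that already has BOMs scattered through it.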
tbh given the Unicode standard says that, while UTF-8 BOMs are technically allowed, their use is discouraged (see https://hachyderm.io/@danderson/113290842836069180 for why it's allowed), we should really have some kind of wall of shame for applications that add a UTF-8 BOM *on purpose*
@eniko but this really hampers my esoteric programming language composed entirely of byte order marks 😢
@typeswitch @eniko This could work: require UTF-16 encoding, encode a BOM using little-endian UTF-16 to represent a 0, a BOM using big-endian UTF-16 to represent a 1. The fundamental building block of the language is a sequence of 3 little- or big-endian BOMs. This gives you 3 bits or 8 symbols, enough to encode the symbols of brainfuck. Adopt brainfuck semantics.
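A quick Python sketch of that encoding, in case anyone wants to try it. The post doesn't say which 3-bit pattern maps to which brainfuck symbol, so the ordering in `SYMBOLS` is an arbitrary assumption, as are the names `encode` and `decode`. Characters outside the eight brainfuck symbols are skipped when encoding.

```python
import codecs

LE = codecs.BOM_UTF16_LE  # bytes FF FE, stands for bit 0
BE = codecs.BOM_UTF16_BE  # bytes FE FF, stands for bit 1
SYMBOLS = "><+-.,[]"      # assumed ordering: index 0..7 is the 3-bit value


def encode(program: str) -> bytes:
    """Turn brainfuck source into a run of BOMs, three per symbol."""
    out = bytearray()
    for ch in program:
        if ch not in SYMBOLS:
            continue  # brainfuck treats everything else as a comment
        idx = SYMBOLS.index(ch)
        for shift in (2, 1, 0):  # most significant bit first
            out += BE if (idx >> shift) & 1 else LE
    return bytes(out)


def decode(data: bytes) -> str:
    """Turn a run of BOMs back into brainfuck source."""
    if len(data) % 6:
        raise ValueError("expected a whole number of 3-BOM groups")
    chars = []
    for i in range(0, len(data), 6):
        idx = 0
        for j in range(0, 6, 2):
            pair = data[i + j:i + j + 2]
            if pair == BE:
                bit = 1
            elif pair == LE:
                bit = 0
            else:
                raise ValueError("not a UTF-16 BOM")
            idx = (idx << 1) | bit
        chars.append(SYMBOLS[idx])
    return "".join(chars)
```

Each symbol costs six bytes, and the output of `decode` can be fed to any ordinary brainfuck interpreter.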
@mort @typeswitch @eniko You only need three symbols to encode Whitespace semantics. Use native BOM for one symbol, and non-native BOM as a prefix to either of them for two more symbols.
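One way to read that as a prefix code, sketched in Python: native BOM alone is one symbol, non-native followed by native is the second, and non-native followed by non-native is the third. Which of Whitespace's space, tab, and linefeed gets which code is my assumption, and `encode_ws` and `decode_ws` are made-up names.

```python
import codecs

NATIVE = codecs.BOM_UTF16  # BOM in this machine's byte order
SWAPPED = (codecs.BOM_UTF16_BE if NATIVE == codecs.BOM_UTF16_LE
           else codecs.BOM_UTF16_LE)

# Assumed assignment of the three codes to Whitespace's three symbols.
CODE = {" ": NATIVE, "\t": SWAPPED + NATIVE, "\n": SWAPPED + SWAPPED}


def encode_ws(program: str) -> bytes:
    """Encode Whitespace source; other characters are ignored as comments."""
    return b"".join(CODE[ch] for ch in program if ch in CODE)


def decode_ws(data: bytes) -> str:
    """Decode a run of BOMs back into spaces, tabs, and linefeeds."""
    units = iter(data[i:i + 2] for i in range(0, len(data), 2))
    out = []
    for unit in units:
        if unit == NATIVE:
            out.append(" ")
        elif unit == SWAPPED:
            follower = next(units, None)  # the prefix needs one more BOM
            if follower == NATIVE:
                out.append("\t")
            elif follower == SWAPPED:
                out.append("\n")
            else:
                raise ValueError("dangling non-native BOM prefix")
        else:
            raise ValueError("not a UTF-16 BOM")
    return "".join(out)
```

Since "native" depends on the machine, the same file decodes differently on a big-endian box, which is arguably in the spirit of the thing.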