@gfxstrand What is split lock detection ?
(Where can I learn about it ?)
@Sobex @gfxstrand atomics that straddle a cacheline boundary because they're badly aligned
x86 supports it, but it's super expensive because it takes a bus lock instead of going through the usual cacheline coherency protocol. most other architectures just give you an exception and you're dead
it's so expensive for x86 that there are plans to remove it, hence the split lock detection. and it's getting more expensive every year due to ever-increasing cpu core counts
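For anyone following along, here is a rough sketch of the alignment condition being described: an atomic access becomes a split lock when its bytes land in two different cachelines. The helper name `crosses_cacheline` and the 64-byte line size are assumptions made for illustration, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines, the usual size on current x86 CPUs. */
#define CACHELINE_SIZE 64u

/*
 * Hypothetical helper: returns true if an access of `size` bytes starting
 * at address `addr` touches more than one cacheline. A locked (atomic)
 * access for which this is true is what the thread calls a split lock.
 */
static bool crosses_cacheline(uintptr_t addr, size_t size)
{
    if (size == 0)
        return false;

    uintptr_t first_line = addr / CACHELINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHELINE_SIZE;

    return first_line != last_line;
}
```

With these assumptions, an 8-byte atomic at offset 56 stays inside one line (bytes 56 to 63), while the same atomic at offset 60 covers bytes 60 to 67 and spills into the next line.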
@sima Those plans are going to make hell for users with apps that have split lock bugs in them.
https://www.phoronix.com/news/Linux-Splitlock-Hurts-Gaming
Not that keeping them working forever is a good plan either. But it's gonna be hell on users if they just pull it out of the CPU and start throwing exceptions with no option for an emulation mode you can turn on.
Really, allowing unaligned uint64_t on x86 was a mistake.
@gfxstrand @Sobex it sucks
maybe there's going to be an opt-in for desktop backwards compat because games or something like that, where it's probably acceptable to take the system wide performance hit
@sima @gfxstrand @Sobex I'm sorry if that's a dumb question, because I know nothing of kernel development, but is there no way to fake the split lock for the app and then use a better locking system behind it?
@Nulhomme @gfxstrand @Sobex you could implement it in the kernel by zapping all pagetable entries userspace has pointing at that memory, then doing the atomic while holding the folio lock (or two folio locks, if it also spans a page boundary).
dead slow, but doable, maybe less bad than allowing it in hw. I really don't know what the tradeoffs are here
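As a small illustration of the "one lock or two" point above, the sketch below counts how many pages a single access touches. It is only address arithmetic, not kernel code: the name `pages_touched` and the 4 KiB page size are assumptions, and since a real folio can cover several pages, two pages do not necessarily mean two distinct folios.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB small pages. */
#define ASSUMED_PAGE_SIZE 4096u

/*
 * Hypothetical helper: number of pages covered by an access of `size`
 * bytes starting at `addr`. For an atomic of at most 8 bytes the result
 * is 1 or 2, which bounds how many pages an emulation would have to lock.
 */
static size_t pages_touched(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t first_page = addr / ASSUMED_PAGE_SIZE;
    uintptr_t last_page = (addr + size - 1) / ASSUMED_PAGE_SIZE;

    return (size_t)(last_page - first_page) + 1;
}
```

Under these assumptions, an 8-byte access at offset 4088 fits in one page, while the same access at offset 4092 covers bytes 4092 to 4099 and touches two.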
@sima @Nulhomme @gfxstrand @Sobex are we talking about copying the memory pages and holding the folio lock while we update the page tables, or holding the locks while doing the writes? Honestly hard for me to tell which would be cheaper, we'd probably have to test both.
@sima @Nulhomme @gfxstrand @Sobex assuming we can update the page tables atomically. Copying the pages should have no impact on a read-only thread, at the cost of copying at most 8k assuming small pages. I don't know how we would deal with writers, other than letting the last writer win and ignoring all writes in progress.
@ekg @Nulhomme @gfxstrand @Sobex you don't need to copy the entire page, just emulate the atomic instruction after having removed all userspace pagetable entries, to make sure the access is atomic
the expensive part is removing the pagetable entries and invalidating tlbs
@sima @ekg @Nulhomme @gfxstrand Hmm, if another thread attempts to read while the pagetable entry is removed, what should happen?
That sounds very ugly to handle.
@Sobex @sima @Nulhomme @gfxstrand I assume it would fault.
@Sobex @sima @Nulhomme @gfxstrand I know that the kernel under some circumstances will map and unmap pages on context boundaries. Obviously when switching between user space threads, but also when using kernel space isolation, which was proposed for dealing with Meltdown and similar.
@Sobex @sima @Nulhomme @gfxstrand don't get me wrong, I am happy to bullshit my way to an answer. But I think I will defer answering to someone who actually knows.
@ekg @sima @Nulhomme @gfxstrand The thing I know is that we had some rather global locks on each process page table in our 15-410 kernel (or perhaps on the entire virtual memory subsystem), and that such an approach is probably too heavy-handed for Linux.
@Sobex Linux had a big beautiful lock at some point, but I think that has been dead and buried for ten years now.
@ekg @sima @Nulhomme @gfxstrand That would be different from the Meltdown patch though. (Meltdown patch switches the page table between kernel mode and usermode, but this is purely per thread / core)