@lenary @erisceleste @eniko yeah, though even for a contemporary x86 uarch i think it's better/not worse to `sub rsp` in the prolog and poke arguments into the frame (which is how gcc/clang codegen frames) unless you really super care about code size over speed

@joe @lenary @erisceleste @eniko there’s a stack address predictor that makes it no better, in general, but yeah, it’s also not worse.

@steve @lenary @erisceleste @eniko i was also curious how much "shadow stack" load/store forwarding makes a difference, if at all


@joe @steve From a performance perspective, the only thing that really should matter is predictability. Modern superscalar CPUs break apart the instruction flow, rename all the registers and memory locations, schedule the micro-ops, and retire them. Assuming no resource is oversubscribed, the one thing that limits throughput is the branch predictor: on most modern CPUs that is one taken branch per cycle, with a misprediction penalty of 15 cycles.
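To make the predictability point concrete, here is a minimal C sketch. The function names `count_above_branchy` and `count_above_branchless` are hypothetical, written only for illustration. Both count the elements above a threshold. The first uses a data-dependent branch, while the second adds the comparison result directly, leaving only the loop back-edge as a branch. On random input the branchy loop mispredicts about half the time, which at the 15-cycle penalty quoted above works out to roughly 0.5 × 15 = 7.5 cycles per element. Caveat: an optimising compiler may already turn the branchy version into branch-free code (setcc/cmov or SIMD), so check the generated assembly before drawing conclusions from a benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one conditional jump per element. If the data is
   random, the predictor cannot learn the pattern and guesses wrong often. */
size_t count_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold) {
            count++;
        }
    }
    return count;
}

/* Hypothetical example: the comparison yields 0 or 1 and is added directly,
   so the only remaining branch is the highly predictable loop back-edge. */
size_t count_above_branchless(const int32_t *v, size_t n, int32_t threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(v[i] > threshold);
    }
    return count;
}
```

Both functions return the same count. Only the instruction stream the predictor sees differs.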


@joe @steve Some of the conversation got cut off for me, so I assumed the discussion was about function calls. A huge performance limiter that I dismissed as irrelevant is serialisation: back-to-back dependent instructions cannot be executed in parallel. The number of instructions that can execute in parallel is largely irrelevant if the scheduler is constantly waiting for the last instruction to finish.
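A minimal C sketch of the dependency-chain point, assuming a simple array sum. The function names are hypothetical. `sum_serial` keeps one accumulator, so each add must wait for the previous one to finish. `sum_four_chains` splits the work into four independent chains that an out-of-order core can overlap. Doubles are used on purpose: without flags such as `-ffast-math`, the compiler is not allowed to reassociate floating-point adds, so it cannot break the serial chain on its own. Because the additions are regrouped, the two results can differ in the last bits for general inputs.

```c
#include <stddef.h>

/* Hypothetical example: a single accumulator. Every add depends on the
   result of the previous add, forming one serial chain whose speed is set
   by add latency, not by how many adds the core could issue per cycle. */
double sum_serial(const double *v, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return acc;
}

/* Hypothetical example: four independent accumulators. The four chains do
   not depend on each other, so the scheduler can keep several adds in
   flight at once instead of waiting on the last one. */
double sum_four_chains(const double *v, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; i++) { /* leftover elements when n is not a multiple of 4 */
        a0 += v[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

The total work is identical. The difference is only in how long the longest chain of dependent instructions is.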


Librem Social is an opt-in public network. Messages are shared under Creative Commons BY-SA 4.0 license terms.

