@gfxstrand why would you do register allocation in ssa form?, I mean register allocation should be one of the last step, as it is very hardware dependent.
@nh @gfxstrand I imagine so. but handling phi nodes is certainly something that would be beyond me, and I would much rather strip the ssa form. Unless of course there existed spelled out compelling reasons to stay in ssa form.
@gfxstrand @ekg Yes. Though to be fair, I don't think you strictly need SSA form to stay within register pressure bounds.
@nh @ekg Maybe not. Standard graph coloring certainly doesn't have predictable pressure. Naive linear scan is predictable but suffers from inefficiencies in register utilization. The linear scan modifications I'm aware of that efficiently use the run register file and give predictable pressure (such as second-chance binpacking) are roughly equivalent to going into SSA as part of RA. I don't claim to have a particularly complete knowledge of the register allocation literature, though.
I'm increasingly of the opinion that SSA doesn't really matter. If you're willing to introduce parallel moves/split live ranges you can get good RA.
Just SSA is never good enough for actual constraints and if you use parallel moves for those your RA also needs to be able to insert "phis". Hence a lack of initial phis/SSA is only a minor inconvenience.
Which kinda brings us back to the tradeoff of reg pressure vs. number of moves/coalescing.
memory->memory phis are actually one example where you really want coalescing of the memory locations to work and is a great example of how the SSA splitting might cause more moves unless lots of care is taken.
I don't know what you had trouble with, but my guess for an easy implementation is just not considering the memory locations as SSA, lower the memory phis at the end of the spiller and then rerun the spiller to deal with introduced temporaries in your phi lowering.
@bas @nh @ekg I'm still working through the details of spilling and memory phis. Over-all, I don't think we fundamentally disagree, actually. Once you introduce shuffling and resolving across edges in your linear scan RA, it's basically the same algorithm at that point. It's just a matter of what you call things and how you handle range splitting. You can do it with or without SSA as the input and you have the same headache worth of problems either way. 🤷🏻♀️
@ekg @nh To sum up:
- Graph coloring is NP-hard
- Conventional spilling happens as a side-effect of graph coloring failing so you do the NP hard thing multiple times if you spill
- Graph coloring RA is well understand and in every compiler textbook
On the other hand...
- SSA-based RA is O(n).
- SSA-based RA is optimal and you know how many registers it needs up-front.
- Known pressure means you can spill before RA, also in linear(ish) time.
- SSA-based spilling in really frickin' hard