things to try:

* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* try 3.13 fixed point instead of 4.12 for more precision
  * can we get away without the extra bit?
* y-axis mirror optimization
* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
  * maybe redo tiering to just 4x4, 2x2, 1x1?
* extract viewport for display & re-input via keyboard
* fujinet screenshot/viewport uploader
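
For the 3.13 versus 4.12 question, here is a rough sketch of what the trade looks like in numbers. It assumes a 16-bit signed two's-complement word where the integer bit count includes the sign bit; the helper names `fixed_range` and `to_fixed` are made up for illustration and are not taken from the actual code. It only shows range and step size, so it doesn't settle whether intermediate values in the iteration would still fit.

```python
def fixed_range(int_bits, frac_bits):
    """Return (min, max, step) of a signed fixed-point format.

    Assumption: int_bits counts the sign bit, so 4.12 and 3.13
    both fill one 16-bit word.
    """
    total_bits = int_bits + frac_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1)) / scale
    hi = ((1 << (total_bits - 1)) - 1) / scale
    return lo, hi, 1 / scale


def to_fixed(x, frac_bits, total_bits=16):
    """Convert a float to a raw fixed-point integer, raising on overflow."""
    raw = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    if not lo <= raw <= hi:
        raise OverflowError(f"{x} does not fit with {frac_bits} fraction bits")
    return raw


if __name__ == "__main__":
    for name, (i, f) in {"4.12": (4, 12), "3.13": (3, 13)}.items():
        lo, hi, step = fixed_range(i, f)
        print(f"{name}: range [{lo}, {hi}], step {step}")
```

3.13 halves the step size but shrinks the range from [-8, 8) to [-4, 4), so whether the extra integer bit can go depends on how large values get before the escape check catches them.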