things to try:

* add some preset viewports that can be switched via number keys (1, 2, 3, etc.)

* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D

* squaring special case of multiplication for zx*zx and zy*zy
  * when both operands are the same, the hi1*hi2 and lo1*lo2 8-bit muls become hi*hi and lo*lo, which can be replaced by a 512-byte lookup table of squares (256 16-bit entries)
  * jamey on Mastodon tried this but ran into some problems. see what happens on our version!

* double-check rounding behavior is correct

* try 3.13 fixed point instead of 4.12 for more precision
  * can we get away without the extra integer bit? 3.13 only covers -4 to just under 4, so 4.0 itself doesn't fit

* y-axis mirror optimization: the set is symmetric about the real axis, so rows at y and -y can share results

* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering

* rework the palette cycling to look more like an advancing flow

* extract viewport for display & re-input via keyboard

* FujiNet screenshot/viewport uploader
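for the preset-viewports item, a minimal sketch of the key-to-viewport table, written in Python rather than target assembly for readability. `Viewport`, `PRESETS`, and `handle_key` are hypothetical names, the (center, width) representation is an assumption, and the coordinates are just well-known example locations, not presets taken from the renderer.

```python
from typing import NamedTuple


class Viewport(NamedTuple):
    """Hypothetical viewport: center point plus visible width in the complex plane."""
    center_x: float
    center_y: float
    width: float


# Placeholder presets, keyed by the number key that selects them.
PRESETS = {
    "1": Viewport(-0.5, 0.0, 3.0),    # whole set
    "2": Viewport(-0.75, 0.1, 0.05),  # near "seahorse valley"
    "3": Viewport(-1.75, 0.0, 0.1),   # near the period-3 minibrot on the real axis
}


def handle_key(key: str, current: Viewport) -> Viewport:
    """Return the preset for a number key, or keep the current viewport."""
    return PRESETS.get(key, current)
```

keeping the presets as a data table means adding another one is a one-line change.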
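the squaring special case can be checked with a quick model before touching the assembly. this sketch splits a 16-bit magnitude into `hi` and `lo` bytes and uses x^2 = hi^2 * 2^16 + 2 * hi * lo * 2^8 + lo^2. the two same-operand products come from a 256-entry table of 16-bit squares (512 bytes on the target), and only the cross term still needs a general 8x8 multiply. `SQUARES`, `mul8`, and `square16` are hypothetical names; `mul8` just stands in for whatever 8-bit multiply routine the renderer already has.

```python
# 256 entries of 16-bit squares: 512 bytes on the 8-bit target (max 255*255 = 65025).
SQUARES = [n * n for n in range(256)]


def mul8(a: int, b: int) -> int:
    """Stand-in for a general 8x8 -> 16-bit multiply routine (hypothetical)."""
    return a * b


def square16(x: int) -> int:
    """Square a signed 16-bit value, returning the full 32-bit product."""
    x = abs(x)  # the square is non-negative, so work on the magnitude
    hi, lo = x >> 8, x & 0xFF
    # hi^2 and lo^2 are table lookups; the cross term 2*hi*lo*256 is one
    # real multiply followed by a shift left of 9 (times 2, times 256).
    return (SQUARES[hi] << 16) + (mul8(hi, lo) << 9) + SQUARES[lo]
```

since the whole signed 16-bit input space is only 65,536 values, comparing every input against `x * x` is cheap and is one way to pin down where a buggy version goes wrong.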
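for the rounding item, it helps to have reference models to compare the target routine against. this sketch shows two common choices for a 4.12 multiply: plain truncation, which with an arithmetic shift rounds toward negative infinity, and round-half-up, which adds half an LSB before shifting. if the real routine multiplies magnitudes and negates afterwards, it will instead round toward zero, so which model it matches is exactly the thing to check. `fixmul_trunc`, `fixmul_round`, and `exact` are hypothetical names.

```python
from fractions import Fraction

FRAC_BITS = 12  # 4.12 fixed point: 12 fractional bits


def fixmul_trunc(a: int, b: int) -> int:
    """Fixed-point multiply that drops the low bits.

    Python's >> is an arithmetic shift, so this rounds toward negative
    infinity, like shifting a two's-complement product right.
    """
    return (a * b) >> FRAC_BITS


def fixmul_round(a: int, b: int) -> int:
    """Fixed-point multiply with round-half-up: add half an LSB, then shift."""
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def exact(a: int, b: int) -> Fraction:
    """Exact product, measured in units of one output LSB."""
    return Fraction(a * b, 1 << FRAC_BITS)
```

the smallest example of the difference: -1 * 1 is exactly -1/4096 of an LSB, which truncates to -1 but rounds to 0.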
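for the 3.13 item, the trade-off is easy to tabulate. one more fractional bit halves the step size, but the representable range drops from about ±8 to about ±4, and 4.0, the usual |z|^2 bailout threshold, no longer fits in a signed 16-bit word. that is likely where the "extra bit" question bites: a 3.13 escape test would have to catch the overflow (e.g. via the processor's carry or overflow flag) rather than compare against 4.0. `format_info` is a hypothetical helper for illustration.

```python
def format_info(frac_bits: int, total_bits: int = 16):
    """Describe a signed fixed-point format with the given fractional bits.

    Returns (step size, largest representable value, whether 4.0 fits).
    """
    scale = 1 << frac_bits
    max_raw = (1 << (total_bits - 1)) - 1  # largest signed raw value
    lsb = 1 / scale
    largest = max_raw / scale
    four_fits = 4 * scale <= max_raw
    return lsb, largest, four_fits
```

`format_info(12)` gives a step of 1/4096 with a ceiling just under 8, while `format_info(13)` gives 1/8192 with a ceiling just under 4.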
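for the mirror item: the Mandelbrot set is symmetric about the real axis, so the pixel at (x, y) gets the same iteration count as (x, -y). this sketch plans which rows can be copied instead of computed, assuming row coordinates are held as fixed-point integers so exact negation can be tested; `mirror_plan` is a hypothetical name. one caveat that ties into the rounding item: the symmetry holds in exact arithmetic, but a multiply that truncates toward negative infinity is not symmetric under negation (the 2*zx*zy term rounds differently for -zy), so copied rows could differ in a few pixels from rows computed directly.

```python
def mirror_plan(row_ys):
    """For each row, return the index of an earlier row whose y coordinate
    is the exact negation of this one (so its pixels can be copied), or
    None if the row has to be computed."""
    first_row_at = {}  # y value -> index of the first row with that y
    plan = []
    for i, y in enumerate(row_ys):
        plan.append(first_row_at.get(-y))
        first_row_at.setdefault(y, i)
    return plan
```

when the viewport straddles the axis off-center, only the overlapping band is saved; a viewport entirely above or below the axis gets no benefit.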
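for the wide-pixels item, a sketch of how the passes could fill gaps, assuming the tiers are column strides of 4, 2, then 1 within a row (the renderer's actual tier layout may differ). each pass computes only the columns the coarser passes skipped and draws each result across the neighbors that haven't been computed yet, so the screen looks full after the first pass while every pixel is still computed exactly once. `tiered_row`, `compute`, and `on_pass` are hypothetical names.

```python
def tiered_row(width, compute, on_pass=None):
    """Fill one row in passes of stride 4, 2, 1 using 'wide pixels'.

    compute(x) returns the value for column x; on_pass(stride, row) is an
    optional callback that sees a copy of the row after each pass.
    """
    row = [None] * width
    for stride in (4, 2, 1):
        for x in range(0, width, stride):
            if stride < 4 and x % (stride * 2) == 0:
                continue  # already computed by a coarser pass
            value = compute(x)
            # Draw the pixel 'stride' columns wide; finer passes overwrite
            # the extra columns with their own values later.
            for dx in range(min(stride, width - x)):
                row[x + dx] = value
        if on_pass is not None:
            on_pass(stride, list(row))
    return row
```

with a width of 10, the first pass computes columns 0, 4, and 8 and shows them 4 pixels wide; the second adds 2 and 6 at 2 pixels wide; the last fills in the odd columns.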
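for the viewport extract item, one option is to show the raw fixed-point words rather than decimal approximations, so that typing the string back in reproduces the viewport bit for bit. this sketch assumes a viewport is three signed 16-bit integers (center x, center y, and a step size) and uses a made-up comma-separated hex format; `format_viewport`, `to_signed16`, and `parse_viewport` are hypothetical names.

```python
def format_viewport(cx: int, cy: int, step: int) -> str:
    """Show raw fixed-point values as four-digit hex words, e.g. 'F800,0000,0010'."""
    return ",".join(f"{v & 0xFFFF:04X}" for v in (cx, cy, step))


def to_signed16(word: int) -> int:
    """Reinterpret an unsigned 16-bit word as two's-complement signed."""
    return word - 0x10000 if word & 0x8000 else word


def parse_viewport(text: str) -> tuple:
    """Parse the output of format_viewport back into signed integers."""
    parts = text.strip().split(",")
    if len(parts) != 3:
        raise ValueError("expected three comma-separated hex words")
    words = [int(p, 16) for p in parts]
    if any(not 0 <= w <= 0xFFFF for w in words):
        raise ValueError("each value must fit in 16 bits")
    return tuple(to_signed16(w) for w in words)
```

hex keeps every field a fixed four characters, which suits both a narrow text display and keyboard entry.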