things to try:

* fix status bar to show elapsed time, per-iteration time, per-pixel iteration count

* 'turbo' mode disabling graphics in full or part

* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D

* maybe clean up the load/layout of the big mul table

* consider alternate lookup tables in the top 16KB under ROM

* y-axis mirror optimization (when the viewport is symmetric about y = 0, compute one half and mirror it)

* extract viewport for display & re-input via keyboard

* fujinet screenshot/viewport uploader
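
rough sketch of the y-axis mirror idea, in portable C rather than the project's own code. Assumptions, none of them confirmed by these notes: the renderer is a Mandelbrot-style escape-time loop (whose image is symmetric about the real axis), and the viewport is centered on y = 0 with an odd row count. `iterate`, `render_mirrored`, `W`, `H`, and `MAX_ITER` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <string.h>

#define W 16
#define H 9          /* odd, so the middle row sits exactly on y = 0 */
#define MAX_ITER 32

/* Hypothetical escape-time kernel (assumed Mandelbrot):
   number of iterations until |z| > 2, capped at MAX_ITER. */
static int iterate(double cr, double ci) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Compute rows 0..H/2 only and copy each one into its mirror row.
   Returns the number of pixels actually iterated. */
static int render_mirrored(unsigned char img[H][W]) {
    int computed = 0;
    assert(H % 2 == 1);  /* this sketch only handles a centered viewport */
    for (int y = 0; y <= H / 2; y++) {
        double ci = 1.0 - 2.0 * y / (H - 1);      /* +1.0 at the top row, 0.0 in the middle */
        for (int x = 0; x < W; x++) {
            double cr = -2.0 + 3.0 * x / (W - 1); /* -2.0 .. +1.0 */
            img[y][x] = (unsigned char)iterate(cr, ci);
            computed++;
        }
        if (H - 1 - y != y) {                     /* the middle row has no mirror partner */
            memcpy(img[H - 1 - y], img[y], W);
        }
    }
    return computed;
}
```

calling `render_mirrored` on an `unsigned char img[H][W]` buffer iterates W * (H/2 + 1) pixels instead of W * H, so close to half the work. A viewport that only partly overlaps y = 0 would need the mirrored range clipped to the overlapping rows, which this sketch does not attempt.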