things to try:
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* y-axis mirror optimization
* extract viewport for display & re-input via keyboard
* fujinet screenshot/viewport uploader