update_palette was missing its rts,
so it fell through into keycheck,
which, if the timing was wrong, would dutifully process the viewport
change and return to update_palette's caller,
which in turn was -not- expecting to reset the outer loop
fixed
Uses the "big multiplication table" in 64KB of extended memory if
bank switching appears to work; otherwise falls back to the
table-of-squares lookups.
Initial view clocks in at 13.133 ms/px for the XE version and still
14.211 ms/px for the 400/800/XL version.
Tested in the emulator with the 130XE and XL+Ultimate 1MB upgrade configs;
the base implementation was tested on the 800XL emulator.
planning to try a 64KB table of 8x7-bit multiplies in high memory
on a 130XE or other high-memory-capable machine
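A rough sketch of how such a table could be laid out, written in Python rather than 6502 assembly for readability. The index layout `(a << 7) | b`, the little-endian 16-bit entries, and the names `build_table` and `mul_8x7` are illustrative assumptions, not the project's code, and the sketch ignores bank switching entirely. It shows why 8x7 bits is the natural size: 2^15 entries of 2 bytes each is exactly 64KB, whereas a full 8x8 table with 16-bit entries would need 128KB.

```python
# Hypothetical layout: the product for (a, b) lives at byte offset
# 2 * ((a << 7) | b), stored little-endian as the 6502 would read it.
# a is an 8-bit operand (0..255), b is a 7-bit operand (0..127).


def build_table() -> bytearray:
    """Build a 64KB table of 8x7-bit unsigned products."""
    table = bytearray(2 * 256 * 128)  # 32768 entries * 2 bytes = 65536 bytes
    for a in range(256):
        for b in range(128):
            product = a * b  # at most 255 * 127 = 32385, fits in 16 bits
            offset = 2 * ((a << 7) | b)
            table[offset] = product & 0xFF    # low byte first
            table[offset + 1] = product >> 8  # then high byte
    return table


def mul_8x7(table: bytearray, a: int, b: int) -> int:
    """Look up a * b for 0 <= a < 256 and 0 <= b < 128."""
    offset = 2 * ((a << 7) | b)
    return table[offset] | (table[offset + 1] << 8)


if __name__ == "__main__":
    t = build_table()
    print(len(t), mul_8x7(t, 200, 100))
```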
not yet working or finished
too many cycles of overhead per invocation
previously we were flipping the inputs if negative, and then flipping the
output if exactly one input was negative
turns out you can just treat the whole thing as an unsigned mul
and then subtract each term from the high word if the other term
is negative.
https://stackoverflow.com/a/28827013
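The identity can be checked with a small Python model (the name `imul16` is hypothetical, not the project's 6502 routine). It treats both 16-bit inputs as unsigned bit patterns, multiplies them, then subtracts the other operand from the high word for each operand whose sign bit is set. Everything wraps modulo 2^32, just as the 6502 arithmetic would, which is why the 2^32 cross term drops out.

```python
MASK16 = 0xFFFF


def imul16(a: int, b: int) -> int:
    """Signed 16x16 -> 32-bit multiply built on an unsigned multiply.

    a and b are signed values in -32768..32767.
    """
    ua, ub = a & MASK16, b & MASK16  # two's-complement bit patterns
    product = ua * ub                 # plain unsigned 16x16 -> 32 product
    high, low = product >> 16, product & MASK16

    # Correction: subtract each operand from the high word
    # if the *other* operand is negative (sign bit set).
    if ua & 0x8000:
        high -= ub
    if ub & 0x8000:
        high -= ua

    result = ((high & MASK16) << 16) | low  # wrap the high word to 16 bits
    # Reinterpret the 32-bit pattern as a signed value.
    return result - (1 << 32) if result & 0x80000000 else result


if __name__ == "__main__":
    print(imul16(-3, 5), imul16(-300, -400))
```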
this saves a handful of cycles, reducing our runtime to a mere
14.211 ms/px \o/
Improves runtime from 16.24 ms/px to 14.44 ms/px
This uses a routine found on Everything2:
https://everything2.com/title/Fast+6502+multiplication
That routine uses a lookup table of squares to do 8-bit imuls,
which are then composed into a 16-bit imul.
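A minimal Python sketch of the general quarter-square technique, assuming unsigned 8-bit operands. It is not a transcription of the Everything2 routine, and `QUARTER_SQUARES`, `mul8`, and `mul16` are illustrative names. It relies on a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which holds exactly because a+b and a-b always have the same parity, so the truncated fractions cancel. The 16-bit product is then assembled from four 8-bit partial products.

```python
# floor(n*n / 4) for n in 0..510, the range of a + b for two 8-bit values
QUARTER_SQUARES = [n * n // 4 for n in range(511)]


def mul8(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit multiply via two table lookups."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


def mul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply composed from four 8-bit multiplies."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    return (mul8(a_lo, b_lo)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + (mul8(a_hi, b_hi) << 16))


if __name__ == "__main__":
    print(mul8(200, 100), hex(mul16(0xFFFF, 0xFFFF)))
```

Signed inputs can be layered on top with the unsigned-multiply correction from the later commit.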