it does this weird thing where it sometimes reads out wrong digits
and then switches to the expected unit of sec/px
work in progress, no clue yet what's going on
this frees up 12 bytes of zero page space and costs no measurable
time, as these variables are not in the hot path and the timing was
only a tiny bit different.
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.
View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
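
As a rough sanity check on the figures above, the per-pixel cost and the
total render time are tied together by the pixel count. The sketch below
assumes a 160x184 render area (29,440 pixels). That size is not stated in
this log; it is only inferred because it makes both reported totals come
out right. PIXELS and render_time are illustrative names.

```python
# Assumed render area: 160 x 184 pixels (inferred, not stated in the log).
PIXELS = 160 * 184


def render_time(ms_per_px: float, pixels: int = PIXELS) -> str:
    """Convert a per-pixel cost in ms into a total time formatted like '4m56s'."""
    total_seconds = round(ms_per_px * pixels / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds:02d}s"


print(render_time(10.050))  # 130XE figure from the log
print(render_time(10.906))  # 800XL figure from the log
```

Under that assumption the two calls give 4m56s and 5m21s, which matches
the reported totals.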
Iterate in steps of fill_masks[fill_level]+1 instead of visiting every
pixel and then skipping; saves a smidge of time
view 1 with expanded memory:
10.514 ms/px before
10.430 ms/px after
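
A minimal sketch of the loop change described in this commit, assuming
fill_masks holds power-of-two-minus-one masks and that a pixel belongs to
a fill level when its coordinate ANDed with the mask is zero. The mask
values and both function names are hypothetical, and this models one axis
only; the real routine may differ.

```python
# Hypothetical mask table: coarse passes first, every pixel on the last pass.
FILL_MASKS = [7, 3, 1, 0]


def pixels_skipping(width: int, fill_level: int) -> list[int]:
    """Old approach: visit every pixel, skip the ones the mask rejects."""
    mask = FILL_MASKS[fill_level]
    return [x for x in range(width) if x & mask == 0]


def pixels_strided(width: int, fill_level: int) -> list[int]:
    """New approach: step by mask+1 so rejected pixels are never visited."""
    step = FILL_MASKS[fill_level] + 1
    return list(range(0, width, step))


# Both loops select the same pixels; the strided one just does less work.
for level in range(len(FILL_MASKS)):
    assert pixels_skipping(160, level) == pixels_strided(160, level)
```

The saving is small because the skipped test was cheap compared with the
per-pixel iteration itself, which fits the roughly 0.08 ms/px difference
reported above.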
was missing an rts on update_palette
this happened to fall through to keycheck
which if timing was wrong would dutifully process the viewport
change and return to update_palette's caller
which in turn was -not- expecting to reset the outer loop
fixed
Uses the "big multiplication table" in 64KB of extended memory if
bank switching appears to work, otherwise uses the table of squares
lookups.
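
For reference, a table-of-squares multiply is commonly done with the
quarter-square identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). The
sketch below shows that identity for unsigned 8-bit operands; whether the
fallback path here uses exactly this form, and the names QUARTER_SQUARES
and mul8, are assumptions made for illustration.

```python
# Quarter-square table: index n holds floor(n*n/4), for n = 0..510 (255+255).
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def mul8(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values with two lookups and a subtract."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must fit in 8 bits")
    # a+b and |a-b| always have the same parity, so the two floors cancel
    # exactly and the result is the true product.
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

A full 8x8 product table would need 64KB for 16-bit results, which is
presumably why the "big multiplication table" only fits when extended
memory is available; the squares table costs far less memory but needs
two lookups plus a subtraction per multiply.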
Initial view clocks in at 13.133 ms/px for the XE version and still
14.211 ms/px for the 400/800/XL version.
Tested in emulator with 130XE and XL+Ultimate 1MB upgrade configs,
and base implementation on the 800XL emulator.