Note how many cycles loop unrolling saves

+4 cycles one-time setup

ldx #8 ; 2 cyc for first 8
ldx #8 ; 2 cyc for second 8 (different shift behavior)

-5 cycles/iter to get bit now

lsr arg + 1 ; 5 cyc
rol arg     ; 5 cyc

+10 cycles/iter to get the bit in a loop

dex ; 2 cyc
bne ; 2 cyc

4 cycles/iter for the loop

4 + (14 * 16) = 4 + 224 = 228
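
The total on the last line can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the cycle figures are copied from the message above rather than re-measured, and the constant and function names are made up for illustration.

```python
# Cycle figures as quoted in the commit message (not re-measured).
SETUP_CYCLES = 2 + 2          # two `ldx #8` loads, one per group of 8 bits
BIT_FETCH_CYCLES = 5 + 5      # `lsr arg + 1` and `rol arg` inside the loop
LOOP_OVERHEAD_CYCLES = 2 + 2  # `dex` and `bne`, as counted in the message
ITERATIONS = 16               # one pass per multiplier bit


def cycles_saved(setup=SETUP_CYCLES,
                 bit_fetch=BIT_FETCH_CYCLES,
                 overhead=LOOP_OVERHEAD_CYCLES,
                 iterations=ITERATIONS):
    """One-time setup plus the per-iteration cost, summed over all iterations."""
    return setup + (bit_fetch + overhead) * iterations


print(cycles_saved())  # 4 + (14 * 16) = 228
```

Keeping the per-instruction counts as parameters makes it easy to redo the tally if any of the assumed timings turn out to differ.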
Brooke Vibber 2023-01-04 21:02:18 -08:00
parent 519f8ad635
commit 1408855eed

@@ -21,6 +21,8 @@ The 16-bit signed integer multiplication seems to be working, though I need to d
The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
The loop is unrolled, which saves 228 cycles at the cost of making the routine quite large. This is an acceptable tradeoff for the Mandelbrot, where imul16 is the dominant performance cost and the rest of the program will be small.
The mandelbrot loop is partly sketched out but I have future updates to make on that.
I've also sketched out a 16-bit rounding macro, which is not yet committed.
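
For readers unfamiliar with the add-and-shift approach mentioned in the README text, here is a minimal Python sketch of the idea. It is not the author's 6502 routine: the function name `imul16_sketch` is hypothetical, and the real code may order its shifts and adds differently. It only illustrates flipping negative inputs to positive, multiplying the magnitudes bit by bit, and restoring the sign at the end.

```python
def imul16_sketch(a, b):
    """Signed 16-bit x 16-bit multiply via add-and-shift on magnitudes.

    Hypothetical illustration only; inputs must fit in a signed 16-bit range.
    """
    if not (-0x8000 <= a <= 0x7FFF and -0x8000 <= b <= 0x7FFF):
        raise ValueError("inputs must be signed 16-bit values")

    # Flip negative inputs so the loop only sees unsigned magnitudes,
    # avoiding the need to add sign-extension bits on every step.
    negative = (a < 0) != (b < 0)
    multiplicand = abs(a)
    multiplier = abs(b)

    product = 0
    for _ in range(16):              # one step per multiplier bit
        if multiplier & 1:           # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1           # next bit is worth twice as much
        multiplier >>= 1             # expose the next multiplier bit

    return -product if negative else product


print(imul16_sketch(-3, 7))
```

An unrolled version would simply repeat the loop body 16 times, trading code size for the removal of the counter and branch.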