From 1408855eed2b34f0b2062badd402ba6b09eaf4a9 Mon Sep 17 00:00:00 2001
From: Brion Vibber
Date: Wed, 4 Jan 2023 21:02:18 -0800
Subject: [PATCH] Note how many cycles loop unrolling saves

+4 cycles one-time setup
ldx #8 ; 2 cyc for first 8
ldx #8 ; 2 cyc for second 8 (different shift behavior)
-5 cycles/iter to get bit now
lsr arg + 1 ; 5 cyc
rol arg ; 5 cyc
+10 cycles/iter to get the bit in a loop
dex ; 2 cyc
bne ; 2 cyc
4 cycles/iter for the loop
4 + (14 * 16) = 4 + 224 = 228
---
 readme.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/readme.md b/readme.md
index a69c627..1929d9f 100644
--- a/readme.md
+++ b/readme.md
@@ -21,6 +21,8 @@ The 16-bit signed integer multiplication seems to be working, though I need to d
 
 The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
 
+The loop is unrolled which saves 228 cycles, but at the cost of making the routine quite large. This is an acceptable tradeoff for the Mandelbrot, where imul16 is the dominant performance cost and the rest of the program will be small.
+
 The mandelbrot loop is partly sketched out but I have future updates to make on that.
 
 I've also sketched out a 16-bit rounding macro, which is not yet committed.
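The add-and-shift multiply described in the readme text can be modeled roughly as follows. This is a Python sketch, not a transcription of the project's 6502 routine. The names `umul16` and `imul16_model` are hypothetical, and the 32-bit product width is an assumption. Negative inputs are flipped to their magnitudes, so each step only needs a 16-bit add into the high half of the product register, followed by a one-bit right shift of the carry:hi:lo register.

```python
def umul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply by add-and-shift (hypothetical model)."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    hi, lo = 0, b  # product register hi:lo; multiplier bits are consumed from lo
    for _ in range(16):  # the 6502 routine unrolls these 16 iterations
        carry = 0
        if lo & 1:
            total = hi + a  # a 16-bit add into the high half; may carry out
            hi = total & 0xFFFF
            carry = total >> 16
        # shift the 33-bit value carry:hi:lo right by one bit
        lo = (lo >> 1) | ((hi & 1) << 15)
        hi = (hi >> 1) | (carry << 15)
    return (hi << 16) | lo


def imul16_model(x: int, y: int) -> int:
    """Signed 16x16 -> 32-bit multiply: flip negatives, multiply magnitudes, restore sign."""
    assert -0x8000 <= x <= 0x7FFF and -0x8000 <= y <= 0x7FFF
    negative = (x < 0) != (y < 0)
    product = umul16(abs(x), abs(y))  # magnitudes fit in 16 unsigned bits, even for -32768
    return -product if negative else product


if __name__ == "__main__":
    print(imul16_model(-3, 5))  # -15
    print(hex(umul16(0xFFFF, 0xFFFF)))  # 0xfffe0001
```

The Python loop only models the arithmetic. It says nothing about cycle counts, which depend on the actual 6502 instructions used to test bits and manage the loop counter.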