From efd722eeb2c100fcf31d6ed19e9f611e4ac6c949 Mon Sep 17 00:00:00 2001
From: Brion Vibber
Date: Wed, 4 Jan 2023 20:37:16 -0800
Subject: [PATCH] update cycle count for imul16

---
 mandel.s  | 10 +++++-----
 readme.md |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mandel.s b/mandel.s
index 835a0c3..46ad664 100644
--- a/mandel.s
+++ b/mandel.s
@@ -92,8 +92,8 @@ next:
 positive:
 .endmacro
 
-; min 454 cycles
-; max 756 cycles
+; min 470 cycles
+; max 780 cycles
 .proc imul16
     arg1 = FR0 ; 16-bit arg (clobbered)
     arg2 = FR1 ; 16-bit arg (clobbered)
@@ -112,10 +112,10 @@ positive:
     ; unrolled loop for maximum speed, at the cost
     ; of a larger routine
 
-    ; 424 to 672 cycles
+    ; 440 to 696 cycles
     .repeat 16, bitnum
-        ; first half: 22 to 40 cycles
-        ; second half: 29 to 47 cycles
+        ; bitnum < 8: 25 or 41 cycles
+        ; bitnum >= 8: 30 or 46 cycles
         bitmul16 arg1, arg2, result, bitnum
     .endrepeat
 
diff --git a/readme.md b/readme.md
index 4d4e08d..f2f84bc 100644
--- a/readme.md
+++ b/readme.md
@@ -19,7 +19,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 
 The 16-bit signed integer multiplication seems to be working, though I need to double-check it some more. It takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
 
-The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in circa 450-750 cycles depending on input (I'll run the exact numbers again later).
+The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 480-780 cycles depending on input.
 
 The mandelbrot loop is partly sketched out but I have future updates to make on that.
 
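
For anyone checking the numbers in this patch, the add-and-shift scheme the readme describes and the loop totals in the updated comments can be modeled off-target. The Python below is a minimal sketch under stated assumptions, not a transcription of `mandel.s`: the names `imul16_model` and `loop_cycle_bounds`, the shift-right formulation, and the reading of "25 or 41" and "30 or 46" as the shorter and longer path for one bit are all illustrative guesses.

```python
def imul16_model(a: int, b: int) -> int:
    """Signed 16x16 -> 32-bit multiply by add-and-shift.

    Assumption: negative inputs are negated first, the magnitudes are
    multiplied with 16-bit adds into the high half of the result, and
    the sign is restored at the end. This follows the readme's
    description, not the actual assembly.
    """
    if not (-0x8000 <= a <= 0x7FFF and -0x8000 <= b <= 0x7FFF):
        raise ValueError("inputs must fit in signed 16 bits")

    negative = (a < 0) != (b < 0)   # result sign: exactly one input negative
    ua, ub = abs(a), abs(b)         # magnitudes, each at most 0x8000

    result = 0
    for bitnum in range(16):        # the assembly unrolls this 16 times
        if (ub >> bitnum) & 1:
            result += ua << 16      # 16-bit add into the high half
        result >>= 1                # shift the running product right

    return -result if negative else result


# Per-bit costs copied from the patched comments in mandel.s.
# Assumption: the two values are the shorter and longer path for one bit.
LOW_BITS = (25, 41)    # bitnum < 8
HIGH_BITS = (30, 46)   # bitnum >= 8


def loop_cycle_bounds() -> tuple[int, int]:
    """Sum the per-bit costs over the 16 unrolled iterations."""
    shortest = 8 * LOW_BITS[0] + 8 * HIGH_BITS[0]
    longest = 8 * LOW_BITS[1] + 8 * HIGH_BITS[1]
    return shortest, longest
```

Calling `loop_cycle_bounds()` gives `(440, 696)`, which matches the "440 to 696 cycles" comment on the unrolled loop. Subtracting those from the routine totals of 470 and 780 leaves 30 and 84 cycles outside the loop, the same margins implied by the old figures (454 - 424 and 756 - 672).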