update cycle count for imul16
parent 41915cf122
commit efd722eeb2
2 changed files with 6 additions and 6 deletions
10
mandel.s
@@ -92,8 +92,8 @@ next:
 positive:
 .endmacro
 
-; min 454 cycles
-; max 756 cycles
+; min 470 cycles
+; max 780 cycles
 .proc imul16
 arg1 = FR0 ; 16-bit arg (clobbered)
 arg2 = FR1 ; 16-bit arg (clobbered)
@@ -112,10 +112,10 @@ positive:
 
 ; unrolled loop for maximum speed, at the cost
 ; of a larger routine
-; 424 to 672 cycles
+; 440 to 696 cycles
 .repeat 16, bitnum
-; first half: 22 to 40 cycles
-; second half: 29 to 47 cycles
+; bitnum < 8: 25 or 41 cycles
+; bitnum >= 8: 30 or 46 cycles
 bitmul16 arg1, arg2, result, bitnum
 .endrepeat
 
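The updated per-iteration comments can be checked against the loop and routine totals with a little arithmetic. The following is a minimal sketch in Python rather than ca65 assembly. The cycle figures are copied from the new comments in mandel.s; the names `LOW_HALF`, `HIGH_HALF` and `loop_bounds` are made up for illustration and do not appear in the source.

```python
# Per-iteration cycle costs, copied from the updated comments in mandel.s.
# Each tuple is (cheaper case, costlier case) for one unrolled iteration.
LOW_HALF = (25, 41)    # bitnum < 8
HIGH_HALF = (30, 46)   # bitnum >= 8
ITERATIONS_PER_HALF = 8  # .repeat 16 splits into two groups of 8


def loop_bounds(low=LOW_HALF, high=HIGH_HALF, n=ITERATIONS_PER_HALF):
    """Return (best, worst) cycle totals for the 16 unrolled iterations."""
    best = n * (low[0] + high[0])
    worst = n * (low[1] + high[1])
    return best, worst


loop_min, loop_max = loop_bounds()
print(loop_min, loop_max)  # 440 696

# Whatever remains of the routine totals lies outside the unrolled loop.
overhead_min = 470 - loop_min
overhead_max = 780 - loop_max
print(overhead_min, overhead_max)  # 30 84
```

The loop totals match the "440 to 696 cycles" comment. The remaining 30 and 84 cycles are the same differences the old figures gave (454 - 424 and 756 - 672), so this commit changes only the loop timing, not the cost of the code around it.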
@@ -19,7 +19,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 
 The 16-bit signed integer multiplication seems to be working, though I need to double-check it some more. It takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
 
-The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in circa 450-750 cycles depending on input (I'll run the exact numbers again later).
+The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 480-780 cycles depending on input.
 
 The mandelbrot loop is partly sketched out but I have future updates to make on that.
 
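The add-and-shift approach described in the README text above can be sketched at a higher level. This is a rough Python model, not a translation of the routine: the `bitmul16` macro body is not shown in this diff, so the right-shifting accumulator used here is an assumption about one common way to get by with 16-bit adds. The names `imul16_sketch` and `to_signed` are hypothetical.

```python
MASK16 = 0xFFFF


def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imul16_sketch(a, b):
    """Signed 16x16 -> 32-bit multiply using only 16-bit adds."""
    a &= MASK16
    b &= MASK16

    # Flip negative inputs to positive and remember the product's sign,
    # so the loop never has to add sign-extension bits.
    negate = False
    if a & 0x8000:
        a = (-a) & MASK16
        negate = not negate
    if b & 0x8000:
        b = (-b) & MASK16
        negate = not negate

    hi = 0  # upper 16 bits of the 32-bit accumulator
    lo = 0  # lower 16 bits
    for bitnum in range(16):
        carry = 0
        if (b >> bitnum) & 1:
            total = hi + a          # 16-bit add into the high half
            hi = total & MASK16
            carry = total >> 16     # carry out of the add
        # Shift carry:hi:lo right by one bit.
        lo = (lo >> 1) | ((hi & 1) << 15)
        hi = (hi >> 1) | (carry << 15)

    result = (hi << 16) | lo
    if negate:
        result = (-result) & 0xFFFFFFFF
    return to_signed(result, 32)


print(imul16_sketch(-300, 200))  # -60000
```

An iteration whose multiplier bit is set does an add plus a shift, and one whose bit is clear only shifts, which is consistent with the two per-iteration cycle costs quoted in the mandel.s comments.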