update readme & doc comments & vars

Brooke Vibber 2023-01-22 14:34:30 -08:00
parent 7009e16235
commit b6ddc0d50e
2 changed files with 16 additions and 14 deletions

@@ -3,15 +3,15 @@ sx = $80 ; i16: screen pixel x
 sy = $82 ; i16: screen pixel y
 ox = $84 ; fixed4.12: center point x
 oy = $86 ; fixed4.12: center point y
-cx = $84 ; fixed4.12: c_x
-cy = $86 ; fixed4.12: c_y
-zx = $88 ; fixed4.12: z_x
-zy = $8a ; fixed4.12: z_y
-zx_2 = $90 ; fixed8.24: z_x^2
-zy_2 = $94 ; fixed8.24: z_y^2
-zx_zy = $98 ; fixed8.24: z_x * z_y
-dist = $9c ; fixed8.24: z_x^2 + z_y^2
+cx = $88 ; fixed4.12: c_x
+cy = $8a ; fixed4.12: c_y
+zx = $8c ; fixed4.12: z_x
+zy = $8e ; fixed4.12: z_y
+zx_2 = $90 ; fixed4.12: z_x^2
+zy_2 = $92 ; fixed4.12: z_y^2
+zx_zy = $94 ; fixed4.12: z_x * z_y
+dist = $96 ; fixed4.12: z_x^2 + z_y^2
 iter = $a0 ; u8: iteration count
 zoom = $a1 ; u8: zoom shift level
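
A note on the notation in this hunk: fixed4.12 presumably means a 16-bit signed value with 4 integer bits (sign included) and 12 fractional bits. Multiplying two such values gives a 32-bit fixed8.24 product, which has to be shifted right by 12 bits to get back to fixed4.12. The Python sketch below models that step and one Mandelbrot iteration over the variables declared above. The helper names (`to_fixed`, `fixed_mul`, `mandel_step`) and the round-to-nearest choice are assumptions for illustration, not taken from the assembly source, and the sketch does not model 16-bit overflow.

```python
FRAC_BITS = 12           # fixed4.12: 12 fractional bits
ONE = 1 << FRAC_BITS     # 1.0 in fixed4.12 (4096)


def to_fixed(x: float) -> int:
    """Convert a float to fixed4.12 (hypothetical helper, not in the source)."""
    return int(round(x * ONE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed4.12 values and return a fixed4.12 result.

    The raw integer product is fixed8.24. Adding half an output LSB before
    the arithmetic right shift rounds to nearest (an assumed rounding mode).
    """
    product = a * b                      # fixed8.24
    half = 1 << (FRAC_BITS - 1)          # 0.5 LSB of the fixed4.12 result
    return (product + half) >> FRAC_BITS


def mandel_step(zx: int, zy: int, cx: int, cy: int):
    """One iteration of z = z^2 + c on fixed4.12 values.

    Returns (new_zx, new_zy, dist), where dist = zx^2 + zy^2 is computed
    from the incoming z, mirroring the zx_2, zy_2, zx_zy and dist variables.
    """
    zx_2 = fixed_mul(zx, zx)
    zy_2 = fixed_mul(zy, zy)
    zx_zy = fixed_mul(zx, zy)
    dist = zx_2 + zy_2
    new_zx = zx_2 - zy_2 + cx            # real part of z^2 + c
    new_zy = 2 * zx_zy + cy              # imaginary part of z^2 + c
    return new_zx, new_zy, dist
```

Keeping the three products in fixed4.12 rather than fixed8.24 is what lets each of them fit in two bytes of zero page, which matches the tighter address layout on the new side of the hunk.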

@@ -18,21 +18,23 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 ## Current state
-The 16-bit signed integer multiplication seems to be working, though I need to double-check it some more. It takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
+Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet.
+The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
 The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
-The loop is unrolled which saves 148 cycles, but at the cost of making the routine quite large. This is an acceptable tradeoff for the Mandelbrot, where imul16 is the dominant performance cost and the rest of the program will be small.
+The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
-The mandelbrot loop is now written out, but untested and probably buggy. With three multiplications, several additions/subtractions, and three sets of annoying bit shifts and rounds, it weighs in at 1939 - 3007 cycles per iteration.
+Iterations are capped at 255.
 ## Next steps
-After a quick once-over to make sure it looks right, it's probably time to slap a display list together and draw some pixels to the screen and see what happens.
+Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it!
-Reaching max iterations (256 runs through the loop) will take a half second or so per pixel -- this can be optimized by keeping a buffer of a few past zx/zy values and checking for duplicates which would signal a loop that will never escape. (Another technique I learned from Fractint!)
+Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
-160x192 is luckily only 30,720 pixels, so there's a hard rendering time limit of about 4.5 hours. :D
+I may be able to do a faster multiply using tables of squares for 8-bit component multiplication.
 ## Deps and build instructions