update readme & doc comments & vars

2023-01-22 14:34:30 -08:00
commit b6ddc0d50e
parent 7009e16235
2 changed files with 17 additions and 15 deletions
--- a/mandel.s
+++ b/mandel.s
@@ -3,15 +3,15 @@ sx    = $80     ; i16: screen pixel x
 sy    = $82     ; i16: screen pixel y
 ox    = $84     ; fixed4.12: center point x
 oy    = $86     ; fixed4.12: center point y
-cx    = $84     ; fixed4.12: c_x
+cx    = $88     ; fixed4.12: c_x
-cy    = $86     ; fixed4.12: c_y
+cy    = $8a     ; fixed4.12: c_y
-zx    = $88     ; fixed4.12: z_x
+zx    = $8c     ; fixed4.12: z_x
-zy    = $8a     ; fixed4.12: z_y
+zy    = $8e     ; fixed4.12: z_y
-zx_2  = $90     ; fixed8.24: z_x^2
+zx_2  = $90     ; fixed4.12: z_x^2
-zy_2  = $94     ; fixed8.24: z_y^2
+zy_2  = $92     ; fixed4.12: z_y^2
-zx_zy = $98     ; fixed8.24: z_x * z_y
+zx_zy = $94     ; fixed4.12: z_x * z_y
-dist  = $9c     ; fixed8.24: z_x^2 + z_y^2
+dist  = $96     ; fixed4.12: z_x^2 + z_y^2
 iter  = $a0     ; u8: iteration count
 zoom  = $a1     ; u8: zoom shift level
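
For readers unfamiliar with the `fixed4.12` notation in the comments above: it appears to mean a signed 16-bit value with 4 integer bits (including sign) and 12 fractional bits, so 1.0 is stored as `$1000`. The sketch below models that in Python rather than 6502 assembly. The helper names (`to_fixed`, `to_float`, `fixed_mul`) and the round-to-nearest step are assumptions for illustration, not taken from `mandel.s`.

```python
FRAC_BITS = 12  # fixed4.12: 12 fractional bits, 4 integer bits including sign


def to_fixed(x: float) -> int:
    """Convert a float in [-8, 8) to a signed 16-bit fixed4.12 integer."""
    v = round(x * (1 << FRAC_BITS))
    if not -0x8000 <= v <= 0x7FFF:
        raise OverflowError("value does not fit in fixed4.12")
    return v


def to_float(v: int) -> float:
    """Convert a fixed4.12 integer back to a float."""
    return v / (1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed4.12 values.

    The raw 32-bit product has 24 fractional bits (8.24), so it is
    rounded to nearest and shifted right by 12 to get back to 4.12.
    The rounding mode is an assumption; the assembly may differ.
    """
    product = a * b
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
```

This matches the change in the comments from `fixed8.24` to `fixed4.12` for `zx_2`, `zy_2`, `zx_zy` and `dist`: if the products are reduced to 16 bits before being stored, each needs two bytes of zero page instead of four.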
--- a/readme.md
+++ b/readme.md
@@ -18,21 +18,23 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 ## Current state
-The 16-bit signed integer multiplication seems to be working, though I need to double-check it some more. It takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
+Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet.
+The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
 The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
-The loop is unrolled which saves 148 cycles, but at the cost of making the routine quite large. This is an acceptable tradeoff for the Mandelbrot, where imul16 is the dominant performance cost and the rest of the program will be small.
+The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
-The mandelbrot loop is now written out, but untested and probably buggy. With three multiplications, several additions/subtractions, and three sets of annoying bit shifts and rounds, it weighs in at 1939 - 3007 cycles per iteration.
+Iterations are capped at 255.
 ## Next steps
-After a quick once-over to make sure it looks right, it's probably time to slap a display list together and draw some pixels to the screen and see what happens.
+Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it!
-Reaching max iterations (256 runs through the loop) will take a half second or so per pixel -- this can be optimized by keeping a buffer of a few past zx/zy values and checking for duplicates which would signal a loop that will never escape. (Another technique I learned from Fractint!)
+Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
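
One way the cycle check described above might look, sketched in Python rather than 6502 assembly. It keeps a short buffer of recent (zx, zy) values and bails out as soon as one repeats. The function name `iterate`, the buffer size of 4, and the truncating multiply are assumptions for illustration; the sketch also ignores 16-bit overflow, which the real routine has to handle.

```python
FRAC_BITS = 12
ESCAPE = 4 << FRAC_BITS  # |z|^2 >= 4.0, expressed in fixed4.12


def fmul(a: int, b: int) -> int:
    """Fixed4.12 multiply (truncating shift; rounding is left out here)."""
    return (a * b) >> FRAC_BITS


def iterate(cx: int, cy: int, max_iter: int = 255, history: int = 4) -> int:
    """Return the escape iteration, or max_iter if the point never escapes.

    A repeated (zx, zy) pair means the orbit is periodic and can never
    escape, so the remaining iterations are skipped.
    """
    zx = zy = 0
    seen = []  # small buffer of recent (zx, zy) pairs
    for i in range(max_iter):
        zx_2 = fmul(zx, zx)
        zy_2 = fmul(zy, zy)
        if zx_2 + zy_2 >= ESCAPE:
            return i
        zx, zy = zx_2 - zy_2 + cx, 2 * fmul(zx, zy) + cy
        if (zx, zy) in seen:
            return max_iter  # cycle found: treat as inside the set
        seen.append((zx, zy))
        if len(seen) > history:
            seen.pop(0)
    return max_iter
```

With c = -1 the orbit alternates between -1 and 0, so the check fires on the third pass instead of running all 255 iterations.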
-160x192 is luckily only 30,720 pixels, so there's a hard rendering time limit of about 4.5 hours. :D
+I may be able to do a faster multiply using tables of squares for 8-bit component multiplication.
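
A common form of the "tables of squares" trick is quarter-square multiplication, which uses the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). Whether this is the exact variant intended here is an assumption. The Python sketch below shows the idea: one 511-entry table replaces the 8-bit multiply, and four 8-bit partial products combine into an unsigned 16x16 result. The names `SQ4`, `mul8` and `mul16` are hypothetical.

```python
# Table of floor(n^2 / 4) for every possible sum of two 8-bit values.
SQ4 = [(n * n) // 4 for n in range(511)]


def mul8(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit multiply using two table lookups.

    a+b and a-b always have the same parity, so the two floor
    operations lose the same fraction and the difference is exact.
    """
    return SQ4[a + b] - SQ4[abs(a - b)]


def mul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply built from four 8x8 products."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8
    return (
        mul8(a_lo, b_lo)
        + ((mul8(a_lo, b_hi) + mul8(a_hi, b_lo)) << 8)
        + (mul8(a_hi, b_hi) << 16)
    )
```

On the 6502 the table would typically be split into low-byte and high-byte pages so each lookup is a single indexed load; signed inputs would still need the sign handling the current routine already does.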
 ## Deps and build instructions