Compare commits

2 commits

Author SHA1 Message Date
ed79c80b16 update readme 2024-12-30 16:50:25 -08:00
e6cbe0bc6b notes 2024-12-30 16:43:18 -08:00
2 changed files with 15 additions and 12 deletions

@@ -14,15 +14,18 @@ Non-goals:
 Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals.
--- brooke, january 2023 - february 2024
+-- brooke, january 2023 - december 2024
 ## Current state
-Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet.
+Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys.
-The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
+The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
-The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
+* 16-bit multiplies are decomposed into 4 8-bit unsigned multiplies and some addition
+* an optimized case for squares uses a table of 8-bit squares to reduce the number of 8-bit multiplication sub-ops
+* when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
+* without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
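A rough Python model of the multiply decomposition described in the bullets above, for anyone who wants to check the arithmetic off-target. This is only a sketch under assumptions: it uses the well-known quarter-square identity `a*b = floor((a+b)^2/4) - floor((a-b)^2/4)` as a stand-in for the table lookup, which may not match the exact table layout the repo or the linked article uses, and the names `QSQ`, `mul8`, `mul16u` and `mul16s` are made up for illustration. The sign handling (multiply magnitudes, then restore the sign) is also an assumption.

```python
# Hypothetical quarter-square table: QSQ[n] = floor(n*n / 4) for n in 0..510.
# (a+b) and (a-b) always share parity, so the two floors cancel exactly.
QSQ = [(n * n) // 4 for n in range(511)]

def mul8(a, b):
    """Unsigned 8x8 -> 16-bit multiply using two table lookups and a subtract."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    return QSQ[a + b] - QSQ[abs(a - b)]

def mul16u(x, y):
    """Unsigned 16x16 -> 32-bit multiply built from four 8x8 partial products."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    low = mul8(xl, yl)                    # weight 2^0
    mid = mul8(xl, yh) + mul8(xh, yl)     # weight 2^8
    high = mul8(xh, yh)                   # weight 2^16
    return low + (mid << 8) + (high << 16)

def mul16s(x, y):
    """Signed 16x16 -> 32-bit multiply (assumed approach: magnitudes, then sign)."""
    negative = (x < 0) != (y < 0)
    product = mul16u(abs(x), abs(y))
    return -product if negative else product
```

The table has 511 entries because `a + b` can reach 510; on a 6502 that would typically be split into low-byte and high-byte pages, which this model does not attempt to show.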
 The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
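To make the 4.12 format concrete, here is a small Python sketch of one way the escape-time loop could look with 4 integer bits (including sign) and 12 fractional bits. It is a model, not the repo's 6502 code: `FRAC_BITS`, `to_fixed`, `fmul` and `mandel_iters` are illustrative names, the escape test at |z|^2 > 4 is the usual choice rather than something the README states, and Python integers never overflow, so the range handling a real 16-bit implementation needs is not modeled. The 255 cap comes from the README.

```python
FRAC_BITS = 12            # 4.12: 12 fractional bits; change to 13 to model 3.13
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
MAX_ITER = 255            # iterations are capped at 255 per the README

def to_fixed(value):
    """Convert a float to the fixed-point integer representation."""
    return int(round(value * ONE))

def fmul(a, b):
    """Fixed-point multiply: full-width product, then drop the fractional bits."""
    return (a * b) >> FRAC_BITS

def mandel_iters(cx, cy):
    """Escape-time iteration count for c = (cx, cy), both already fixed-point."""
    zx = zy = 0
    for i in range(MAX_ITER):
        zx2 = fmul(zx, zx)
        zy2 = fmul(zy, zy)
        if zx2 + zy2 > 4 * ONE:       # assumed escape test: |z|^2 > 4
            return i
        zy = 2 * fmul(zx, zy) + cy    # imaginary part of z^2 + c
        zx = zx2 - zy2 + cx           # real part of z^2 + c
    return MAX_ITER
```

Usage would be something like `mandel_iters(to_fixed(-0.75), to_fixed(0.1))`. Note that both squares are needed for the escape test anyway, which is why a dedicated fast path for squaring pays off.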
@@ -30,17 +33,18 @@ Iterations are capped at 255.
 The pixels are run in a progressive layout to get the basic shape on screen faster.
-## Next steps
-Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it!
-Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
-I may be able to do a faster multiply using tables of squares for 8-bit component multiplication.
-(done)
+There is a running counter of ms/px using the vertical blank interrupts as a timer, used to track our progress. :D
+There's a check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
+There's some cute color cycling.
 ## Deps and build instructions
 I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that.
 Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices.
+## Todo
+See ideas in `todo.md`.
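The cycle check mentioned in the README hunk above can be sketched in Python as follows. This is one possible scheme, not necessarily what the repo does: it keeps a single saved (zx, zy) reference point, compares every new value against it, and refreshes the reference at power-of-two iteration counts. The function name, the tuple return value, and the refresh schedule are all assumptions for illustration. Because fixed-point values are exact integers, a repeated (zx, zy) pair means the orbit is periodic and can never escape.

```python
def mandel_iters_periodic(cx, cy, max_iter=255, frac_bits=12):
    """Return (iterations, cycle_found) for fixed-point c = (cx, cy)."""
    one = 1 << frac_bits

    def fmul(a, b):
        return (a * b) >> frac_bits

    zx = zy = 0
    saved_x = saved_y = 0      # reference point we compare each new z against
    refresh_at = 1             # iteration count at which the reference is replaced
    for i in range(max_iter):
        zx2 = fmul(zx, zx)
        zy2 = fmul(zy, zy)
        if zx2 + zy2 > 4 * one:
            return i, False                # escaped
        zy = 2 * fmul(zx, zy) + cy
        zx = zx2 - zy2 + cx
        if zx == saved_x and zy == saved_y:
            return max_iter, True          # orbit repeated: it cannot escape
        if i + 1 == refresh_at:
            saved_x, saved_y = zx, zy      # take a newer reference point
            refresh_at *= 2                # and wait twice as long next time
    return max_iter, False
```

For points deep in the 'lake' this bails out after a handful of iterations instead of running to the 255 cap, which is where the time saving comes from.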

@@ -2,14 +2,13 @@ things to try:
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* optimize out a store/load with mul8_add16 and mul8_add24
 * try 3.13 fixed point instead of 4.12 for more precision
 * can we get away without the extra bit?
 * y-axis mirror optimization
 * 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
* maybe redo tiering to just 4x4, 2x2, 1x1?
 * extract viewport for display & re-input via keyboard
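A quick sketch of what the "y-axis mirror optimization" item could mean, assuming it refers to the Mandelbrot set's symmetry about the real axis: the iteration count for c equals the count for its complex conjugate, so a viewport centered vertically on y = 0 only needs its top half computed. This uses Python floats rather than fixed point to keep it short, and `escape_iters` and `render_mirrored` are hypothetical names. One caveat worth checking on the real code: with truncating fixed-point multiplies the two halves may not round identically.

```python
def escape_iters(c, max_iter=255):
    """Escape-time iteration count for a complex point c."""
    z = 0j
    for i in range(max_iter):
        if z.real * z.real + z.imag * z.imag > 4.0:
            return i
        z = z * z + c
    return max_iter

def render_mirrored(width, height, x0, x1, y_max):
    """Render x in [x0, x1], y in [-y_max, +y_max], computing only the top half."""
    rows = [[0] * width for _ in range(height)]
    for row in range((height + 1) // 2):
        cy = y_max - (2.0 * y_max) * row / (height - 1)
        for col in range(width):
            cx = x0 + (x1 - x0) * col / (width - 1)
            n = escape_iters(complex(cx, cy))
            rows[row][col] = n
            rows[height - 1 - row][col] = n   # mirrored row corresponds to -cy
    return rows
```

The saving only applies when the viewport straddles the real axis symmetrically; a panned or zoomed view would need to mirror just the overlapping rows.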