Compare commits
No commits in common. "ed79c80b167607f0c59d7c8f33569f9bf3e981f5" and "6db8cef82d4117ae2b3ede21e9ed3cf1ab720a22" have entirely different histories.
ed79c80b16
...
6db8cef82d
2 changed files with 12 additions and 15 deletions
24
readme.md
|
|
@@ -14,18 +14,15 @@ Non-goals:
|
||||||
|
|
||||||
Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals.
|
Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals.
|
||||||
|
|
||||||
-- brooke, january 2023 - december 2024
|
-- brooke, january 2023 - february 2024
|
||||||
|
|
||||||
## Current state
|
## Current state
|
||||||
|
|
||||||
Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys.
|
Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet.
|
||||||
|
|
||||||
The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
|
The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
|
||||||
|
|
||||||
* 16-bit multiplies are decomposed into 4 8-bit unsigned multiplies and some addition
|
The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
|
||||||
* an optimized case for squares uses a table of 8-bit squares to reduce the number of 8-bit multiplication sub-ops
|
|
||||||
* when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
|
|
||||||
* without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
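The bullets above can be sketched in Python to show the arithmetic. This is only an illustration, not the project's 6502 code: the names `HALF_SQUARES`, `mul8` and `mul16_signed` are hypothetical, the odd/odd correction is one way to handle the rounding in a half-square table and may differ from what the linked article or the project does, and handling signs by multiplying magnitudes is an assumption.

```python
# Hypothetical names; a sketch of the arithmetic only.

# h(x) = floor(x*x / 2) for every possible sum of two 8-bit values (0..510).
HALF_SQUARES = [(x * x) // 2 for x in range(511)]

def mul8(a, b):
    """Unsigned 8x8 -> 16-bit multiply using table lookups and adds."""
    # a*b = h(a+b) - h(a) - h(b), except that flooring overshoots by one
    # when both inputs are odd, so that case is corrected explicitly.
    return HALF_SQUARES[a + b] - HALF_SQUARES[a] - HALF_SQUARES[b] - (a & b & 1)

def mul16_signed(x, y):
    """Signed 16x16 -> 32-bit multiply built from four 8-bit multiplies."""
    # Assumption: multiply the magnitudes, then negate if the signs differ.
    negative = (x < 0) != (y < 0)
    x, y = abs(x), abs(y)
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    product = (mul8(xl, yl)
               + ((mul8(xl, yh) + mul8(xh, yl)) << 8)
               + (mul8(xh, yh) << 16))
    return -product if negative else product
```

A table-driven `mul8` trades memory for speed: each 8-bit product costs three lookups and a few subtractions instead of a shift-and-add loop.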
|
|
||||||
|
|
||||||
The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
|
The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
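As a rough illustration of what 4.12 fixed point means, here is a Python sketch. The helper names are hypothetical and not taken from the project. A signed 4.12 value covers roughly -8 to +8 in steps of 1/4096; 3.13 would halve the range to roughly -4 to +4 and halve the step size.

```python
FRAC_BITS = 12          # 4.12: 4 integer bits (including sign), 12 fractional bits
ONE = 1 << FRAC_BITS    # 1.0 is stored as 4096

def to_fixed(value):
    """Convert a float to its raw 4.12 integer representation."""
    return int(round(value * ONE))

def to_float(raw):
    """Convert a raw 4.12 integer back to a float."""
    return raw / ONE

def fixed_mul(a, b):
    """Multiply two raw 4.12 values and return a raw 4.12 result."""
    # The raw product carries 24 fractional bits; shift right by 12 to get back to 4.12.
    return (a * b) >> FRAC_BITS
```

For example, `fixed_mul(to_fixed(1.5), to_fixed(-2.0))` gives the raw value for -3.0. This is why a 16-bit by 16-bit multiply with a 32-bit result is the core operation: the middle bits of the product are the fixed-point answer.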
|
||||||
|
|
||||||
|
|
@@ -33,18 +30,17 @@ Iterations are capped at 255.
|
||||||
|
|
||||||
The pixels are run in a progressive layout to get the basic shape on screen faster.
|
The pixels are run in a progressive layout to get the basic shape on screen faster.
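The readme does not spell out the exact pixel order, so the following Python sketch is only one possible coarse-to-fine scheme, with a hypothetical function name and an assumed starting step of 8 (which must be a power of two). It visits every pixel exactly once, starting with a sparse grid and filling in finer grids afterwards.

```python
def progressive_order(width, height, coarsest=8):
    """Yield each (x, y) once: coarse grid first, then successively finer grids."""
    step = coarsest
    first_pass = True
    while step >= 1:
        for y in range(0, height, step):
            for x in range(0, width, step):
                # Points on the previous (coarser) grid were already visited.
                if first_pass or (x % (step * 2)) or (y % (step * 2)):
                    yield (x, y)
        first_pass = False
        step //= 2
```

Rendering in this order puts a rough outline of the set on screen after only a small fraction of the pixels have been computed.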
|
||||||
|
|
||||||
There is a running counter of ms/px using the vertical blank interrupts as a timer, used to track our progress. :D
|
## Next steps
|
||||||
|
|
||||||
There's a check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
|
Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it!
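The arithmetic behind such a counter is simple enough to sketch in Python. The function name is hypothetical, and the frame rate is an assumption: a vertical blank counter ticks roughly 60 times per second on NTSC machines and roughly 50 on PAL.

```python
def ms_per_pixel(frames_elapsed, pixels_done, frames_per_second=60):
    """Average milliseconds per pixel, given a count of vertical blank interrupts."""
    if pixels_done == 0:
        return 0.0  # nothing rendered yet, so there is no meaningful average
    elapsed_ms = frames_elapsed * 1000.0 / frames_per_second
    return elapsed_ms / pixels_done
```

With 120 frames elapsed and 100 pixels done at 60 frames per second, this reports 20.0 ms/px. The resolution is limited to one frame, so the figure only settles down after a reasonable number of pixels.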
|
||||||
|
|
||||||
There's some cute color cycling.
|
Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
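A minimal Python sketch of this kind of cycle check follows. It uses floats rather than the project's fixed-point values, the function name is hypothetical, and the policy of re-saving a reference point at doubling intervals is an assumption, not necessarily what fractint or this project does.

```python
def escape_iterations(cx, cy, max_iter=255):
    """Return the iteration count at escape, or max_iter if treated as inside."""
    zx = zy = 0.0
    saved = (zx, zy)      # reference point to compare later values against
    check_at = 8          # assumed: refresh the reference at doubling intervals
    for i in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i + 1
        if (zx, zy) == saved:
            return max_iter   # the orbit repeated, so it can never escape
        if i + 1 == check_at:
            saved = (zx, zy)
            check_at *= 2
    return max_iter
```

Points deep in the 'lake' such as `(-1.0, 0.0)` fall into a short cycle almost immediately, so the loop exits after a couple of iterations instead of running all 255. With low-precision fixed point, exact repeats are likely to be more common than with floats.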
|
||||||
|
|
||||||
|
I may be able to do a faster multiply using tables of squares for 8-bit component multiplication.
|
||||||
|
(done)
|
||||||
|
|
||||||
## Deps and build instructions
|
## Deps and build instructions
|
||||||
|
|
||||||
I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that.
|
I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that.
|
||||||
|
|
||||||
Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices.
|
Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices.
|
||||||
|
|
||||||
## Todo
|
|
||||||
|
|
||||||
See ideas in `todo.md`.
|
|
||||||
3
todo.md
|
|
@@ -2,13 +2,14 @@ things to try:
|
||||||
|
|
||||||
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
|
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
|
||||||
|
|
||||||
|
* optimize out a store/load with mul8_add16 and mul8_add24
|
||||||
|
|
||||||
* try 3.13 fixed point instead of 4.12 for more precision
|
* try 3.13 fixed point instead of 4.12 for more precision
|
||||||
* can we get away without the extra bit?
|
* can we get away without the extra bit?
|
||||||
|
|
||||||
* y-axis mirror optimization
|
* y-axis mirror optimization
|
||||||
|
|
||||||
* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
|
* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
|
||||||
* maybe redo tiering to just 4x4, 2x2, 1x1?
|
|
||||||
|
|
||||||
* extract viewport for display & re-input via keyboard
|
* extract viewport for display & re-input via keyboard
|
||||||
|
|
||||||
|
|
|
||||||