Compare commits


32 commits

Author SHA1 Message Date
6479cf530c update some timings 2025-09-16 21:56:50 -07:00
29cd3d968f Shaves 3 seconds off initial view runtime on XE :D
Instead of relying solely on the JMP thunks added to
imul16_func and sqr16_func, three call sites within the
mandelbrot iteration function are patched directly to
jsr to the XE versions, saving like 15 cycles per iter

Ok so it's not a lot, but every second counts. ;)

with XE code disabled:
1539 us/iter
5m13s

with old XE code:
1417 us/iter
4m48s

with new XE code:
1406 us/iter
4m45s
2025-09-06 19:53:25 -07:00
b46e6fb343 fix typo on stub x/y inputs
was accidentally falling through to the
"load a viewport from a keypress" code, which
was not needed here
2025-09-01 12:28:33 -07:00
f2a6af0995 Replace the not-enough-precision 32 bit to float impl
keep the proc though to encapsulate it but uses the older
logic of rounding down to 3.13 first
2025-07-03 18:43:10 -07:00
96e0356e57 WIP input handling for coords
experimental output via 32-bit mult, loses precision in conversion
2025-07-03 18:41:24 -07:00
fab2760394 refactor countdown as a procedure call 2025-06-28 13:43:43 -07:00
fd954da47e Create map file for convenience
export a symbol and it'll appear in mandel.map
2025-06-23 08:17:39 -07:00
4bac47a4fd fix at 256 seconds 2025-06-23 00:31:53 -07:00
5cf64970c8 Ah that's better
used the appropriate instruction for comparison
2025-06-22 23:10:43 -07:00
Brooke
f7082ab371 wip subtraction method, still not working 2025-06-22 22:21:26 -07:00
Brooke
689363d083 WIP code for elapsed time
not finished, doesn't work right
2025-06-22 20:00:35 -07:00
89b4e45901 flip the y coordinate sign 2025-02-22 20:24:04 -08:00
6e66145ec6 whoops fixes 2025-02-22 15:37:11 -08:00
07db3d00d7 second status bar display with coords/zoom
currently using 3.13 precision to output to floats for formatting
2025-02-22 11:23:13 -08:00
26d612b6f3 move 8 scan lines on the bottom to status bar 2025-02-21 19:42:10 -08:00
25da81c64b clean up text draw, fix offset by one 2025-02-02 16:40:58 -08:00
d182d33b35 draw_string 2025-02-01 10:02:01 -08:00
e0cc704d99 Fix drawing terminator, round usec 2025-01-08 18:34:46 -08:00
7c04862d70 workaround for rounding us/iter
for some reason rounding is giving me wrong results
not sure what i'm doing wrong :D

just show 6 digits :P

ok this gets the us/iter working, and it is more stable
but the elapsed time still needs to be added
2025-01-05 14:29:27 -08:00
918d15e813 wip us/iter counter
seems wrong, gives 32 all the time and that seems too small
2025-01-05 14:05:24 -08:00
eaa00a055a wip changing time units
it does this weird thing where sometimes it's reading out wrong digits
and then switches to expected unit of sec/px

work in progress no clue what's going on
2025-01-04 18:46:51 -08:00
7e5ca79d9a move total_ms, total_pixels out of zero page
this frees up 12 bytes of zero page space and costs no measurable
time as these variables are not in the hot path and the difference
was only tiny.
2025-01-04 14:25:25 -08:00
d2bf77dc26 todo notes 2025-01-04 12:13:27 -08:00
582ddf497f apply jamey's suggestion of skipping add for high byte muls
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
2025-01-04 10:53:51 -08:00
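
The idea in the commit above can be sketched in Python. This is an illustration only, not the project's 6502 code: it assumes an unsigned 16x16 multiply built from four 8x8 partial products, and the function name `mul16_unsigned` is hypothetical. The low-byte product lands in result bytes 0-1 and the high-byte product in bytes 2-3, so neither overlaps the other and both can be written in place. Only the two cross products need additions.

```python
def mul16_unsigned(a: int, b: int) -> int:
    """16x16 -> 32-bit unsigned multiply built from four 8x8 products."""
    a_lo, a_hi = a & 0xFF, a >> 8
    b_lo, b_hi = b & 0xFF, b >> 8

    # Little-endian 32-bit result, standing in for four result bytes.
    result = bytearray(4)

    # lo*lo fills bytes 0-1 and hi*hi fills bytes 2-3. The two products
    # cannot overlap, so both are stored directly: no zeroing, no add.
    lo = a_lo * b_lo
    hi = a_hi * b_hi
    result[0], result[1] = lo & 0xFF, lo >> 8
    result[2], result[3] = hi & 0xFF, hi >> 8

    # Only the two cross products need real additions, starting at byte 1.
    for cross in (a_lo * b_hi, a_hi * b_lo):
        carry = 0
        for i, byte in enumerate((cross & 0xFF, cross >> 8, 0), start=1):
            total = result[i] + byte + carry
            result[i] = total & 0xFF
            carry = total >> 8

    return int.from_bytes(result, "little")
```

The real routine is signed and works on zero-page bytes, so its sign handling and exact cycle savings are not modeled here.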
d157fe1306 Faster pixel skipping on 4x4, 2x2 tiers
Iterate at fill_masks[fill_level]+1 instead of every pixel and then
skipping, saves a smidge of time

view 1 with expanded memory:
10.514 ms/px before
10.430 ms/px after
2025-01-04 10:06:12 -08:00
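
A minimal sketch of the change described above, in Python and for one axis only. The mask values in `FILL_MASKS` are an assumption (3, 1, 0 for 4x4, 2x2 and 1x1 tiers), and both function names are hypothetical. The real table and loop live in mandel.s, which is not shown here.

```python
# Hypothetical masks: 3 -> every 4th pixel, 1 -> every 2nd, 0 -> every pixel.
FILL_MASKS = [3, 1, 0]

def pixels_by_skipping(width: int, fill_level: int) -> list[int]:
    """Old approach: visit every pixel, then skip those not on the tier grid."""
    mask = FILL_MASKS[fill_level]
    return [x for x in range(width) if x & mask == 0]

def pixels_by_stepping(width: int, fill_level: int) -> list[int]:
    """New approach: advance by mask + 1 so skipped pixels are never visited."""
    step = FILL_MASKS[fill_level] + 1
    return list(range(0, width, step))
```

Both functions return the same coordinates. The second simply never spends loop iterations on pixels it would have skipped, which is where the small saving comes from.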
dcf5a3f59e sixth viewport 2025-01-01 21:15:38 -08:00
837082cf56 tweak viewports
skip experimental 6th viewport that got forgotten
and limit max zoom to 7 (range 0-7) which is what looks good
2025-01-01 15:45:26 -08:00
65fcb44934 3.13 / 6.26 gives nicer results! 2025-01-01 15:37:12 -08:00
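
For readers unfamiliar with the notation, here is a hedged Python sketch of what 3.13 and 6.26 would mean. It assumes that "3.13" is a signed 16-bit value with 13 fractional bits, and that "6.26" is the 32-bit product of two such values. The helper names are hypothetical and the project's rounding behaviour may differ.

```python
FRAC_BITS = 13            # fractional bits in the assumed 3.13 format
ONE = 1 << FRAC_BITS      # 1.0 in 3.13 is 8192

def to_fixed(x: float) -> int:
    """Convert a float to 3.13, rejecting values outside the 16-bit range."""
    value = round(x * ONE)
    if not -0x8000 <= value <= 0x7FFF:
        raise OverflowError(f"{x} does not fit in 3.13")
    return value

def mul_to_6_26(a: int, b: int) -> int:
    """Multiply two 3.13 values; the product carries 26 fractional bits."""
    return a * b

def narrow_to_3_13(product: int) -> int:
    """Drop 13 fractional bits to return from 6.26 to 3.13 (floors)."""
    return product >> FRAC_BITS

def to_float(value: int) -> float:
    """Convert a 3.13 value back to a float for display."""
    return value / ONE
```

Under these assumptions 3.13 covers roughly -4 to just under +4 in steps of 1/8192, twice the resolution of 4.12 (steps of 1/4096) at the cost of one integer bit.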
c424f1b8bc fill in scanlines during tiering 2024-12-31 22:10:27 -08:00
49fe315529 'wide pixels'
should get better color on the composite video because the
scanlines will be fuller of data
2024-12-31 20:13:11 -08:00
f1ebb21bcb wip not working wide pixels 2024-12-31 17:49:13 -08:00
87caa52543 add viewport number 5 full zoom 2024-12-31 15:45:03 -08:00
4 changed files with 691 additions and 256 deletions

@@ -3,7 +3,7 @@
 all : mandel.xex
 mandel.xex : mandel.o tables.o atari-asm-xex.cfg
-	ld65 -C ./atari-asm-xex.cfg -o $@ mandel.o tables.o
+	ld65 -C ./atari-asm-xex.cfg --mapfile mandel.map -o $@ mandel.o tables.o
 %.o : %.s
 	ca65 -o $@ $<
@@ -15,4 +15,6 @@ clean :
 	rm -f tables.s
 	rm -f *.o
 	rm -f *.xex
+	rm -f mandel.map

925
mandel.s

File diff suppressed because it is too large

@@ -18,7 +18,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 ## Current state
-Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys.
+Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
 The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
@@ -27,7 +27,7 @@ The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 3
 * when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
 * without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
-The mandelbrot calculations are done using 4.12-precision fixed point numbers with 8.24-precision intermediates. It may be possible to squish this down to 3.13/6.26.
+The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
 Iterations are capped at 255.
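
The table-of-half-squares approach mentioned in the README text can be illustrated with the quarter-square identity, a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4). The Python below is a sketch of that identity for unsigned 8-bit inputs, not a transcription of the project's routine. The table layout and the names `SQUARE_TABLE` and `mul8` are assumptions.

```python
# floor(n^2 / 4) for every possible sum of two 8-bit values (0..510).
SQUARE_TABLE = [(n * n) // 4 for n in range(511)]

def mul8(a: int, b: int) -> int:
    """8x8 -> 16-bit unsigned multiply using two table lookups and a subtract.

    (a + b) and (a - b) always have the same parity, so the fractions
    discarded by the two floor divisions cancel and the result is exact.
    """
    return SQUARE_TABLE[a + b] - SQUARE_TABLE[abs(a - b)]
```

This replaces a shift-and-add loop with an add, a subtract, two lookups and a final subtract, which is why it suits a CPU with no multiply instruction.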

14
todo.md

@@ -1,19 +1,17 @@
 things to try:
-* skip add on the top-byte multiply in sqr8/mul8
-* should save a few cycles, suggestion by jamey
+* fix status bar to show elapsed time, per-iter time, per-pixel iter count
+* 'turbo' mode disabling graphics in full or part
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
-* try 3.13 fixed point instead of 4.12 for more precision
-* can we get away without the extra bit?
-* since exit compare space would be 6.26 i think so
+* maybe clean up the load/layout of the big mul table
+* consider alternate lookup tables in the top 16KB under ROM
 * y-axis mirror optimization
-* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
-* maybe redo tiering to just 4x4, 2x2, 1x1?
 * extract viewport for display & re-input via keyboard
 * fujinet screenshot/viewport uploader