Fix drawing terminator, round usec

workaround for rounding us/iter
for some reason rounding is giving me wrong results not sure what i'm doing wrong :D just show 6 digits :P ok this gets the us/iter working, and it is more stable but the elapsed time still needs to be added
2025-01-08 18:34:46 -08:00 · 2025-01-05 14:29:27 -08:00 · 2025-01-05 14:05:24 -08:00 · 2025-01-04 18:46:51 -08:00 · 2025-01-04 14:25:25 -08:00 · 2025-01-04 12:13:27 -08:00
7 changed files with 1257 additions and 380 deletions
--- a/.mailmap
+++ b/.mailmap
@@ -0,0 +1,2 @@
 Brooke Vibber <bvibber@pobox.com>
 Brooke Vibber <bvibber@pobox.com> <brion@pobox.com>
--- a/Makefile
+++ b/Makefile
@@ -2,8 +2,8 @@
 all : mandel.xex
-mandel.xex : mandel.o tables.o
+mandel.xex : mandel.o tables.o atari-asm-xex.cfg
-	ld65 -C ./atari-asm-xex.cfg -o $@ $+
+	ld65 -C ./atari-asm-xex.cfg -o $@ mandel.o tables.o
 %.o : %.s
 	ca65 -o $@ $<
--- a/atari-asm-xex.cfg
+++ b/atari-asm-xex.cfg
@@ -0,0 +1,28 @@
 FEATURES {
    STARTADDRESS: default = $2E00;
 }
 SYMBOLS {
    __STARTADDRESS__: type = export, value = %S;
 }
 MEMORY {
    ZP:      file = "", define = yes, start = $0082, size = $007E;
    MAIN:    file = %O, define = yes, start = %S,    size = $4000 - %S;
    # Keep $4000-7fff clear for expanded RAM access window
    TABLES:  file = %O, define = yes, start = $8000, size = $a000 - $8000;
    # Keep $a000-$bfff clear for BASIC cartridge
 }
 FILES {
    %O: format = atari;
 }
 FORMATS {
    atari: runad = start;
 }
 SEGMENTS {
    ZEROPAGE: load = ZP,      type = zp,  optional = yes;
    EXTZP:    load = ZP,      type = zp,  optional = yes; # to enable modules to be able to link to C and assembler programs
    CODE:     load = MAIN,    type = rw,                  define = yes;
    RODATA:   load = MAIN,    type = ro   optional = yes;
    DATA:     load = MAIN,    type = rw   optional = yes;
    BSS:      load = MAIN,    type = bss, optional = yes, define = yes;
    TABLES:   load = TABLES,  type = ro,  optional = yes, align = 256;
 }
--- a/mandel.s
+++ b/mandel.s
--- a/readme.md
+++ b/readme.md
@@ -14,30 +14,37 @@ Non-goals:
 Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals.
-- brion, january 2023
+-- brooke, january 2023 - december 2024
 ## Current state
-Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet.
+Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
-The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
+The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
-The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input.
+* 16-bit multiplies are decomposed into 4 8-bit unsigned multiplies and some addition
 * an optimized case for squares uses a table of 8-bit squares to reduce the number of 8-bit multiplication sub-ops
 * when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
 * without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
-The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13.
+The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
 Iterations are capped at 255.
-## Next steps
+The pixels are run in a progressive layout to get the basic shape on screen faster.
-Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it!
+There is a running counter of ms/px using the vertical blank interrupts as a timer, used to track our progress. :D
-Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
+There's a check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
-I may be able to do a faster multiply using tables of squares for 8-bit component multiplication.
+There's some cute color cycling.
 ## Deps and build instructions
 I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that.
 Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices.
 ## Todo
 See ideas in `todo.md`.
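
As a rough illustration of the arithmetic the readme describes (a signed 16-bit multiply built from four unsigned 8-bit multiplies, used for 3.13 fixed point with 6.26 intermediates), here is a minimal JavaScript sketch. It is not the code in mandel.s: the names `mul8`, `imul16`, `toFix313`, `fromFix313` and `fixMul` are made up for this example, and `mul8` uses plain multiplication where the 6502 code would use table lookups. The sign handling follows the approach described in the commit log: multiply as unsigned, then subtract each operand from the high word when the other operand is negative.

```javascript
// Hypothetical helper names; a sketch of the idea, not the routine in mandel.s.

// Unsigned 8x8 -> 16 multiply. On the 6502 this is where table lookups go.
function mul8(a, b) {
    return (a & 0xff) * (b & 0xff);
}

// Signed 16x16 -> 32 multiply composed from four unsigned 8x8 multiplies.
function imul16(x, y) {
    const ux = x & 0xffff;            // reinterpret inputs as unsigned 16-bit
    const uy = y & 0xffff;
    const xl = ux & 0xff, xh = ux >>> 8;
    const yl = uy & 0xff, yh = uy >>> 8;

    // Sum the partial products at their byte offsets.
    // Plain arithmetic (not <<) avoids 32-bit signed overflow in JavaScript.
    let product = mul8(xl, yl)
        + (mul8(xl, yh) + mul8(xh, yl)) * 0x100
        + mul8(xh, yh) * 0x10000;

    // Signed correction: a negative x was read as x + 65536, which added
    // 65536 * uy to the product, so subtract uy from the high word.
    // The same applies to a negative y.
    if (x < 0) product -= uy * 0x10000;
    if (y < 0) product -= ux * 0x10000;

    return product | 0;               // wrap to a signed 32-bit result
}

const FRAC_BITS = 13;                 // 3.13: 3 integer bits, 13 fraction bits

function toFix313(value) {
    return Math.round(value * (1 << FRAC_BITS));
}

function fromFix313(raw) {
    return raw / (1 << FRAC_BITS);
}

// 3.13 * 3.13 gives a 6.26 intermediate; shifting right by 13 returns to 3.13.
function fixMul(a, b) {
    return imul16(a, b) >> FRAC_BITS;
}

console.log(fromFix313(fixMul(toFix313(1.5), toFix313(-0.75)))); // -1.125
```

Only the shape of the computation is meant to carry over; the cycle counts quoted in the commit log depend on how the 8-bit multiplies are done on the real hardware.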
--- a/tables.js
+++ b/tables.js
@@ -11,23 +11,40 @@ function db(func) {
    return lines.join('\n');
 }
 let squares = [];
 for (let i = 0; i < 512; i++) {
    squares.push(Math.trunc((i * i + 1) / 2));
 }
 console.log(
 `.segment "TABLES"
 .export mul_lobyte256
 .export mul_hibyte256
 .export mul_hibyte512
 .export sqr_lobyte
 .export sqr_hibyte
 ; (i * i + 1) / 2 for the multiplier
 .align 256
 mul_lobyte256:
-${db((x) => Math.round(x * x / 2) & 0xff)}
+${db((i) => squares[i] & 0xff)}
 .align 256
 mul_hibyte256:
-${db((x) => (Math.round(x * x / 2) >> 8) & 0xff)}
+${db((i) => (squares[i] >> 8) & 0xff)}
 .align 256
 mul_hibyte512:
-${db((x) => (Math.round((x + 256) * (x + 256) / 2) >> 8) & 0xff)}
+${db((i) => (squares[i + 256] >> 8) & 0xff)}
 ; (i * i) for the plain squares
 .align 256
 sqr_lobyte:
 ${db((i) => (i * i) & 0xff)}
 .align 256
 sqr_hibyte:
 ${db((i) => ((i * i) >> 8) & 0xff)}
 `);
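
The sketch below checks two properties of the table change above and then shows one way such a table can drive an 8-bit multiply. The diff does not show how mandel.s consumes these tables, so the lookup identity and the `mul8ViaSquares` name are assumptions made for illustration. The identity is (a+b)^2/2 - a^2/2 - b^2/2 = ab. Because the table rounds odd half-squares up, the raw difference comes out one short when both inputs are odd, and the sketch adds that bit back explicitly.

```javascript
// Same construction as tables.js: half-squares, rounded up, for 0..511.
const squares = [];
for (let i = 0; i < 512; i++) {
    squares.push(Math.trunc((i * i + 1) / 2));
}

// 1. The precomputed array matches the old inline Math.round expressions.
function matchesOldExpressions() {
    for (let x = 0; x < 256; x++) {
        if (squares[x] !== Math.round(x * x / 2)) return false;
        if (squares[x + 256] !== Math.round((x + 256) * (x + 256) / 2)) return false;
    }
    return true;
}

// 2. Entries i and i + 256 share a low byte, since (i + 256)^2 / 2 differs
//    from i^2 / 2 by 256 * i + 32768. That is why one low-byte table is enough.
function lowBytesRepeat() {
    for (let i = 0; i < 256; i++) {
        if ((squares[i] & 0xff) !== (squares[i + 256] & 0xff)) return false;
    }
    return true;
}

// Assumed lookup scheme (hypothetical name): unsigned 8x8 -> 16 multiply
// from three table reads, plus a correction bit when both inputs are odd.
function mul8ViaSquares(a, b) {
    const bothOdd = a & b & 1;
    return squares[a + b] - squares[a] - squares[b] + bothOdd;
}

console.log(matchesOldExpressions(), lowBytesRepeat(), mul8ViaSquares(200, 123));
```

The index a + b can reach 510, which is why the generator fills 512 entries while the plain `sqr_lobyte`/`sqr_hibyte` tables only need 256.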
--- a/todo.md
+++ b/todo.md
@@ -0,0 +1,17 @@
 things to try:
 * fix status bar to show elapsed time, per-iter time, per-pixel iter count
 * 'turbo' mode disabling graphics in full or part
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
 * maybe clean up the load/layout of the big mul table
 * consider alternate lookup tables in the top 16KB under ROM
 * y-axis mirror optimization
 * extract viewport for display & re-input via keyboard
 * fujinet screenshot/viewport uploader
Author	SHA1	Message	Date
Brooke Vibber	e0cc704d99	Fix drawing terminator, round usec	2025-01-08 18:34:46 -08:00
Brooke Vibber	7c04862d70	workaround for rounding us/iter for some reason rounding is giving me wrong results not sure what i'm doing wrong :D just show 6 digits :P ok this gets the us/iter working, and it is more stable but the elapsed time still needs to be added	2025-01-05 14:29:27 -08:00
Brooke Vibber	918d15e813	wip us/iter counter seems wrong, gives 32 all the time and that seems too small	2025-01-05 14:05:24 -08:00
Brooke Vibber	eaa00a055a	wip changing time units it does this weird thing where sometimes it's reading out wrong digits and then switches to expected unit of sec/px work in progress no clue what's going on	2025-01-04 18:46:51 -08:00
Brooke Vibber	7e5ca79d9a	move total_ms, total_pixels out of zero page this frees up 12 bytes of zero page space and costs no measurable time as these variables are not in the hot path and there was only a tiny bit different.	2025-01-04 14:25:25 -08:00
Brooke Vibber	d2bf77dc26	todo notes	2025-01-04 12:13:27 -08:00
Brooke Vibber	582ddf497f	apply jamey's suggestion of skipping add for high byte muls rather than saving 0 into the high bytes, then adding the high-byte multiplication later, write it directly in place. this saves a few cycles on every iteration, and it adds up nicely. View 1 overview render times: 130XE: 10.050 ms/px - 4m56s 800XL: 10.906 ms/px - 5m21s	2025-01-04 10:53:51 -08:00
Brooke Vibber	d157fe1306	Faster pixel skipping on 4x4, 2x2 tiers Iterate at fill_masks[fill_level]+1 instead of every pixel and then skipping, saves a smidge of time view 1 with expanded memory: 10.514 ms/px before 10.430 ms/px after	2025-01-04 10:06:12 -08:00
Brooke Vibber	dcf5a3f59e	sixth viewport	2025-01-01 21:15:38 -08:00
Brooke Vibber	837082cf56	tweak viewports skip experimental 6th viewport that got forgotten and limit max zoom to 7 (range 0-7) which is what looks good	2025-01-01 15:45:26 -08:00
Brooke Vibber	65fcb44934	3.13 / 6.26 gives nicer results!	2025-01-01 15:37:12 -08:00
Brooke Vibber	c424f1b8bc	fill in scanlines during tiering	2024-12-31 22:10:27 -08:00
Brooke Vibber	49fe315529	'wide pixels' should get better color on the composite video because the scanlines will be fuller of data	2024-12-31 20:13:11 -08:00
Brooke Vibber	f1ebb21bcb	wip not working wide pixels	2024-12-31 17:49:13 -08:00
Brooke Vibber	87caa52543	add viewport number 5 full zoom	2024-12-31 15:45:03 -08:00
Brooke Vibber	d8601bb856	fix fix	2024-12-31 15:03:43 -08:00
Brooke Vibber	7985ea9a39	fix panning for 32-bi	2024-12-31 14:45:38 -08:00
Brooke Vibber	cc83c76706	update docs for 32-bit intermediates	2024-12-31 14:16:43 -08:00
Brooke Vibber	2e8893fd78	haha fuck me	2024-12-31 13:54:53 -08:00
Brooke Vibber	81bf7f3c43	tweak	2024-12-31 09:53:22 -08:00
Brooke Vibber	1e0f577e09	wip	2024-12-31 09:09:11 -08:00
Brooke Vibber	d2f41f9644	wip	2024-12-31 09:02:42 -08:00
Brooke Vibber	2fcb30b76a	wip	2024-12-31 08:56:59 -08:00
Brooke Vibber	13257309dc	init fix	2024-12-31 08:34:02 -08:00
Brooke Vibber	7184b8e03f	wip	2024-12-31 08:24:47 -08:00
Brooke Vibber	4a1e35699a	wip	2024-12-31 08:24:44 -08:00
Brooke Vibber	0d086a179c	wip	2024-12-31 08:23:04 -08:00
Brooke Vibber	61eb1aaf21	notes	2024-12-31 05:11:26 -08:00
Brooke Vibber	b56dc1e98b	notes	2024-12-30 20:38:33 -08:00
Brooke Vibber	0a7293d8bc	do 4x4 2x2 1x1 only in prep for bigger pixels	2024-12-30 19:52:35 -08:00
Brooke Vibber	ec42f672d4	use an 8-item z buffer for slightly fasterness	2024-12-30 19:48:28 -08:00
Brooke Vibber	67649d4743	annotations, tweak	2024-12-30 19:17:02 -08:00
Brooke Vibber	ed79c80b16	update readme	2024-12-30 16:50:25 -08:00
Brooke Vibber	e6cbe0bc6b	notes	2024-12-30 16:43:18 -08:00
Brooke Vibber	6db8cef82d	51-70 cycles for xe :D	2024-12-30 15:17:50 -08:00
Brooke Vibber	9b7f6b8937	add a viewport in the front spike	2024-12-30 14:22:03 -08:00
Brooke Vibber	3bd9b1ac31	micro-optimizations in imul8xe 53-72 cycles overview in 10.896 ms/px	2024-12-30 14:09:02 -08:00
Brooke Vibber	63e74d5152	tweak	2024-12-30 13:44:31 -08:00
Brooke Vibber	14125a398a	cycle 'in' not 'out'	2024-12-30 11:35:45 -08:00
Brooke Vibber	71d8d93abc	even better palette cycling	2024-12-30 11:33:55 -08:00
Brooke Vibber	64a6cf50f3	awesome new palette cycler	2024-12-30 10:21:52 -08:00
Brooke Vibber	100c0f3314	1/2/3 selectable viewports	2024-12-30 09:19:41 -08:00
Brooke Vibber	e51aa91e4e	notes	2024-12-30 06:48:04 -08:00
Brooke Vibber	c4b98c7be2	optimize out a temporary down to 11.076 ms/px on xe	2024-12-30 05:35:22 -08:00
Brooke Vibber	70d2c91f03	fix bank switch on xl/xe was accidentally enabling basic rom :D 5m46s - 11.759 ms/px - 800xl 5m30s - 11.215 ms/px - 130xe	2024-12-30 03:56:35 -08:00
Brooke Vibber	acac5a8df4	moving the framebuffer into the basic space fails on 130xe and 800xl for some reason works on 800 as expected	2024-12-29 21:19:55 -08:00
Brooke Vibber	883f926e57	split memory, wip appears to work on 800 but xl/xe overlap basic lol	2024-12-29 21:06:48 -08:00
Brooke Vibber	0c63430dd9	wip tables segment to be	2024-12-29 20:37:58 -08:00
Brooke Vibber	3ab5006aa3	wip refactoring	2024-12-29 17:56:14 -08:00
Brooke Vibber	f903272335	refactoring and start on squares	2024-12-29 17:37:06 -08:00
Brooke Vibber	8ad996981a	whoops	2024-12-29 13:19:58 -08:00
Brooke Vibber	15fc5367f9	switch with the overview as default for now	2024-12-29 13:18:54 -08:00
Brooke Vibber	2118890977	add an alternate viewport (compile-time currently) zoomed to max	2024-12-29 13:10:35 -08:00
Brooke Vibber	0fc5ba914f	fix pan/zoom bug was missing an rts on update_palette this happened to fall through to keycheck which if timing was wrong would dutifully process the viewport change and return to update_palette's caller which in turn was -not- expecting to reset the outer loop fixed	2024-12-29 12:29:36 -08:00
Brooke Vibber	2b0167226e	todos	2024-12-28 20:44:27 -08:00
Brooke Vibber	504457595a	correct zoom border checks	2024-12-28 18:11:35 -08:00
Brooke Vibber	0fcf4d6676	comment tweak	2024-12-28 17:40:21 -08:00
Brooke Vibber	d83b811444	remove stray copy of the expanded-ram imul it's not finished or working, just keep the core one :D	2024-12-28 15:13:06 -08:00
Brooke Vibber	f32cc5fa7c	whoops	2024-12-27 19:15:19 -08:00
Brooke Vibber	052a19b6aa	Merge pull request 'xe' (#1 ) from xe into main Reviewed-on: https://brooke.vibber.net/git/git/brooke/mandel-6502/pulls/1	2024-12-28 02:40:01 +00:00
Brooke Vibber	83cba4afa3	Runtime detection of XE-style extended memory Uses the "big multiplication table" in 64KB of extended memory if bank switching appears to work, otherwise uses the table of squares lookups. Initial view clocks in at 13.133 ms/px for the XE version and still 14.211 ms/px for the 400/800/XL version. Tested in emulator with 130XE and XL+Ultimate 1MB upgrade configs, and base implementation on the 800XL emulator.	2024-12-27 18:37:03 -08:00
Brooke Vibber	ee1c268705	it works	2024-12-26 21:49:13 -08:00
Brooke Vibber	e84a990789	tweaks:	2024-12-26 21:41:03 -08:00
Brooke Vibber	0cde31905e	runs but doesn't work	2024-12-26 18:35:37 -08:00
Brooke Vibber	45c5a4cb2d	called, gets lost	2024-12-26 18:20:10 -08:00
Brooke Vibber	34ce9da030	builds, not used yet	2024-12-26 18:17:01 -08:00
Brooke Vibber	a9d551a98d	first draft initializer	2024-12-26 17:50:59 -08:00
Brooke Vibber	829d2860e8	:P	2024-12-26 12:04:01 -08:00
Brooke Vibber	f996c3cbcd	provisional maybe old mode runs in 81-92 cycles provisional code runs in 58-77 cycles if it works ;)	2024-12-25 12:47:37 -08:00
Brooke Vibber	405cec6d51	WIP imul8 via table experiments planning to try a 64KB table of 8x7-bit multiplies in the high memory on a 130XE or other high-memory-capable machine not yet working or finished too many cycles of overhead per invocation	2024-12-25 10:51:27 -08:00
Brooke Vibber	05133aabdd	slightly faster handling of signed mul previously we were flipping the inputs if negative, and then the output if both inputs were negative turns out you can just treat the whole thing as an unsigned mul and then subtract each term from the high word if the other term is negative. https://stackoverflow.com/a/28827013 this saves a handful of cycles, reducing our runtime to a mere 14.211 ms/px \o/	2024-12-15 20:17:45 -08:00
Brooke Vibber	7f2bc43cff	squares	2024-12-14 18:56:26 -08:00
Brion Vibber	5637783529	Faster imul16 routine Improves runtime from 16.24 ms/px to 14.44 ms/px This uses a routine found on Everything2: https://everything2.com/title/Fast+6502+multiplication which uses a lookup table of squares to do 8-bit imuls, which are then composed into a 16-bit imul	2024-12-14 18:53:31 -08:00
Brooke Vibber	29630c8887	update palette more smoothly	2024-08-19 13:21:44 -07:00
Brooke Vibber	c559b6e76b	palette adjustment	2024-08-18 21:07:53 -07:00
Brooke Vibber	6f05a9bbd0	basic palette cycling	2024-08-18 21:06:30 -07:00
Brooke Vibber	8be03993ab	fix time of drawing of 'DONE' text	2024-08-18 20:29:39 -07:00
Brooke Vibber	ee5b12dae8	mailmap	2024-08-18 20:15:47 -07:00
Brooke Vibber	201d9bf15c	clear screen after zoom/scroll	2024-02-25 15:15:23 -08:00
Brooke Vibber	c152c4346b	Progressive pixel layout	2024-02-04 14:25:15 -08:00
Brion Vibber	510457f97a	add a note to fix stats when changing zoom	2023-03-11 21:15:08 -08:00
Brion Vibber	3d792603db	keyboard nav sorta working	2023-03-11 20:45:32 -08:00
Brion Vibber	b1c26c1edd	WIP fix keyboard check	2023-03-05 16:57:41 -08:00
Brion Vibber	53336f7af1	WIP quick hack to check keyboard this for some reason only works ONCE though I can replicate the logic in BASIC and it works over multiple keys not sure what's wrong	2023-03-05 15:45:44 -08:00
Brion Vibber	24abc21b01	move speed to the right	2023-03-05 13:56:50 -08:00
Brion Vibber	9926ec28e7	clean up speed display now uses ms/px msg	2023-03-05 13:48:39 -08:00
Brion Vibber	0501a364c7	Check for repeated zx/zy values These will never escape, so saves some time in the lake trick is taken from fractint	2023-02-12 11:56:20 -08:00
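
The repeated zx/zy check mentioned in the log and the readme can be sketched as follows. This is an illustration only: the log mentions an 8-item z buffer, whereas this sketch swaps in a simpler variant that keeps a single saved point and refreshes it at doubling intervals. The name `iterate`, the return shape, and the use of plain JavaScript integers (with no 16-bit wraparound) are assumptions for the example. Values are in 3.13 fixed point, where exact repeats are common because the state space is small.

```javascript
const ONE = 1 << 13;               // 1.0 in 3.13 fixed point
const ESCAPE = 4 * ONE * ONE;      // |z|^2 >= 4.0, expressed in 6.26

// Hypothetical escape-time loop with a periodicity check.
// cx, cy are 3.13 fixed-point integers.
function iterate(cx, cy, maxIter = 255) {
    let zx = 0, zy = 0;
    let savedX = 0, savedY = 0;    // reference point for the cycle check
    let nextSave = 1;              // iteration at which to refresh it

    for (let i = 1; i <= maxIter; i++) {
        const zx2 = zx * zx;       // 6.26 intermediates
        const zy2 = zy * zy;
        if (zx2 + zy2 >= ESCAPE) {
            return { escaped: true, iterations: i };
        }

        // z = z^2 + c, scaled back from 6.26 to 3.13
        const newZx = Math.floor((zx2 - zy2) / ONE) + cx;
        const newZy = Math.floor((2 * zx * zy) / ONE) + cy;
        zx = newZx;
        zy = newZy;

        // An exact repeat means the orbit is periodic and can never escape.
        if (zx === savedX && zy === savedY) {
            return { escaped: false, iterations: i };
        }
        if (i === nextSave) {
            savedX = zx;
            savedY = zy;
            nextSave *= 2;
        }
    }
    return { escaped: false, iterations: maxIter };
}

console.log(iterate(0, 0), iterate(-ONE, 0), iterate(2 * ONE, 0));
```

For points in the lake, such as c = 0 or c = -1, the loop exits after a handful of iterations instead of running to the 255 cap, which is where the time saving comes from.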