zeropage tweaks

* switched zero-page from hardcoded assignments to symbols
* moved most non-hotpath stuff out to .data
* merged ptr and pixel_ptr

Slight slowdown in Atari800MacX from 5m13s to 5m15s
Rearrange the segments a bit
7 changed files with 824 additions and 298 deletions
--- a/10
+++ b/10
@@ -2,8 +2,11 @@
 all : mandel.xex
-mandel.xex : mandel.o tables.o atari-asm-xex.cfg
+mandel.xex : mandel.o mandel-core.o tables.o atari-xex.cfg
-	ld65 -C ./atari-asm-xex.cfg -o $@ mandel.o tables.o
+	ld65 -C ./atari-xex.cfg --mapfile mandel.map -o $@ mandel.o mandel-core.o tables.o atari.lib
 mandel.s : mandel.c mandel.h
 	cc65 -o $@ mandel.c
 %.o : %.s
 	ca65 -o $@ $<
@@ -13,6 +16,7 @@ tables.s : tables.js
 clean :
 	rm -f tables.s
 	rm -f mandel.s
 	rm -f *.o
 	rm -f *.xex
-
+	rm -f mandel.map
--- a/atari-xex.cfg
+++ b/atari-xex.cfg
@@ -0,0 +1,69 @@
 # Sample linker configuration for C programs using the Atari binary file support.
 # Use with: cl65 -tatari -Catari-xex.cfg prog.c -o prog.xex
 FEATURES {
    STARTADDRESS: default = $8000;
 }
 SYMBOLS {
    __SYSTEM_CHECK__:    type = import;  # force inclusion of "system check" load chunk
    __STACKSIZE__:       type = weak, value = $0800; # 2k stack
    __STARTADDRESS__:    type = export, value = %S;
    __RESERVED_MEMORY__: type = weak, value = $0000;
    __SYSCHKHDR__:       type = export, value = 0; # Disable system check header
    __SYSCHKTRL__:       type = export, value = 0; # Disable system check trailer
    __TABLESEG_START__:    type = weak, value = $2E00 + $0300;
    __TABLESEG_SIZE__:     type = weak, value = 6 * $100;
    __BANKSY_START__:  type = weak, value = $4000;
    __BANKSY_SIZE__:   type = weak, value = $4000;
    __FRAMEBUFFER_START__: type = weak, value = $A000;
 }
 MEMORY {
 # Note -- $80 and $81 (LOMEM) appear to be reserved in ZP.
    ZP:         file = "", define = yes, start = $0082, size = $007E;
 # "system check" load chunk
    SYSCHKCHNK: file = %O,               start = $2E00, size = $0300;
 # Note $a000-$bfff is against the BASIC cartridge, may require booting with OPTION.
    TABLES:     file = %O, define = yes, start = __TABLESEG_START__, size = __TABLESEG_SIZE__;
 # We reserve $4000-7fff for the bank-switch window.
 # In theory we could keep data and code here that we only use on 48k/64k systems.
    BANKSWITCH: file = "", define = yes, start = __BANKSY_START__, size = __BANKSY_SIZE__;
 # "main program" load chunk
    MAIN:       file = %O, define = yes, start = %S, size = __FRAMEBUFFER_START__ - __STACKSIZE__ - __RESERVED_MEMORY__ - %S;
 }
 FILES {
    %O: format = atari;
 }
 FORMATS {
    atari: runad = start,
           initad = SYSCHKCHNK: __SYSTEM_CHECK__;
 }
 SEGMENTS {
    ZEROPAGE:  load = ZP,         type = zp;
    EXTZP:     load = ZP,         type = zp,                optional = yes;
    SYSCHK:    load = SYSCHKCHNK, type = rw,  define = yes, optional = yes;
    TABLES:    load = TABLES,     type = ro,  optional = yes, align = 256;
    BANKSWICH: load = BANKSWITCH, type = ro,  optional = yes;
    STARTUP:   load = MAIN,       type = ro,  define = yes;
    LOWBSS:    load = MAIN,       type = rw,                optional = yes;  # not zero initialized
    LOWCODE:   load = MAIN,       type = ro,  define = yes, optional = yes;
    ONCE:      load = MAIN,       type = ro,                optional = yes;
    CODE:      load = MAIN,       type = ro,  define = yes;
    RODATA:    load = MAIN,       type = ro;
    DATA:      load = MAIN,       type = rw;
    INIT:      load = MAIN,       type = rw,                optional = yes;
    BSS:       load = MAIN,       type = bss, define = yes;
 }
 FEATURES {
    CONDES: type    = constructor,
            label   = __CONSTRUCTOR_TABLE__,
            count   = __CONSTRUCTOR_COUNT__,
            segment = ONCE;
    CONDES: type    = destructor,
            label   = __DESTRUCTOR_TABLE__,
            count   = __DESTRUCTOR_COUNT__,
            segment = RODATA;
    CONDES: type    = interruptor,
            label   = __INTERRUPTOR_TABLE__,
            count   = __INTERRUPTOR_COUNT__,
            segment = RODATA,
            import  = __CALLIRQ__;
 }
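The address arithmetic in the `SYMBOLS` and `MEMORY` sections above can be checked by hand. The following is a minimal host-side C sketch, not part of the repository, that reproduces the `MAIN` size formula and the end address of each region. The macro and function names are made up for illustration; only the numeric values are copied from `atari-xex.cfg`.

```c
#include <assert.h>
#include <stdio.h>

/* Values copied from atari-xex.cfg; the macro names are illustrative only. */
#define START_ADDRESS     0x8000u            /* STARTADDRESS default (%S) */
#define STACK_SIZE        0x0800u            /* __STACKSIZE__ */
#define RESERVED_MEMORY   0x0000u            /* __RESERVED_MEMORY__ */
#define TABLES_START      (0x2E00u + 0x0300u) /* __TABLESEG_START__ */
#define TABLES_SIZE       (6u * 0x100u)      /* __TABLESEG_SIZE__ */
#define BANK_START        0x4000u            /* __BANKSY_START__ */
#define BANK_SIZE         0x4000u            /* __BANKSY_SIZE__ */
#define FRAMEBUFFER_START 0xA000u            /* __FRAMEBUFFER_START__ */

/* Last address covered by a region of `size` bytes beginning at `start`. */
unsigned region_last(unsigned start, unsigned size)
{
    assert(size > 0);
    return start + size - 1u;
}

/* Size of MAIN as written in the MEMORY section:
   __FRAMEBUFFER_START__ - __STACKSIZE__ - __RESERVED_MEMORY__ - %S */
unsigned main_size(void)
{
    return FRAMEBUFFER_START - STACK_SIZE - RESERVED_MEMORY - START_ADDRESS;
}

/* Print one row of the memory map. */
void print_region(const char *name, unsigned start, unsigned size)
{
    printf("%-10s $%04X-$%04X (%u bytes)\n",
           name, start, region_last(start, size), size);
}

/* Print the three regions this configuration lays out. */
void print_layout(void)
{
    print_region("TABLES", TABLES_START, TABLES_SIZE);
    print_region("BANKSWITCH", BANK_START, BANK_SIZE);
    print_region("MAIN", START_ADDRESS, main_size());
}
```

With the values above, `TABLES` covers $3100-$36FF, `BANKSWITCH` covers $4000-$7FFF, and `MAIN` covers $8000-$97FF. The $0800 bytes left between the end of `MAIN` and `__FRAMEBUFFER_START__` equal the `__STACKSIZE__` that the formula subtracts.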
--- a/mandel-core.s
+++ b/mandel-core.s
--- a/mandel.c
+++ b/mandel.c
@@ -0,0 +1,15 @@
 /**
 * The UI and I/O wrapper for the Mandelbrot runner, in C.
 *
 * For the moment *all* logic is in mandel-core.s, I'm just
 * trying to get this to run within a cc65 environment.
 * Eventually just the inner loop fun will live in there.
 */
 #include <stdlib.h>
 #include <stdio.h>
 #include "mandel.h"
 void main(void) {
    mandel_start();
 }
--- a/mandel.h
+++ b/mandel.h
@@ -0,0 +1,4 @@
 #include <inttypes.h>
 // From mandel-core.s:
 extern void mandel_start(void);
--- a/readme.md
+++ b/readme.md
@@ -18,7 +18,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
 ## Current state
-Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys.
+Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
 The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
@@ -27,7 +27,7 @@ The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 3
 * when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
 * without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
-The mandelbrot calculations are done using 4.12-precision fixed point numbers with 8.24-precision intermediates. It may be possible to squish this down to 3.13/6.26.
+The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
 Iterations are capped at 255.
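The readme hunk above mentions a table of half-squares for machines without expanded RAM. As a rough illustration of that idea, here is a host-side C sketch of an unsigned 8x8 multiply built on the identity (a+b)²/2 - a²/2 - b²/2 = a·b. It is not the repository's 6502 routine. The function names, the table layout, and the explicit correction for two odd operands are assumptions of this sketch; the assembly may handle the rounding differently.

```c
#include <assert.h>
#include <stdint.h>

/* half_squares[x] = floor(x*x / 2) for x in 0..510, which covers every
   possible a + b for 8-bit a and b. Layout is an assumption of this sketch. */
static uint32_t half_squares[511];
static int table_ready = 0;

/* Fill the lookup table once before multiplying. */
void init_half_squares(void)
{
    uint32_t x;
    for (x = 0; x < 511; x++) {
        half_squares[x] = (x * x) / 2;
    }
    table_ready = 1;
}

/* Unsigned 8x8 -> 16 multiply using table lookups, adds and subtracts.
   Because the table entries are floored, the raw difference is one too
   high when a and b are both odd; the last line subtracts that 1. */
uint16_t mul8_half_squares(uint8_t a, uint8_t b)
{
    uint32_t raw;
    assert(table_ready);
    raw = half_squares[(unsigned)a + b] - half_squares[a] - half_squares[b];
    return (uint16_t)(raw - (a & b & 1u));
}
```

After calling `init_half_squares()` once, `mul8_half_squares(a, b)` replaces a shift-and-add loop with three table reads. On the 6502 that trade is attractive because indexed loads are cheap while multi-bit multiplication has to be done in software.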
--- a/todo.md
+++ b/todo.md
@@ -1,19 +1,17 @@
 things to try:
-* skip add on the top-byte multiply in sqr8/mul8
+* fix status bar to show elapsed time, per-iter time, per-pixel iter count
-  * should save a few cycles, suggestion by jamey
+
 * 'turbo' mode disabling graphics in full or part
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
-* try 3.13 fixed point instead of 4.12 for more precision
+* maybe clean up the load/layout of the big mul table
-  * can we get away without the extra bit?
+
-  * since exit compare space would be 6.26 i think so
+* consider alternate lookup tables in the top 16KB under ROM
 * y-axis mirror optimization
 * 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
  * maybe redo tiering to just 4x4, 2x2, 1x1?
 * extract viewport for display & re-input via keyboard
 * fujinet screenshot/viewport uploader
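The removed todo item asked whether 3.13 fixed point leaves enough headroom. A 3.13 value spans [-4, 4), the product of two of them is a 6.26 value spanning [-32, 32), and the escape test |z|² >= 4 can be done in that 6.26 space. The following host-side C sketch works through that reasoning. It is an illustration under stated assumptions, not the code in mandel-core.s: the type and function names are made up, and saturating an overflowing coordinate is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS   13                      /* 3.13: 3 integer bits (incl. sign), 13 fraction bits */
#define ONE_3_13    (1 << FRAC_BITS)        /* 1.0 in 3.13 */
#define ESCAPE_6_26 ((uint32_t)4 << 26)     /* 4.0 in 6.26 */
#define MAX_ITER    255                     /* iteration cap from the readme */

typedef int16_t fix3_13;                    /* range [-4, 4) */
typedef int32_t fix6_26;                    /* range [-32, 32) */

/* Convert a double in [-4, 4) to 3.13, truncating toward zero. */
fix3_13 to_fix(double v)
{
    assert(v >= -4.0 && v < 4.0);
    return (fix3_13)(v * ONE_3_13);
}

/* Clamp a 3.13-scaled value held in 32 bits to the 16-bit range.
   Saturation is an assumption of this sketch. */
fix3_13 saturate_3_13(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (fix3_13)v;
}

/* Iterate z = z*z + c and return how many steps ran before |z|^2 >= 4,
   or MAX_ITER if the point never escaped. */
int mandel_iter(fix3_13 cx, fix3_13 cy)
{
    fix3_13 zx = 0, zy = 0;
    int i;
    for (i = 0; i < MAX_ITER; i++) {
        fix6_26 zx2 = (fix6_26)zx * zx;     /* squares are non-negative 6.26 */
        fix6_26 zy2 = (fix6_26)zy * zy;
        fix6_26 zxy = (fix6_26)zx * zy;
        /* Sum as unsigned so that 16 + 16 cannot overflow 32 bits. */
        if ((uint32_t)zx2 + (uint32_t)zy2 >= ESCAPE_6_26) {
            return i;
        }
        /* Not escaped, so |zx|, |zy| < 2 and these 6.26 values fit easily. */
        zy = saturate_3_13((2 * zxy) / ONE_3_13 + cy);
        zx = saturate_3_13((zx2 - zy2) / ONE_3_13 + cx);
    }
    return MAX_ITER;
}
```

For example, `mandel_iter(to_fix(0.0), to_fix(0.0))` runs to the cap of 255, while `mandel_iter(to_fix(2.0), to_fix(0.0))` escapes after one step. A coordinate that saturates has magnitude close to 4, so it fails the escape test on the next pass.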
Author	SHA1	Message	Date
Brooke Vibber	25c37a1188	zeropage tweaks * switched zero-page from hardcoded assignments to symbols * moved most non-hotpath stuff out to .data * merged ptr and pixel_ptr Slight slowdown in Atari800MacX from 5m13s to 5m15s	2026-04-08 20:16:31 -07:00
Brooke Vibber	a93dd00e36	Rearrange the segments a bit * put TABLES in the low memory, before the bank switch window * reserve bank switch window * put rest of the code after that and before the framebuffer. So TABLES lives just before $4000 and MAIN lives in $8000-$bfff. Could split some more code and/or data into low mem, and/or move the tables not used in extended memory mode into the bank switch window so they take no address space on XE or expanded memory machines.	2025-12-28 12:55:08 -08:00
Brooke Vibber	97fdc12565	Put the tables before the main code, and shrink the segment Leaves more room for code and dynamic data/stack	2025-12-28 12:32:57 -08:00
Brooke Vibber	b27be3c159	Add a C shell, which currently just passes through This is a first step toward moving the UI to C and adding file and network I/O in C. The fractal core will remain in assembler as well as the multiplier.	2025-12-28 09:23:38 -08:00
Brooke Vibber	6479cf530c	update some timings	2025-09-16 21:56:50 -07:00
Brooke Vibber	29cd3d968f	Shaves 3 seconds off initial view runtime on XE :D Instead of relying solely on the JMP thunks added to imul16_func and sqr16_func, three call sites within the mandelbrot iteration function are patched directly to jsr to the XE versions, saving like 15 cycles per iter. Ok so it's not a lot, but every second counts. ;) with XE code disabled: 1539 us/iter, 5m13s; with old XE code: 1417 us/iter, 4m48s; with new XE code: 1406 us/iter, 4m45s	2025-09-06 19:53:25 -07:00
Brooke Vibber	b46e6fb343	fix typo on stub x/y inputs was accidentally falling through to the load a viewport from a keypress thingy which was not needed here	2025-09-01 12:28:33 -07:00
Brooke Vibber	f2a6af0995	Replace the not-enough-precision 32 bit to float impl keep the proc though to encapsulate it but uses the older logic of rounding down to 3.13 first	2025-07-03 18:43:10 -07:00
Brooke Vibber	96e0356e57	WIP input handling for coords; experimental output via 32-bit mult, loses precision in conversion	2025-07-03 18:41:24 -07:00
Brooke Vibber	fab2760394	refactor countdown as a procedure call	2025-06-28 13:43:43 -07:00
Brooke Vibber	fd954da47e	Create map file for convenience export a symbol and it'll appear in mandel.map	2025-06-23 08:17:39 -07:00
Brooke Vibber	4bac47a4fd	fix at 256 seconds	2025-06-23 00:31:53 -07:00
Brooke Vibber	5cf64970c8	Ah that's better used the appropriate instruction for comparison	2025-06-22 23:10:43 -07:00
Brooke	f7082ab371	wip subtraction method, still not working	2025-06-22 22:21:26 -07:00
Brooke	689363d083	WIP code for elapsed time not finished, doesn't work right	2025-06-22 20:00:35 -07:00
Brooke Vibber	89b4e45901	flip the y coordinate sign	2025-02-22 20:24:04 -08:00
Brooke Vibber	6e66145ec6	whoops fixes	2025-02-22 15:37:11 -08:00
Brooke Vibber	07db3d00d7	second status bar display with coords/zoom currently using 3.13 precision to output to floats for formatting	2025-02-22 11:23:13 -08:00
Brooke Vibber	26d612b6f3	move 8 scan lines on the bottom to status bar	2025-02-21 19:42:10 -08:00
Brooke Vibber	25da81c64b	clean up text draw, fix offset by one	2025-02-02 16:40:58 -08:00
Brooke Vibber	d182d33b35	draw_string	2025-02-01 10:02:01 -08:00
Brooke Vibber	e0cc704d99	Fix drawing terminator, round usec	2025-01-08 18:34:46 -08:00
Brooke Vibber	7c04862d70	workaround for rounding us/iter for some reason rounding is giving me wrong results not sure what i'm doing wrong :D just show 6 digits :P ok this gets the us/iter working, and it is more stable but the elapsed time still needs to be added	2025-01-05 14:29:27 -08:00
Brooke Vibber	918d15e813	wip us/iter counter seems wrong, gives 32 all the time and that seems too small	2025-01-05 14:05:24 -08:00
Brooke Vibber	eaa00a055a	wip changing time units it does this weird thing where sometimes it's reading out wrong digits and then switches to expected unit of sec/px work in progress no clue what's going on	2025-01-04 18:46:51 -08:00
Brooke Vibber	7e5ca79d9a	move total_ms, total_pixels out of zero page; this frees up 12 bytes of zero page space and costs no measurable time, as these variables are not in the hot path and there was only a tiny difference.	2025-01-04 14:25:25 -08:00
Brooke Vibber	d2bf77dc26	todo notes	2025-01-04 12:13:27 -08:00
Brooke Vibber	582ddf497f	apply jamey's suggestion of skipping add for high byte muls rather than saving 0 into the high bytes, then adding the high-byte multiplication later, write it directly in place. this saves a few cycles on every iteration, and it adds up nicely. View 1 overview render times: 130XE: 10.050 ms/px - 4m56s 800XL: 10.906 ms/px - 5m21s	2025-01-04 10:53:51 -08:00
Brooke Vibber	d157fe1306	Faster pixel skipping on 4x4, 2x2 tiers Iterate at fill_masks[fill_level]+1 instead of every pixel and then skipping, saves a smidge of time view 1 with expanded memory: 10.514 ms/px before 10.430 ms/px after	2025-01-04 10:06:12 -08:00
Brooke Vibber	dcf5a3f59e	sixth viewport	2025-01-01 21:15:38 -08:00
Brooke Vibber	837082cf56	tweak viewports skip experimental 6th viewport that got forgotten and limit max zoom to 7 (range 0-7) which is what looks good	2025-01-01 15:45:26 -08:00
Brooke Vibber	65fcb44934	3.13 / 6.26 gives nicer results!	2025-01-01 15:37:12 -08:00
Brooke Vibber	c424f1b8bc	fill in scanlines during tiering	2024-12-31 22:10:27 -08:00
Brooke Vibber	49fe315529	'wide pixels' should get better color on the composite video because the scanlines will be fuller of data	2024-12-31 20:13:11 -08:00
Brooke Vibber	f1ebb21bcb	wip not working wide pixels	2024-12-31 17:49:13 -08:00
Brooke Vibber	87caa52543	add viewport number 5 full zoom	2024-12-31 15:45:03 -08:00