Compare commits

..

87 commits

Author SHA1 Message Date
e0cc704d99 Fix drawing terminator, round usec 2025-01-08 18:34:46 -08:00
7c04862d70 workaround for rounding us/iter
for some reason rounding is giving me wrong results
not sure what i'm doing wrong :D

just show 6 digits :P

ok this gets the us/iter working, and it is more stable
but the elapsed time still needs to be added
2025-01-05 14:29:27 -08:00
918d15e813 wip us/iter counter
seems wrong, gives 32 all the time and that seems too small
2025-01-05 14:05:24 -08:00
eaa00a055a wip changing time units
it does this weird thing where sometimes it's reading out wrong digits
and then switches to expected unit of sec/px

work in progress no clue what's going on
2025-01-04 18:46:51 -08:00
7e5ca79d9a move total_ms, total_pixels out of zero page
this frees up 12 bytes of zero page space and costs no measurable
time as these variables are not in the hot path and there was only
a tiny bit different.
2025-01-04 14:25:25 -08:00
d2bf77dc26 todo notes 2025-01-04 12:13:27 -08:00
582ddf497f apply jamey's suggestion of skipping add for high byte muls
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
2025-01-04 10:53:51 -08:00
d157fe1306 Faster pixel skipping on 4x4, 2x2 tiers
Iterate at fill_masks[fill_level]+1 instead of every pixel and then
skipping, saves a smidge of time

view 1 with expanded memory:
10.514 ms/px before
10.430 ms/px after
2025-01-04 10:06:12 -08:00
dcf5a3f59e sixth viewport 2025-01-01 21:15:38 -08:00
837082cf56 tweak viewports
skip experimental 6th viewport that got forgotten
and limit max zoom to 7 (range 0-7) which is what looks good
2025-01-01 15:45:26 -08:00
65fcb44934 3.13 / 6.26 gives nicer results! 2025-01-01 15:37:12 -08:00
c424f1b8bc fill in scanlines during tiering 2024-12-31 22:10:27 -08:00
49fe315529 'wide pixels'
should get better color on the composite video because the
scanlines will be fuller of data
2024-12-31 20:13:11 -08:00
f1ebb21bcb wip not working wide pixels 2024-12-31 17:49:13 -08:00
87caa52543 add viewport number 5 full zoom 2024-12-31 15:45:03 -08:00
d8601bb856 fix fix 2024-12-31 15:03:43 -08:00
7985ea9a39 fix panning for 32-bi 2024-12-31 14:45:38 -08:00
cc83c76706 update docs for 32-bit intermediates 2024-12-31 14:16:43 -08:00
2e8893fd78 haha fuck me 2024-12-31 13:54:53 -08:00
81bf7f3c43 tweak 2024-12-31 09:53:22 -08:00
1e0f577e09 wip 2024-12-31 09:09:11 -08:00
d2f41f9644 wip 2024-12-31 09:02:42 -08:00
2fcb30b76a wip 2024-12-31 08:56:59 -08:00
13257309dc init fix 2024-12-31 08:34:02 -08:00
7184b8e03f wip 2024-12-31 08:24:47 -08:00
4a1e35699a wip 2024-12-31 08:24:44 -08:00
0d086a179c wip 2024-12-31 08:23:04 -08:00
61eb1aaf21 notes 2024-12-31 05:11:26 -08:00
b56dc1e98b notes 2024-12-30 20:38:33 -08:00
0a7293d8bc do 4x4 2x2 1x1 only
in prep for bigger pixels
2024-12-30 19:52:35 -08:00
ec42f672d4 use an 8-item z buffer for slightly fasterness 2024-12-30 19:48:28 -08:00
67649d4743 annotations, tweak 2024-12-30 19:17:02 -08:00
ed79c80b16 update readme 2024-12-30 16:50:25 -08:00
e6cbe0bc6b notes 2024-12-30 16:43:18 -08:00
6db8cef82d 51-70 cycles for xe :D 2024-12-30 15:17:50 -08:00
9b7f6b8937 add a viewport in the front spike 2024-12-30 14:22:03 -08:00
3bd9b1ac31 micro-optimizations in imul8xe
53-72 cycles
overview in 10.896 ms/px
2024-12-30 14:09:02 -08:00
63e74d5152 tweak 2024-12-30 13:44:31 -08:00
14125a398a cycle 'in' not 'out' 2024-12-30 11:35:45 -08:00
71d8d93abc even better palette cycling 2024-12-30 11:33:55 -08:00
64a6cf50f3 awesome new palette cycler 2024-12-30 10:21:52 -08:00
100c0f3314 1/2/3 selectable viewports 2024-12-30 09:19:41 -08:00
e51aa91e4e notes 2024-12-30 06:48:04 -08:00
c4b98c7be2 optimize out a temporary
down to 11.076 ms/px on xe
2024-12-30 05:35:22 -08:00
70d2c91f03 fix bank switch on xl/xe
was accidentally enabling basic rom :D

5m46s - 11.759 ms/px - 800xl
5m30s - 11.215 ms/px - 130xe
2024-12-30 03:56:35 -08:00
acac5a8df4 moving the framebuffer into the basic space
fails on 130xe and 800xl for some reason

works on 800 as expected
2024-12-29 21:19:55 -08:00
883f926e57 split memory, wip
appears to work on 800 but xl/xe overlap basic lol
2024-12-29 21:06:48 -08:00
0c63430dd9 wip tables segment to be 2024-12-29 20:37:58 -08:00
3ab5006aa3 wip refacotring 2024-12-29 17:56:14 -08:00
f903272335 refactoring and start on squares 2024-12-29 17:37:06 -08:00
8ad996981a whoops 2024-12-29 13:19:58 -08:00
15fc5367f9 switck with the overview as default fo rnow 2024-12-29 13:18:54 -08:00
2118890977 add an alternate viewport (compile-time currently)
zoomed to max
2024-12-29 13:10:35 -08:00
0fc5ba914f fix pan/zoom bug
was missing an rts on update_palette

this happened to fall through to keycheck
which if timing was wrong would dutifully process the viewport
change and return to update_palette's caller

which in turn was -not- expecting to reset the outer loop

fixed
2024-12-29 12:29:36 -08:00
2b0167226e todos 2024-12-28 20:44:27 -08:00
504457595a correct zoom border checks 2024-12-28 18:11:35 -08:00
0fcf4d6676 comment tweak 2024-12-28 17:40:21 -08:00
d83b811444 remove stray copy of the expanded-ram imul
it's not finished or working, just keep the core one :D
2024-12-28 15:13:06 -08:00
f32cc5fa7c whoops 2024-12-27 19:15:19 -08:00
052a19b6aa Merge pull request 'xe' (#1) from xe into main
Reviewed-on: https://brooke.vibber.net/git/git/brooke/mandel-6502/pulls/1
2024-12-28 02:40:01 +00:00
83cba4afa3 Runtime detection of XE-style extended memory
Uses the "big multiplication table" in 64KB of extended memory if
bank switching appears to work, otherwise uses the table of squares
lookups.

Initial view clocks in at 13.133 ms/px for the XE version and still
14.211 ms/px for the 400/800/XL version.

Tested in emulator with 130XE and XL+Ultimate 1MB upgrade configs,
and base implementation on the 800XL emulator.
2024-12-27 18:37:03 -08:00
ee1c268705 it works 2024-12-26 21:49:13 -08:00
e84a990789 tweaks: 2024-12-26 21:41:03 -08:00
0cde31905e runs but doesn't work 2024-12-26 18:35:37 -08:00
45c5a4cb2d called, gets lost 2024-12-26 18:20:10 -08:00
34ce9da030 builds, not used yte 2024-12-26 18:17:01 -08:00
a9d551a98d first draft initializer 2024-12-26 17:50:59 -08:00
829d2860e8 :P 2024-12-26 12:04:01 -08:00
f996c3cbcd provisional maybe
old mode runs in 81-92 cycles

provisional code runs in 58-77 cycles

if it works ;)
2024-12-25 12:47:37 -08:00
405cec6d51 WIP imul8 via table experiments
planning to try a 64KB table of 8x7-bit multiplies in the high memory
on a 130XE or other high-memory-capable machine

not yet working or finished

too many cycles of overhead per invocation
2024-12-25 10:51:27 -08:00
05133aabdd slightly faster handling of signed mul
previously we were flipping the inputs if negative, and then the
output if both inputs were negative

turns out you can just treat the whole thing as an unsigned mul
and then subtract each term from the high word if the other term
is negative.

https://stackoverflow.com/a/28827013

this saves a handful of cycles, reducing our runtime to a merge
14.211 ms/px \o/
2024-12-15 20:17:45 -08:00
7f2bc43cff squares 2024-12-14 18:56:26 -08:00
5637783529 Faster imul16 routine
Improves runtime from 16.24 ms/px to 14.44 ms/px

This uses a routine found on Everything2:
https://everything2.com/title/Fast+6502+multiplication

which uses a lookup table of squares to do 8-bit imuls,
which are then composed into a 16-bit imul
2024-12-14 18:53:31 -08:00
29630c8887 update palette more smoothly 2024-08-19 13:21:44 -07:00
c559b6e76b palette adjustment 2024-08-18 21:07:53 -07:00
6f05a9bbd0 basic palette cycling 2024-08-18 21:06:30 -07:00
8be03993ab fix time of drawing of 'DONE' text 2024-08-18 20:29:39 -07:00
ee5b12dae8 mailmap 2024-08-18 20:15:47 -07:00
201d9bf15c clear screen after zoom/scroll 2024-02-25 15:15:23 -08:00
c152c4346b Progressive pixel layout 2024-02-04 14:25:15 -08:00
510457f97a add a note to fix stats when changing zoom 2023-03-11 21:15:08 -08:00
3d792603db keyboard nav sorta working 2023-03-11 20:45:32 -08:00
b1c26c1edd WIP fix keyboard check 2023-03-05 16:57:41 -08:00
53336f7af1 WIP quick hack to check keyboard
this for some reason only works ONCE
though I can replicate the logic in BASIC
and it works over multiple keys
not sure what's wrong
2023-03-05 15:45:44 -08:00
24abc21b01 move speed to the right 2023-03-05 13:56:50 -08:00
9926ec28e7 clean up speed display now uses ms/px msg 2023-03-05 13:48:39 -08:00
0501a364c7 Check for repeated zx/zy values
These will never escape, so saves
some time in the lake

trick is taken from fractint
2023-02-12 11:56:20 -08:00
7 changed files with 1257 additions and 380 deletions

2
.mailmap Normal file
View file

@ -0,0 +1,2 @@
Brooke Vibber <bvibber@pobox.com>
Brooke Vibber <bvibber@pobox.com> <brion@pobox.com>

View file

@ -2,8 +2,8 @@
all : mandel.xex all : mandel.xex
mandel.xex : mandel.o tables.o mandel.xex : mandel.o tables.o atari-asm-xex.cfg
ld65 -C ./atari-asm-xex.cfg -o $@ $+ ld65 -C ./atari-asm-xex.cfg -o $@ mandel.o tables.o
%.o : %.s %.o : %.s
ca65 -o $@ $< ca65 -o $@ $<

28
atari-asm-xex.cfg Normal file
View file

@ -0,0 +1,28 @@
FEATURES {
STARTADDRESS: default = $2E00;
}
SYMBOLS {
__STARTADDRESS__: type = export, value = %S;
}
MEMORY {
ZP: file = "", define = yes, start = $0082, size = $007E;
MAIN: file = %O, define = yes, start = %S, size = $4000 - %S;
# Keep $4000-7fff clear for expanded RAM access window
TABLES: file = %O, define = yes, start = $8000, size = $a000 - $8000;
# Keep $a000-$bfff clear for BASIC cartridge
}
FILES {
%O: format = atari;
}
FORMATS {
atari: runad = start;
}
SEGMENTS {
ZEROPAGE: load = ZP, type = zp, optional = yes;
EXTZP: load = ZP, type = zp, optional = yes; # to enable modules to be able to link to C and assembler programs
CODE: load = MAIN, type = rw, define = yes;
RODATA: load = MAIN, type = ro optional = yes;
DATA: load = MAIN, type = rw optional = yes;
BSS: load = MAIN, type = bss, optional = yes, define = yes;
TABLES: load = TABLES, type = ro, optional = yes, align = 256;
}

1510
mandel.s

File diff suppressed because it is too large Load diff

View file

@ -14,30 +14,37 @@ Non-goals:
Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals. Enjoy! I'll probably work on this off and on for the next few weeks until I've got it producing fractals.
-- brion, january 2023 -- brooke, january 2023 - december 2024
## Current state ## Current state
Basic rendering is functional, but no interactive behavior (zoom/pan) or benchmarking is done yet. Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
The 16-bit signed integer multiplication works; it takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered. The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
The main loop is a basic add-and-shift, using 16-bit adds which requires flipping the sign of negative inputs (otherwise you'd have to add all those sign-extension bits). Runs in 470-780 cycles depending on input. * 16-bit multiplies are decomposed into 4 8-bit unsigned multiplies and some addition
* an optimized case for squares uses a table of 8-bit squares to reduce the number of 8-bit multiplication sub-ops
* when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
* without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13. The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
Iterations are capped at 255. Iterations are capped at 255.
## Next steps The pixels are run in a progressive layout to get the basic shape on screen faster.
Add a running counter of ms/px using the vertical blank interrupts as a timer. This'll show how further work improves it! There is a running counter of ms/px using the vertical blank interrupts as a timer, used to track our progress. :D
Check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint. There's a check for cycles in (zx,zy) output when in the 'lake'; if values repeat, they cannot escape. This is a big time saver in fractint.
I may be able to do a faster multiply using tables of squares for 8-bit component multiplication. There's some cute color cycling.
## Deps and build instructions ## Deps and build instructions
I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that. I'm using `ca65` as a macro assembler, and have a Unix-style `Makefile` for building. Should work fairly easily on Linux and Mac. Might work on "raw" Windows but I use WSL for that.
Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices. Currently produces a `.xex` executable, which can be booted up in common Atari emulators and some i/o devices.
## Todo
See ideas in `todo.md`.

View file

@ -11,23 +11,40 @@ function db(func) {
return lines.join('\n'); return lines.join('\n');
} }
let squares = [];
for (let i = 0; i < 512; i++) {
squares.push(Math.trunc((i * i + 1) / 2));
}
console.log( console.log(
`.segment "TABLES" `.segment "TABLES"
.export mul_lobyte256 .export mul_lobyte256
.export mul_hibyte256 .export mul_hibyte256
.export mul_hibyte512 .export mul_hibyte512
.export sqr_lobyte
.export sqr_hibyte
; (i * i + 1) / 2 for the multiplier
.align 256 .align 256
mul_lobyte256: mul_lobyte256:
${db((x) => Math.round(x * x / 2) & 0xff)} ${db((i) => squares[i] & 0xff)}
.align 256 .align 256
mul_hibyte256: mul_hibyte256:
${db((x) => (Math.round(x * x / 2) >> 8) & 0xff)} ${db((i) => (squares[i] >> 8) & 0xff)}
.align 256 .align 256
mul_hibyte512: mul_hibyte512:
${db((x) => (Math.round((x + 256) * (x + 256) / 2) >> 8) & 0xff)} ${db((i) => (squares[i + 256] >> 8) & 0xff)}
; (i * i) for the plain squares
.align 256
sqr_lobyte:
${db((i) => (i * i) & 0xff)}
.align 256
sqr_hibyte:
${db((i) => ((i * i) >> 8) & 0xff)}
`); `);

17
todo.md Normal file
View file

@ -0,0 +1,17 @@
things to try:
* fix status bar to show elapsed time, per-iter time, per-pixel iter count
* 'turbo' mode disabling graphics in full or part
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* maybe clean up the load/layout of the big mul table
* consider alternate lookup tables in the top 16KB under ROM
* y-axis mirror optimization
* extract viewport for display & re-input via keyboard
* fujinet screenshot/viewport uploader