Compare commits

28 commits

Author SHA1 Message Date
e0cc704d99 Fix drawing terminator, round usec 2025-01-08 18:34:46 -08:00
7c04862d70 workaround for rounding us/iter
for some reason rounding is giving me wrong results
not sure what i'm doing wrong :D

just show 6 digits :P

ok this gets the us/iter working, and it is more stable
but the elapsed time still needs to be added
2025-01-05 14:29:27 -08:00
918d15e813 wip us/iter counter
seems wrong, gives 32 all the time and that seems too small
2025-01-05 14:05:24 -08:00
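
The us/iter figure these commits are chasing is plain arithmetic: elapsed frames converted to seconds, divided by the iteration count, scaled to microseconds. Below is a minimal Python sketch of that calculation, assuming a 60 Hz frame counter (consistent with the 1/60 s `sec_per_frame` constant in the diff). The function name is hypothetical; the actual code does this with the OS ROM floating point routines, not integer math.

```python
def us_per_iter(frames, iters, frames_per_sec=60):
    """Average microseconds per iteration, truncated to an integer.

    Hypothetical helper: integer arithmetic stands in for the ROM's
    float48 multiply and divide, so no rounding error creeps in.
    """
    return frames * 1_000_000 // (frames_per_sec * iters)


if __name__ == "__main__":
    # 120 frames (2 seconds) spent on 10,000 iterations
    print(us_per_iter(120, 10_000))
```

A sanity check like this gives a number to compare the on-screen readout against when the displayed value looks suspicious.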
eaa00a055a wip changing time units
it does this weird thing where sometimes it's reading out wrong digits
and then switches to expected unit of sec/px

work in progress no clue what's going on
2025-01-04 18:46:51 -08:00
7e5ca79d9a move total_ms, total_pixels out of zero page
this frees up 12 bytes of zero page space and costs no measurable
time as these variables are not in the hot path and there was only
a tiny bit of difference.
2025-01-04 14:25:25 -08:00
d2bf77dc26 todo notes 2025-01-04 12:13:27 -08:00
582ddf497f apply jamey's suggestion of skipping add for high byte muls
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
2025-01-04 10:53:51 -08:00
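
The idea in this commit can be modeled outside the 6502. A 16x16 multiply is built from four 8x8 partial products; the low product lands in result bytes 0-1 and the high product in bytes 2-3, so neither needs an addition. The Python sketch below is a reference model only, with hypothetical function names; the sign fix follows the "adjust high word" approach the source comments point to.

```python
def mul16_unsigned(a, b):
    """16x16 -> 32-bit unsigned multiply from four 8x8 partial products."""
    l1, h1 = a & 0xFF, a >> 8
    l2, h2 = b & 0xFF, b >> 8
    # l1*l2 occupies bytes 0-1 and h1*h2 occupies bytes 2-3. They cannot
    # overlap, so both are written in place instead of zero-filling the
    # high bytes and adding h1*h2 later.
    result = (l1 * l2) | ((h1 * h2) << 16)
    # The two cross products are added at byte offset 1, carrying upward.
    result += (h1 * l2) << 8
    result += (h2 * l1) << 8
    return result & 0xFFFFFFFF


def mul16_signed(a, b):
    """Signed 16x16 -> 32 via the unsigned product plus a high-word fix."""
    ua, ub = a & 0xFFFF, b & 0xFFFF
    result = mul16_unsigned(ua, ub)
    # For each negative input, subtract the other operand from the high word.
    if a < 0:
        result -= ub << 16
    if b < 0:
        result -= ua << 16
    result &= 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value.
    return result - (1 << 32) if result & 0x80000000 else result
```

Writing the high product directly saves two stores and one 16-bit add per multiply, which is where the per-iteration cycle savings come from.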
d157fe1306 Faster pixel skipping on 4x4, 2x2 tiers
Iterate at fill_masks[fill_level]+1 instead of every pixel and then
skipping, saves a smidge of time

view 1 with expanded memory:
10.514 ms/px before
10.430 ms/px after
2025-01-04 10:06:12 -08:00
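
A simplified Python model of the change: visiting every pixel and skipping the ones that fail the mask test yields the same coordinates as stepping by `mask + 1`. The mask values below are assumptions for the 4x4, 2x2 and 1x1 tiers, the function names are hypothetical, and the real renderer does additional bookkeeping that is left out here.

```python
FILL_MASKS = [3, 1, 0]  # assumed masks for the 4x4, 2x2 and 1x1 tiers


def visited_by_skipping(width, height, mask):
    """Old approach: walk every pixel, keep only those on the tier grid."""
    return [(x, y)
            for y in range(height)
            for x in range(width)
            if (x & mask) == 0 and (y & mask) == 0]


def visited_by_stride(width, height, mask):
    """New approach: step directly by mask + 1 in both directions."""
    step = mask + 1
    return [(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)]
```

The stride version does the same work per visited pixel but never touches the skipped ones, which matches the small ms/px improvement reported above.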
dcf5a3f59e sixth viewport 2025-01-01 21:15:38 -08:00
837082cf56 tweak viewports
skip experimental 6th viewport that got forgotten
and limit max zoom to 7 (range 0-7) which is what looks good
2025-01-01 15:45:26 -08:00
65fcb44934 3.13 / 6.26 gives nicer results! 2025-01-01 15:37:12 -08:00
c424f1b8bc fill in scanlines during tiering 2024-12-31 22:10:27 -08:00
49fe315529 'wide pixels'
should get better color on the composite video because the
scanlines will be fuller of data
2024-12-31 20:13:11 -08:00
f1ebb21bcb wip not working wide pixels 2024-12-31 17:49:13 -08:00
87caa52543 add viewport number 5 full zoom 2024-12-31 15:45:03 -08:00
d8601bb856 fix fix 2024-12-31 15:03:43 -08:00
7985ea9a39 fix panning for 32-bi 2024-12-31 14:45:38 -08:00
cc83c76706 update docs for 32-bit intermediates 2024-12-31 14:16:43 -08:00
2e8893fd78 haha fuck me 2024-12-31 13:54:53 -08:00
81bf7f3c43 tweak 2024-12-31 09:53:22 -08:00
1e0f577e09 wip 2024-12-31 09:09:11 -08:00
d2f41f9644 wip 2024-12-31 09:02:42 -08:00
2fcb30b76a wip 2024-12-31 08:56:59 -08:00
13257309dc init fix 2024-12-31 08:34:02 -08:00
7184b8e03f wip 2024-12-31 08:24:47 -08:00
4a1e35699a wip 2024-12-31 08:24:44 -08:00
0d086a179c wip 2024-12-31 08:23:04 -08:00
61eb1aaf21 notes 2024-12-31 05:11:26 -08:00
3 changed files with 353 additions and 206 deletions

539
mandel.s

@ -1,43 +1,42 @@
; Our zero-page vars ; Our zero-page vars
sx = $80 ; i16: screen pixel x ox = $80 ; fixed6.26: center point x
sy = $82 ; i16: screen pixel y oy = $84 ; fixed6.26: center point y
ox = $84 ; fixed4.12: center point x cx = $88 ; fixed6.26: c_x
oy = $86 ; fixed4.12: center point y cy = $8c ; fixed6.26: c_y
cx = $88 ; fixed4.12: c_x
cy = $8a ; fixed4.12: c_y
zx = $8c ; fixed4.12: z_x
zy = $8e ; fixed4.12: z_y
zx_2 = $90 ; fixed4.12: z_x^2 zx = $90 ; fixed6.26: z_x
zy_2 = $92 ; fixed4.12: z_y^2 zy = $94 ; fixed6.26: z_y
zx_zy = $94 ; fixed4.12: z_x * z_y zx_2 = $98 ; fixed6.26: z_x^2
dist = $96 ; fixed4.12: z_x^2 + z_y^2 zy_2 = $9c ; fixed6.26: z_y^2
iter = $a0 ; u8: iteration count zx_zy = $a0 ; fixed6.26: z_x * z_y
dist = $a4 ; fixed6.26: z_x^2 + z_y^2
sx = $a8 ; i16: screen pixel x
sy = $aa ; i16: screen pixel y
z_buffer_active = $ac ; boolean: 1 if we triggered the lake, 0 if not
z_buffer_start = $ad ; u8: index into z_buffer
z_buffer_end = $ae ; u8: index into z_buffer
iter = $af ; u8: iteration count
zoom = $a1 ; u8: zoom shift level ptr = $b0 ; u16
count_frames = $a2 ; u8 pixel_ptr = $b2 ; u16
count_pixels = $a3 ; u8 zoom = $b4 ; u8: zoom shift level
total_ms = $a4 ; float48 fill_level = $b5 ; u8
total_pixels = $aa ; float48 pixel_color = $b6 ; u8
pixel_mask = $b7 ; u8
pixel_shift = $b8 ; u8
pixel_offset = $b9 ; u8
palette_offset = $ba ; u8
chroma_offset = $bb ; u8
palette_ticks = $bc ; u8
chroma_ticks = $bd ; u8
count_frames = $be ; u8
; free space $bf
z_buffer_active = $b0 ; boolean: 1 if we triggered the lake, 0 if not count_iters = $c0 ; u16
z_buffer_start = $b1 ; u8: index into z_buffer ; free space c2-cb
z_buffer_end = $b2 ; u8: index into z_buffer temp = $cc ; u16
temp = $b4 ; u16 temp2 = $ce ; u16
temp2 = $b6 ; u16
pixel_ptr = $b8 ; u16
pixel_color = $ba ; u8
pixel_mask = $bb ; u8
pixel_shift = $bc ; u8
pixel_offset = $bd ; u8
fill_level = $be ; u8
palette_offset = $bf ; u8
palette_ticks = $c0 ; u8
chroma_ticks = $c1 ; u8
chroma_offset = $c2 ; u8
ptr = $c4 ; u16
palette_delay = 23 palette_delay = 23
chroma_delay = 137 chroma_delay = 137
@ -60,10 +59,12 @@ LBUFF = $0580 ; result buffer for FASC routine
; FP ROM routine vectors ; FP ROM routine vectors
FASC = $D8E6 ; FLOATING POINT TO ASCII (output in INBUFF, last char has high bit set) FASC = $D8E6 ; FLOATING POINT TO ASCII (output in INBUFF, last char has high bit set)
IFP = $D9AA ; INTEGER TO FLOATING POINT CONVERSION (FR0:u16 -> FR0:float48) IFP = $D9AA ; INTEGER TO FLOATING POINT CONVERSION (FR0:u16 -> FR0:float48)
FPI = $D9D2 ; floating point to integer
FADD = $DA66 ; ADDITION (FR0 += FR1) FADD = $DA66 ; ADDITION (FR0 += FR1)
FSUB = $DA60 ; SUBTRACTION (FR0 -= FR1) FSUB = $DA60 ; SUBTRACTION (FR0 -= FR1)
FMUL = $DADB ; MULTIPLICATION (FR0 *= FR1) FMUL = $DADB ; MULTIPLICATION (FR0 *= FR1)
FDIV = $DB28 ; DIVISION (FR0 /= FR1) FDIV = $DB28 ; DIVISION (FR0 /= FR1)
ZFR0 = $DA44 ; clear FR0
ZF1 = $DA46 ; CLEAR ZERO PAGE FLOATING POINT NUMBER (XX) ZF1 = $DA46 ; CLEAR ZERO PAGE FLOATING POINT NUMBER (XX)
FLD0R = $DD89 ; LOAD FR0 WITH FLOATING POINT NUMBER (YYXX) FLD0R = $DD89 ; LOAD FR0 WITH FLOATING POINT NUMBER (YYXX)
FLD1R = $DD98 ; LOAD FR1 WITH FLOATING POINT NUMBER (YYXX) FLD1R = $DD98 ; LOAD FR1 WITH FLOATING POINT NUMBER (YYXX)
@ -142,7 +143,7 @@ str_self:
.byte "MANDEL-6502" .byte "MANDEL-6502"
str_self_end: str_self_end:
str_speed: str_speed:
.byte " ms/px" .byte "us/iter: "
str_speed_end: str_speed_end:
str_run: str_run:
.byte " RUN" .byte " RUN"
@ -190,20 +191,38 @@ aspect:
; ;
; 184h is the equiv of 220.8h at square pixels ; 184h is the equiv of 220.8h at square pixels
; 320 / 220.8 = 1.45 display aspect ratio ; 320 / 220.8 = 1.45 display aspect ratio
aspect_x: ; fixed4.16 5/4 aspect_x: ; fixed3.13 5/4
.word 5 << (12 - 2) .word 5 << (13 - 2)
aspect_y: ; fixed4.16 3/4 aspect_y: ; fixed3.13 3/4
.word 3 << (12 - 2) .word 3 << (13 - 2)
ms_per_frame: ; float48 16.66666667 sec_per_frame: ; float48 00 . 01 66 66 66 67
.byte 64 ; exponent/sign .byte 63 ; exponent/sign - -1 bytes
.byte $16 ; BCD digits .byte $01 ; BCD digits
.byte $66 .byte $66
.byte $66 .byte $66
.byte $66 .byte $66
.byte $67 .byte $67
us_per_sec: ; float48 1e6 01 00 0,0 00 . 00
.byte 67 ; exponent/sign +3 bytes
.byte $01 ; BCD digits
.byte $00
.byte $00
.byte $00
.byte $00
total_iters: ; float48
.repeat 6
.byte 0
.endrepeat
total_sec: ; float48
.repeat 6
.byte 0
.endrepeat
display_list_start: display_list_start:
; 24 lines overscan ; 24 lines overscan
.repeat 3 .repeat 3
@ -235,9 +254,9 @@ display_list_len = display_list_end - display_list_start
color_map: color_map:
.byte 0 .byte 0
.repeat 85 .repeat 85
.byte 1 .byte %01010101
.byte 2 .byte %10101010
.byte 3 .byte %11111111
.endrepeat .endrepeat
@ -286,23 +305,34 @@ fill_masks:
.byte %00000001 .byte %00000001
.byte %00000000 .byte %00000000
pixel_masks:
.byte %11111111
.byte %11110000
.byte %11000000
viewport_zoom: viewport_zoom:
.byte 1 .byte 0
.byte 6 .byte 5
.byte 8 .byte 7
.byte 6 .byte 5
.byte 7
.byte 7
viewport_ox: viewport_ox:
.word $0000 .dword ($00000000 & $3fffffff) << 2
.word $f110 .dword ($ff110000 & $3fffffff) << 2
.word $f110 .dword ($ff110000 & $3fffffff) << 2
.word $e400 .dword ($fe400000 & $3fffffff) << 2
.dword ($fe3b0000 & $3fffffff) << 2
.dword $fd220000
viewport_oy: viewport_oy:
.word $0000 .dword ($00000000 & $3fffffff) << 2
.word $fb60 .dword ($ffb60000 & $3fffffff) << 2
.word $fbe0 .dword ($ffbe0000 & $3fffffff) << 2
.word $0000 .dword ($00000000 & $3fffffff) << 2
.dword ($fffe0000 & $3fffffff) << 2
.dword $ff000000
; 2 + 9 * byte cycles ; 2 + 9 * byte cycles
.macro add bytes, dest, arg1, arg2 .macro add bytes, dest, arg1, arg2
@ -321,7 +351,7 @@ viewport_oy:
; 38 cycles ; 38 cycles
.macro add32 dest, arg1, arg2 .macro add32 dest, arg1, arg2
add 4, dest, arg2, dest add 4, dest, arg1, arg2
.endmacro .endmacro
; 8 cycles ; 8 cycles
@ -426,22 +456,25 @@ viewport_oy:
round16 arg ; 11-27 cycles round16 arg ; 11-27 cycles
.endmacro .endmacro
.macro imul16_round dest, arg1, arg2, shift ; input: arg1, arg2 as fixed3.13
; output: dest as fixed6.26
.macro imul16 dest, arg1, arg2
copy16 FR0, arg1 ; 12 cyc copy16 FR0, arg1 ; 12 cyc
copy16 FR1, arg2 ; 12 cyc copy16 FR1, arg2 ; 12 cyc
jsr imul16_func ; ? cyc jsr imul16_func ; ? cyc
shift_round_16 FR2, shift ; 103-119 cycles for shift=4 copy32 dest, FR2 ; 24 cyc
copy16 dest, FR2 + 2 ; 12 cyc
.endmacro .endmacro
.macro sqr16_round dest, arg, shift ; input: arg as fixed3.13
;imul16_round dest, arg, arg, shift ; output: dest as fixed6.26
.macro sqr16 dest, arg
copy16 FR0, arg ; 12 cyc copy16 FR0, arg ; 12 cyc
jsr sqr16_func ; ? cyc jsr sqr16_func ; ? cyc
shift_round_16 FR2, shift ; 103-119 cycles for shift=4 copy32 dest, FR2 ; 24 cyc
copy16 dest, FR2 + 2 ; 12 cyc
.endmacro .endmacro
; input: arg as u8
; output: dest as u16
; clobbers a, x ; clobbers a, x
.macro sqr8 dest, arg .macro sqr8 dest, arg
ldx arg ldx arg
@ -451,18 +484,6 @@ viewport_oy:
sta dest + 1 sta dest + 1
.endmacro .endmacro
; clobbers a, x
.macro sqr8_add16 dest, arg
ldx arg
clc
lda sqr_lobyte,x
adc dest
sta dest
lda sqr_hibyte,x
adc dest + 1
sta dest + 1
.endmacro
.segment "TABLES" .segment "TABLES"
; lookup table for top byte -> PORTB value for bank-switch ; lookup table for top byte -> PORTB value for bank-switch
.align 256 .align 256
@ -745,9 +766,8 @@ inner_loop:
; h1*h2*256*256 + h1*l2*256 + h2*l1*256 + l1*l2 ; h1*h2*256*256 + h1*l2*256 + h2*l1*256 + l1*l2
imul8 result, arg1, arg2, xe imul8 result, arg1, arg2, xe
lda #0
sta result + 2 imul8 result + 2, arg1 + 1, arg2 + 1, xe
sta result + 3
imul8 inter, arg1 + 1, arg2, xe imul8 inter, arg1 + 1, arg2, xe
add16 result + 1, result + 1, inter add16 result + 1, result + 1, inter
@ -757,9 +777,6 @@ inner_loop:
add16 result + 1, result + 1, inter add16 result + 1, result + 1, inter
add_carry result + 3 add_carry result + 3
imul8 inter, arg1 + 1, arg2 + 1, xe
add16 result + 2, result + 2, inter
; In case of negative inputs, adjust high word ; In case of negative inputs, adjust high word
; https://stackoverflow.com/a/28827013 ; https://stackoverflow.com/a/28827013
lda arg1 + 1 lda arg1 + 1
@ -792,9 +809,8 @@ arg2_pos:
; h*h*256*256 + h*l*256 + h*l*256 + l*l ; h*h*256*256 + h*l*256 + h*l*256 + l*l
sqr8 result, arg sqr8 result, arg
lda #0
sta result + 2 sqr8 result + 2, arg + 1
sta result + 3
imul8 inter, arg + 1, arg, xe imul8 inter, arg + 1, arg, xe
add16 result + 1, result + 1, inter add16 result + 1, result + 1, inter
@ -802,8 +818,6 @@ arg2_pos:
add16 result + 1, result + 1, inter add16 result + 1, result + 1, inter
add_carry result + 3 add_carry result + 3
sqr8_add16 result + 2, arg + 1
rts ; 6 cyc rts ; 6 cyc
.endscope .endscope
.endmacro .endmacro
@ -871,8 +885,8 @@ next:
.proc mandelbrot .proc mandelbrot
; input: ; input:
; cx: position scaled to 4.12 fixed point - -8..+7.9 ; cx: position scaled to 6.26 fixed point - -32..+31.9
; cy: position scaled to 4.12 ; cy: position scaled to 6.26
; ;
; output: ; output:
; iter: iteration count at escape or 0 ; iter: iteration count at escape or 0
@ -884,16 +898,50 @@ next:
; zx_zy = 0 ; zx_zy = 0
; dist = 0 ; dist = 0
; iter = 0 ; iter = 0
; lda #00
; ldx #(iter - zx + 1)
;initloop:
; sta zx - 1,x
; dex
; bne initloop
; sta z_buffer_start
; sta z_buffer_end
lda #00 lda #00
ldx #(iter - zx + 1) sta zx
initloop: sta zx + 1
sta zx - 1,x sta zx + 2
dex sta zx + 3
bne initloop sta zy
sta zy + 1
sta zy + 2
sta zy + 3
sta zx_2
sta zx_2 + 1
sta zx_2 + 2
sta zx_2 + 3
sta zy_2
sta zy_2 + 1
sta zy_2 + 2
sta zy_2 + 3
sta zx_zy
sta zx_zy + 1
sta zx_zy + 2
sta zx_zy + 3
sta dist
sta dist + 1
sta dist + 2
sta dist + 3
sta iter
sta z_buffer_start sta z_buffer_start
sta z_buffer_end sta z_buffer_end
loop: loop:
inc count_iters
bne low_iters
inc count_iters + 1
low_iters:
; iter++ & max-iters break ; iter++ & max-iters break
inc iter inc iter
bne keep_going bne keep_going
@ -901,6 +949,8 @@ loop:
keep_going: keep_going:
.macro quick_exit arg, max .macro quick_exit arg, max
; arg: fixed6.26
; max: integer
.local positive .local positive
.local negative .local negative
.local nope_out .local nope_out
@ -908,16 +958,16 @@ keep_going:
.local all_done .local all_done
; check sign bit ; check sign bit
lda arg + 1 lda arg + 3
bmi negative bmi negative
positive: positive:
cmp #((max) << 4) cmp #(max << 2)
bmi all_done ; 'less than' bmi all_done ; 'less than'
jmp exit_path jmp exit_path
negative: negative:
cmp #(256 - ((max) << 4)) cmp #(256 - (max << 2))
beq first_equal ; 'equal' on first byte beq first_equal ; 'equal' on first byte
bpl all_done ; 'greater than' bpl all_done ; 'greater than'
@ -925,34 +975,44 @@ keep_going:
jmp exit_path jmp exit_path
first_equal: first_equal:
; following bytes all 0 shows it's really 'equal'
lda arg + 2
bne all_done
lda arg + 1
bne all_done
lda arg lda arg
beq nope_out ; 2nd byte 0 shows it's really 'equal' bne all_done
jmp exit_path
all_done: all_done:
.endmacro .endmacro
; 4.12: (-8 .. +7.9) ; 6.26: (-32 .. 31.9)
; zx = zx_2 - zy_2 + cx ; zx = zx_2 - zy_2 + cx
sub16 zx, zx_2, zy_2 sub32 zx, zx_2, zy_2
add16 zx, zx, cx add32 zx, zx, cx
quick_exit zx, 2 quick_exit zx, 2
; zy = zx_zy + zx_zy + cy ; zy = zx_zy + zx_zy + cy
add16 zy, zx_zy, zx_zy add32 zy, zx_zy, zx_zy
add16 zy, zy, cy add32 zy, zy, cy
quick_exit zy, 2 quick_exit zy, 2
; convert 6.26 -> 3.13: (-4 .. +3.9)
shift_round_16 zx, 3
shift_round_16 zy, 3
; zx_2 = zx * zx ; zx_2 = zx * zx
sqr16_round zx_2, zx, 4 sqr16 zx_2, zx + 2
; zy_2 = zy * zy ; zy_2 = zy * zy
sqr16_round zy_2, zy, 4 sqr16 zy_2, zy + 2
; zx_zy = zx * zy ; zx_zy = zx * zy
imul16_round zx_zy, zx, zy, 4 imul16 zx_zy, zx + 2, zy + 2
; dist = zx_2 + zy_2 ; dist = zx_2 + zy_2
add16 dist, zx_2, zy_2 add32 dist, zx_2, zy_2
quick_exit dist, 4 quick_exit dist, 4
; if may be in the lake, look for looping output with a small buffer ; if may be in the lake, look for looping output with a small buffer
@ -989,10 +1049,10 @@ z_buffer_loop:
; Compare the previously stored z values ; Compare the previously stored z values
ldy #0 ldy #0
z_compare zx z_compare zx + 2
z_compare zx + 1 z_compare zx + 3
z_compare zy z_compare zy + 2
z_compare zy + 1 z_compare zy + 3
cpy #4 cpy #4
bne z_no_matches bne z_no_matches
@ -1007,10 +1067,10 @@ z_no_matches:
z_nothing_to_read: z_nothing_to_read:
; Store and expand ; Store and expand
z_store zx z_store zx + 2
z_store zx + 1 z_store zx + 3
z_store zy z_store zy + 2
z_store zy + 1 z_store zy + 3
z_advance z_advance
stx z_buffer_end stx z_buffer_end
@ -1061,14 +1121,17 @@ cont:
enough: enough:
.endmacro .endmacro
.macro zoom_factor dest, src, zoom, aspect .macro zoom_factor dest, src, aspect
; output: dest: fixed6.26
; input: src: fixed3.13
; aspect: fixed3.13
; clobbers A, X, flags, etc ; clobbers A, X, flags, etc
copy16 dest, src copy16 dest, src
scale_zoom dest scale_zoom dest
; cy = cy * (3 / 4) ; cy = cy * (3 / 4)
; cx = cx * (5 / 4) ; cx = cx * (5 / 4)
imul16_round dest, dest, aspect, 4 imul16 dest, dest, aspect
.endmacro .endmacro
.proc pset .proc pset
@ -1079,8 +1142,11 @@ enough:
; iter -> color ; iter -> color
ldx iter ldx iter
lda color_map,x lda color_map,x
ldx fill_level
and pixel_masks,x
sta pixel_color sta pixel_color
lda #(255 - 3) lda pixel_masks,x
eor #$ff
sta pixel_mask sta pixel_mask
; sy -> line base address in temp ; sy -> line base address in temp
@ -1129,22 +1195,23 @@ point:
; pixel_mask <<= pixel_shift (shifting in ones) ; pixel_mask <<= pixel_shift (shifting in ones)
and #3 and #3
sta pixel_shift sta pixel_shift
lda #3
sec
sbc pixel_shift
tax tax
shift_loop: shift_loop:
beq shift_done beq shift_done
asl pixel_color lsr pixel_color
asl pixel_color lsr pixel_color
sec sec
rol pixel_mask ror pixel_mask
sec sec
rol pixel_mask ror pixel_mask
dex dex
jmp shift_loop jmp shift_loop
shift_done: shift_done:
ldy fill_level
ldx fill_masks,y
inx
; pixel_offset = temp >> 2 ; pixel_offset = temp >> 2
lda temp lda temp
lsr a lsr a
@ -1152,12 +1219,25 @@ shift_done:
sta pixel_offset sta pixel_offset
tay tay
draw_pixel:
; read, mask, or, write ; read, mask, or, write
lda (pixel_ptr),y lda (pixel_ptr),y
and pixel_mask and pixel_mask
ora pixel_color ora pixel_color
sta (pixel_ptr),y sta (pixel_ptr),y
dex
beq done
clc
lda #40
adc pixel_ptr
sta pixel_ptr
lda #0
adc pixel_ptr + 1
sta pixel_ptr + 1
jmp draw_pixel
done:
rts rts
.endproc .endproc
@ -1165,6 +1245,7 @@ shift_done:
; clobbers A, X ; clobbers A, X
.local loop .local loop
.local done .local done
.local padding
ldx #0 ldx #0
loop: loop:
cpx #len cpx #len
@ -1172,11 +1253,27 @@ loop:
txa txa
tay tay
lda (strptr),y lda (strptr),y
pha ; save the char for terminator check
and #$7f ; strip the high bit (terminator)
tay tay
lda char_map,y lda char_map,y
sta textbuffer + col,x sta textbuffer + col,x
inx inx
pla
bmi padding
jmp loop jmp loop
padding:
ldy #32 ; space
lda char_map,y
cpx #len
beq done
sta textbuffer + col,x
inx
jmp padding
done: done:
.endmacro .endmacro
@ -1293,12 +1390,15 @@ skip_luma:
cpy #KEY_MINUS cpy #KEY_MINUS
beq minus beq minus
; temp = $0010 << (8 - zoom) ; temp+temp2 = $00010000 << (8 - zoom)
lda #$10
sta temp
lda #$00 lda #$00
sta temp
sta temp + 1 sta temp + 1
scale_zoom temp lda #$01
sta temp + 2
lda #$00
sta temp + 3
scale_zoom temp + 2
cpy #KEY_UP cpy #KEY_UP
beq up beq up
@ -1308,14 +1408,7 @@ skip_luma:
beq left beq left
cpy #KEY_RIGHT cpy #KEY_RIGHT
beq right beq right
cpy #KEY_1 jmp number_keys
beq one
cpy #KEY_2
beq two
cpy #KEY_3
beq three
cpy #KEY_4
beq four
skip_char: skip_char:
lda #0 lda #0
@ -1323,7 +1416,7 @@ skip_char:
plus: plus:
lda zoom lda zoom
cmp #8 cmp #7
bpl skip_char bpl skip_char
inc zoom inc zoom
jmp done jmp done
@ -1334,17 +1427,33 @@ minus:
dec zoom dec zoom
jmp done jmp done
up: up:
sub16 oy, oy, temp sub32 oy, oy, temp
jmp done jmp done
down: down:
add16 oy, oy, temp add32 oy, oy, temp
jmp done jmp done
left: left:
sub16 ox, ox, temp sub32 ox, ox, temp
jmp done jmp done
right: right:
add16 ox, ox, temp add32 ox, ox, temp
jmp done jmp done
number_keys:
cpy #KEY_1
beq one
cpy #KEY_2
beq two
cpy #KEY_3
beq three
cpy #KEY_4
beq four
cpy #KEY_5
beq five
cpy #KEY_6
beq six
jmp skip_char
one: one:
ldx #0 ldx #0
jmp load_key_viewport jmp load_key_viewport
@ -1356,6 +1465,12 @@ three:
jmp load_key_viewport jmp load_key_viewport
four: four:
ldx #3 ldx #3
jmp load_key_viewport
five:
ldx #4
jmp load_key_viewport
six:
ldx #5
; fall through ; fall through
load_key_viewport: load_key_viewport:
jsr load_viewport jsr load_viewport
@ -1406,17 +1521,32 @@ zero_byte_loop:
txa txa
asl a asl a
asl a
tax tax
lda viewport_ox,x lda viewport_ox,x
sta ox sta ox
lda viewport_oy,x lda viewport_oy,x
sta oy sta oy
inx inx
lda viewport_ox,x lda viewport_ox,x
sta ox + 1 sta ox + 1
lda viewport_oy,x lda viewport_oy,x
sta oy + 1 sta oy + 1
inx
lda viewport_ox,x
sta ox + 2
lda viewport_oy,x
sta oy + 2
inx
lda viewport_ox,x
sta ox + 3
lda viewport_oy,x
sta oy + 3
rts rts
.endproc .endproc
@ -1471,16 +1601,20 @@ copy_byte_loop:
jsr SETVBV jsr SETVBV
main_loop: main_loop:
; count_frames = 0; count_pixels = 0 ; count_frames = 0; count_iters = 0
lda #0 lda #0
sta count_frames sta count_frames
sta count_pixels sta count_iters
sta count_iters + 1
; total_ms = 0.0; total_pixels = 0.0 ; total_sec = 0.0; total_iters = 0.0
ldx #total_ms jsr ZFR0
jsr ZF1 ldx #.lobyte(total_sec)
ldx #total_pixels ldy #.hibyte(total_sec)
jsr ZF1 jsr FST0R
ldx #.lobyte(total_iters)
ldy #.hibyte(total_iters)
jsr FST0R
jsr clear_screen jsr clear_screen
jsr status_bar jsr status_bar
@ -1538,10 +1672,10 @@ skipped_mask:
not_skipped_mask: not_skipped_mask:
; run the fractal! ; run the fractal!
zoom_factor cx, sx, zoom, aspect_x zoom_factor cx, sx, aspect_x
add16 cx, cx, ox add32 cx, cx, ox
zoom_factor cy, sy, zoom, aspect_y zoom_factor cy, sy, aspect_y
add16 cy, cy, oy add32 cy, cy, oy
jsr mandelbrot jsr mandelbrot
jsr pset jsr pset
@ -1552,38 +1686,32 @@ not_skipped_mask:
no_key: no_key:
; check if we should update the counters ; check if we should update the counters
;
; count_pixels >= width? update!
inc count_pixels
lda count_pixels
cmp #width
bmi update_status
; count_frames >= 120? update! ; count_frames >= 120? update!
lda count_frames lda count_frames
cmp #120 ; >= 2 seconds cmp #120 ; >= 2 seconds
bmi skipped bpl update_status
jmp skipped
update_status: update_status:
; FR0 = (float)count_pixels & clear count_pixels ; FR0 = (float)count_iters & clear count_iters
lda count_pixels copy16 FR0, count_iters
sta FR0
lda #0
sta FR0 + 1
sta count_pixels
jsr IFP jsr IFP
lda #0
sta count_iters
sta count_iters + 1
; FR1 = total_pixels ; FR1 = total_iters
ldx #.lobyte(total_pixels) ldx #.lobyte(total_iters)
ldy #.hibyte(total_pixels) ldy #.hibyte(total_iters)
jsr FLD1R jsr FLD1R
; FR0 += FR1 ; FR0 += FR1
jsr FADD jsr FADD
; total_pixels = FR0 ; total_iters = FR0
ldx #.lobyte(total_pixels) ldx #.lobyte(total_iters)
ldy #.hibyte(total_pixels) ldy #.hibyte(total_iters)
jsr FST0R jsr FST0R
@ -1596,44 +1724,58 @@ update_status:
sta count_frames sta count_frames
jsr IFP jsr IFP
; FR0 *= ms_per_frame ; FR0 *= sec_per_frame
ldx #.lobyte(ms_per_frame) ldx #.lobyte(sec_per_frame)
ldy #.hibyte(ms_per_frame) ldy #.hibyte(sec_per_frame)
jsr FLD1R jsr FLD1R
jsr FMUL jsr FMUL
; FR0 += total_ms ; FR0 += total_sec
ldx #total_ms ldx #.lobyte(total_sec)
ldy #0 ldy #.hibyte(total_sec)
jsr FLD1R jsr FLD1R
jsr FADD jsr FADD
; total_ms = FR0 ; total_sec = FR0
ldx #total_ms ldx #.lobyte(total_sec)
ldy #0 ldy #.hibyte(total_sec)
jsr FST0R jsr FST0R
; FR0 /= total_pixels ; FR0 /= total_iters
ldx #total_pixels ldx #.lobyte(total_iters)
ldy #0 ldy #.hibyte(total_iters)
jsr FLD1R jsr FLD1R
jsr FDIV jsr FDIV
; FR0 *= us_per_sec
ldx #.lobyte(us_per_sec)
ldy #.hibyte(us_per_sec)
jsr FLD1R
jsr FMUL
; round (down) to integer
jsr FPI
clc
jsr IFP
; convert to ASCII in INBUFF ; convert to ASCII in INBUFF
jsr FASC jsr FASC
; print the first 6 digits ; print the first 6 digits
draw_text_indirect speed_start, speed_precision, INBUFF draw_text speed_start, str_speed_len, str_speed
draw_text speed_start + speed_precision, str_speed_len, str_speed draw_text_indirect speed_start + str_speed_len, speed_precision, INBUFF
skipped: skipped:
; sx += fill_masks[fill_level] + 1
ldx fill_level
lda fill_masks,x
clc clc
lda sx adc #1 ; will never carry
adc #1 adc sx
sta sx sta sx
lda sx + 1 lda #0
adc #0 adc sx + 1
sta sx + 1 sta sx + 1
lda sx lda sx
@ -1643,12 +1785,15 @@ skipped:
loop_sx_done: loop_sx_done:
; sy += fill_masks[fill_level] + 1
ldx fill_level
lda fill_masks,x
clc clc
lda sy adc #1 ; will never carry
adc #1 adc sy
sta sy sta sy
lda sy + 1 lda #0
adc #0 adc sy + 1
sta sy + 1 sta sy + 1
lda sy lda sy

View file

@ -18,7 +18,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
## Current state ## Current state
Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys. Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered. The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
@ -27,7 +27,7 @@ The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 3
* when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications * when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
* without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication * without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
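
As an illustration of table-driven multiplication of this kind, here is a Python sketch of the quarter-square identity, `a*b = floor((a+b)^2/4) - floor((a-b)^2/4)`. Whether the project's half-squares table uses exactly this form is an assumption; the table and function names are hypothetical.

```python
# floor(n*n/4) for n in 0..510, enough to index the sum of two bytes.
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def mul8(a, b):
    """8x8 -> 16-bit unsigned multiply using two table lookups.

    a+b and a-b always share parity, so the two floor() truncations
    cancel and the result is exact.
    """
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

The trade is two lookups and a subtraction in place of a shift-and-add loop, at the cost of the table's memory.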
The mandelbrot calculations are done using 4.12-precision fixed point numbers. It may be possible to squish this down to 3.13. The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
Iterations are capped at 255. Iterations are capped at 255.
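
To make the number formats concrete, here is a Python reference model of one pixel's iteration loop, following the order of operations in the `mandelbrot` proc: sums are kept in 6.26, narrowed to 3.13 with rounding before each multiply, and each 3.13 x 3.13 product comes back as 6.26. This is a sketch under assumptions, not the project's code: the helper names are hypothetical, the escape tests are simplified, and the lake-detection buffer is omitted.

```python
FRAC = 26            # 6.26 fixed point: 26 fractional bits
TWO = 2 << FRAC      # 2.0 in 6.26
FOUR = 4 << FRAC     # 4.0 in 6.26


def to_fixed(x):
    """Convert a float to 6.26 fixed point (hypothetical helper)."""
    return int(round(x * (1 << FRAC)))


def narrow(v):
    """Round a 6.26 value to 3.13 by dropping 13 fractional bits."""
    return (v + (1 << 12)) >> 13


def mandelbrot(cx, cy, max_iter=255):
    """Return the iteration count at escape, or 0 if the cap is reached."""
    zx_2 = zy_2 = zx_zy = 0
    for i in range(1, max_iter + 1):
        zx = zx_2 - zy_2 + cx          # 6.26 sums
        zy = zx_zy + zx_zy + cy
        if not (-TWO <= zx < TWO and -TWO <= zy < TWO):
            return i                    # quick exit on either axis
        x13, y13 = narrow(zx), narrow(zy)
        zx_2 = x13 * x13                # 3.13 * 3.13 -> 6.26
        zy_2 = y13 * y13
        zx_zy = x13 * y13
        if zx_2 + zy_2 >= FOUR:
            return i
    return 0
```

For example, `mandelbrot(to_fixed(-1.0), to_fixed(0.0))` never escapes and returns 0, while a point well outside the set such as `to_fixed(2.0)` on the real axis escapes on the first iteration.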

12
todo.md

@ -1,15 +1,17 @@
things to try: things to try:
* fix status bar to show elapsed time, per-iter time, per-pixel iter count
* 'turbo' mode disabling graphics in full or part
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* try 3.13 fixed point instead of 4.12 for more precision * maybe clean up the load/layout of the big mul table
* can we get away without the extra bit?
* consider alternate lookup tables in the top 16KB under ROM
* y-axis mirror optimization * y-axis mirror optimization
* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
* maybe redo tiering to just 4x4, 2x2, 1x1?
* extract viewport for display & re-input via keyboard * extract viewport for display & re-input via keyboard
* fujinet screenshot/viewport uploader * fujinet screenshot/viewport uploader