Compare commits

21 commits
main ... main

Author SHA1 Message Date
89b4e45901 flip the y coordinate sign 2025-02-22 20:24:04 -08:00
6e66145ec6 whoops fixes 2025-02-22 15:37:11 -08:00
07db3d00d7 second status bar display with coords/zoom
currently using 3.13 precision to output to floats for formatting
2025-02-22 11:23:13 -08:00
26d612b6f3 move 8 scan lines on the bottom to status bar 2025-02-21 19:42:10 -08:00
25da81c64b clean up text draw, fix offset by one 2025-02-02 16:40:58 -08:00
d182d33b35 draw_string 2025-02-01 10:02:01 -08:00
e0cc704d99 Fix drawing terminator, round usec 2025-01-08 18:34:46 -08:00
7c04862d70 workaround for rounding us/iter
for some reason rounding is giving me wrong results
not sure what i'm doing wrong :D

just show 6 digits :P

ok this gets the us/iter working, and it is more stable
but the elapsed time still needs to be added
2025-01-05 14:29:27 -08:00
918d15e813 wip us/iter counter
seems wrong, gives 32 all the time and that seems too small
2025-01-05 14:05:24 -08:00
eaa00a055a wip changing time units
it does this weird thing where sometimes it's reading out wrong digits
and then switches to expected unit of sec/px

work in progress no clue what's going on
2025-01-04 18:46:51 -08:00
7e5ca79d9a move total_ms, total_pixels out of zero page
this frees up 12 bytes of zero page space and costs no measurable
time as these variables are not in the hot path and there was only
a tiny bit different.
2025-01-04 14:25:25 -08:00
d2bf77dc26 todo notes 2025-01-04 12:13:27 -08:00
582ddf497f apply jamey's suggestion of skipping add for high byte muls
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
2025-01-04 10:53:51 -08:00
d157fe1306 Faster pixel skipping on 4x4, 2x2 tiers
Iterate at fill_masks[fill_level]+1 instead of every pixel and then
skipping, saves a smidge of time

view 1 with expanded memory:
10.514 ms/px before
10.430 ms/px after
2025-01-04 10:06:12 -08:00
dcf5a3f59e sixth viewport 2025-01-01 21:15:38 -08:00
837082cf56 tweak viewports
skip experimental 6th viewport that got forgotten
and limit max zoom to 7 (range 0-7) which is what looks good
2025-01-01 15:45:26 -08:00
65fcb44934 3.13 / 6.26 gives nicer results! 2025-01-01 15:37:12 -08:00
c424f1b8bc fill in scanlines during tiering 2024-12-31 22:10:27 -08:00
49fe315529 'wide pixels'
should get better color on the composite video because the
scanlines will be fuller of data
2024-12-31 20:13:11 -08:00
f1ebb21bcb wip not working wide pixels 2024-12-31 17:49:13 -08:00
87caa52543 add viewport number 5 full zoom 2024-12-31 15:45:03 -08:00
3 changed files with 448 additions and 182 deletions

mandel.s (612 lines changed)

@ -1,16 +1,16 @@
; Our zero-page vars
ox = $80 ; fixed8.24: center point x
oy = $84 ; fixed8.24: center point y
cx = $88 ; fixed8.24: c_x
cy = $8c ; fixed8.24: c_y
ox = $80 ; fixed6.26: center point x
oy = $84 ; fixed6.26: center point y
cx = $88 ; fixed6.26: c_x
cy = $8c ; fixed6.26: c_y
zx = $90 ; fixed8.24: z_x
zy = $94 ; fixed8.24: z_y
zx_2 = $98 ; fixed8.24: z_x^2
zy_2 = $9c ; fixed8.24: z_y^2
zx = $90 ; fixed6.26: z_x
zy = $94 ; fixed6.26: z_y
zx_2 = $98 ; fixed6.26: z_x^2
zy_2 = $9c ; fixed6.26: z_y^2
zx_zy = $a0 ; fixed8.24: z_x * z_y
dist = $a4 ; fixed8.24: z_x^2 + z_y^2
zx_zy = $a0 ; fixed6.26: z_x * z_y
dist = $a4 ; fixed6.26: z_x^2 + z_y^2
sx = $a8 ; i16: screen pixel x
sy = $aa ; i16: screen pixel y
z_buffer_active = $ac ; boolean: 1 if we triggered the lake, 0 if not
@ -31,10 +31,12 @@ chroma_offset = $bb ; u8
palette_ticks = $bc ; u8
chroma_ticks = $bd ; u8
count_frames = $be ; u8
count_pixels = $bf ; u8
; free space $bf
total_pixels = $c0 ; float48
total_ms = $c6 ; float48
count_iters = $c0 ; u16
text_col = $c2 ; u8
text_row = $c3 ; u8
; free space c4-cb
temp = $cc ; u16
temp2 = $ce ; u16
@ -59,10 +61,12 @@ LBUFF = $0580 ; result buffer for FASC routine
; FP ROM routine vectors
FASC = $D8E6 ; FLOATING POINT TO ASCII (output in INBUFF, last char has high bit set)
IFP = $D9AA ; INTEGER TO FLOATING POINT CONVERSION (FR0:u16 -> FR0:float48)
FPI = $D9D2 ; floating point to integer
FADD = $DA66 ; ADDITION (FR0 += FR1)
FSUB = $DA60 ; SUBTRACTION (FR0 -= FR1)
FMUL = $DADB ; MULTIPLICATION (FR0 *= FR1)
FDIV = $DB28 ; DIVISION (FR0 /= FR1)
ZFR0 = $DA44 ; clear FR0
ZF1 = $DA46 ; CLEAR ZERO PAGE FLOATING POINT NUMBER (XX)
FLD0R = $DD89 ; LOAD FR0 WITH FLOATING POINT NUMBER (YYXX)
FLD1R = $DD98 ; LOAD FR1 WITH FLOATING POINT NUMBER (YYXX)
@ -76,7 +80,7 @@ framebuffer_bottom = $b000
display_list = $bf00
framebuffer_end = $c000
height = 184
height = 176
half_height = height >> 1
width = 160
half_width = width >> 1
@ -140,25 +144,52 @@ strings:
str_self:
.byte "MANDEL-6502"
str_self_end:
.byte 0
str_speed:
.byte " ms/px"
.byte "us/iter: "
str_speed_end:
.byte 0
str_run:
.byte " RUN"
str_run_end:
.byte 0
str_done:
.byte "DONE"
str_done_end:
.byte 0
str_padding:
.byte " "
str_padding_end:
.byte 0
str_self_len = str_self_end - str_self
str_speed_len = str_speed_end - str_speed
str_run_len = str_run_end - str_run
str_done_len = str_done_end - str_done
speed_precision = 6
str_padding_len = str_padding_end - str_padding
speed_start = 40 - str_done_len - str_speed_len - speed_precision - 1
speed_len = 14 + str_speed_len
speed_start = 40 - str_done_len - str_speed_len - str_padding_len - 1
col_x = 1
str_x:
.byte "X:"
.byte 0
str_x_len = 2
str_x_space = 12
str_x_padding = 2
col_y = col_x + str_x_len + str_x_space + str_x_padding
str_y:
.byte "Y:"
.byte 0
str_y_len = 2
str_y_space = 12
str_y_padding = 2
col_zoom = col_y + str_y_len + str_y_space + str_y_padding
str_zoom:
.byte "ZOOM:"
.byte 0
str_zoom_len = 5
char_map:
; Map ATASCII string values to framebuffer font entries
@ -189,20 +220,49 @@ aspect:
;
; 184h is the equiv of 220.8h at square pixels
; 320 / 220.8 = 1.45 display aspect ratio
aspect_x: ; fixed4.16 5/4
.word 5 << (12 - 2)
aspect_x: ; fixed3.13 5/4
.word 5 << (13 - 2)
aspect_y: ; fixed4.16 3/4
.word 3 << (12 - 2)
aspect_y: ; fixed3.13 3/4
.word 3 << (13 - 2)
ms_per_frame: ; float48 16.66666667
.byte 64 ; exponent/sign
.byte $16 ; BCD digits
fixed3_13_as_float: ; float48
; 1 << 13
; 8192
; 81 92 . 00 00 00
.byte 65 ; exponent/sign - +1 byte
.byte $81
.byte $92
.byte $00
.byte $00
.byte $00
sec_per_frame: ; float48 00 . 01 66 66 66 67
.byte 63 ; exponent/sign - -1 bytes
.byte $01 ; BCD digits
.byte $66
.byte $66
.byte $66
.byte $67
us_per_sec: ; float48 1e6 01 00 00 00 . 00
.byte 67 ; exponent/sign +3 bytes
.byte $01 ; BCD digits
.byte $00
.byte $00
.byte $00
.byte $00
total_iters: ; float48
.repeat 6
.byte 0
.endrepeat
total_sec: ; float48
.repeat 6
.byte 0
.endrepeat
display_list_start:
; 24 lines overscan
.repeat 3
@ -226,6 +286,10 @@ display_list_start:
.byte $0e
.endrep
; 8 scan lines, 1 row of 40-column text
.byte $42
.addr textbuffer + 40
.byte $41 ; jump and blank
.addr display_list
display_list_end:
@ -234,9 +298,9 @@ display_list_len = display_list_end - display_list_start
color_map:
.byte 0
.repeat 85
.byte 1
.byte 2
.byte 3
.byte %01010101
.byte %10101010
.byte %11111111
.endrepeat
@ -285,23 +349,34 @@ fill_masks:
.byte %00000001
.byte %00000000
pixel_masks:
.byte %11111111
.byte %11110000
.byte %11000000
viewport_zoom:
.byte 1
.byte 6
.byte 8
.byte 6
.byte 0
.byte 5
.byte 7
.byte 5
.byte 7
.byte 7
viewport_ox:
.dword $00000000
.dword $ff110000
.dword $ff110000
.dword $fe400000
.dword ($00000000 & $3fffffff) << 2
.dword ($ff110000 & $3fffffff) << 2
.dword ($ff110000 & $3fffffff) << 2
.dword ($fe400000 & $3fffffff) << 2
.dword ($fe3b0000 & $3fffffff) << 2
.dword $fd220000
viewport_oy:
.dword $00000000
.dword $ffb60000
.dword $ffbe0000
.dword $00000000
.dword ($00000000 & $3fffffff) << 2
.dword ($ffb60000 & $3fffffff) << 2
.dword ($ffbe0000 & $3fffffff) << 2
.dword ($00000000 & $3fffffff) << 2
.dword ($fffe0000 & $3fffffff) << 2
.dword $ff000000
; 2 + 9 * byte cycles
.macro add bytes, dest, arg1, arg2
@ -453,20 +528,6 @@ viewport_oy:
sta dest + 1
.endmacro
; input: arg as u8
; input/output: dest as u16
; clobbers a, x
.macro sqr8_add16 dest, arg
ldx arg
clc
lda sqr_lobyte,x
adc dest
sta dest
lda sqr_hibyte,x
adc dest + 1
sta dest + 1
.endmacro
.segment "TABLES"
; lookup table for top byte -> PORTB value for bank-switch
.align 256
@ -749,9 +810,8 @@ inner_loop:
; h1*h2*256*256 + h1*l2*256 + h2*l1*256 + l1*l2
imul8 result, arg1, arg2, xe
lda #0
sta result + 2
sta result + 3
imul8 result + 2, arg1 + 1, arg2 + 1, xe
imul8 inter, arg1 + 1, arg2, xe
add16 result + 1, result + 1, inter
@ -761,9 +821,6 @@ inner_loop:
add16 result + 1, result + 1, inter
add_carry result + 3
imul8 inter, arg1 + 1, arg2 + 1, xe
add16 result + 2, result + 2, inter
; In case of negative inputs, adjust high word
; https://stackoverflow.com/a/28827013
lda arg1 + 1
@ -796,9 +853,8 @@ arg2_pos:
; h*h*256*256 + h*l*256 + h*l*256 + l*l
sqr8 result, arg
lda #0
sta result + 2
sta result + 3
sqr8 result + 2, arg + 1
imul8 inter, arg + 1, arg, xe
add16 result + 1, result + 1, inter
@ -806,8 +862,6 @@ arg2_pos:
add16 result + 1, result + 1, inter
add_carry result + 3
sqr8_add16 result + 2, arg + 1
rts ; 6 cyc
.endscope
.endmacro
@ -873,10 +927,72 @@ next:
.endmacro
; input in FR0, 16 bits signed 3.13 fixed
; output in FR0, Atari float
; clobbers a, x, y, FR0, FR1
.proc fixed3_13_to_float
ldx #.lobyte(fixed3_13_as_float)
ldy #.hibyte(fixed3_13_as_float)
jsr FLD1R
; check sign bit! conversion routine is for unsigned
lda FR0 + 1
bpl positive
negative:
neg16 FR0
jsr IFP
; set float sign bit
lda FR0
ora #$80
sta FR0
jmp common
positive:
jsr IFP
common:
jsr FDIV
rts
.endproc
; input in FR0, Atari float
; output in FR0, 16 bits signed 3.13 fixed
; clobbers a, x, y, FR0, FR1
.proc float_to_fixed3_13
ldx #.lobyte(fixed3_13_as_float)
ldy #.hibyte(fixed3_13_as_float)
jsr FLD1R
jsr FMUL
; check sign bit! conversion routine is for unsigned
lda FR0
bpl positive
negative:
; clear float sign bit
lda FR0
eor #$80
sta FR0
jsr FPI
neg16 FR0
jmp common
positive:
jsr FPI
common:
rts
.endproc
.proc mandelbrot
; input:
; cx: position scaled to 8.24 fixed point - -128..+127.9
; cy: position scaled to 8.24
; cx: position scaled to 6.26 fixed point - -32..+31.9
; cy: position scaled to 6.26
;
; output:
; iter: iteration count at escape or 0
@ -927,6 +1043,11 @@ next:
sta z_buffer_end
loop:
inc count_iters
bne low_iters
inc count_iters + 1
low_iters:
; iter++ & max-iters break
inc iter
bne keep_going
@ -934,7 +1055,7 @@ loop:
keep_going:
.macro quick_exit arg, max
; arg: fixed8.24
; arg: fixed6.26
; max: integer
.local positive
.local negative
@ -947,12 +1068,12 @@ keep_going:
bmi negative
positive:
cmp #max
cmp #(max << 2)
bmi all_done ; 'less than'
jmp exit_path
negative:
cmp #(256 - max)
cmp #(256 - (max << 2))
beq first_equal ; 'equal' on first byte
bpl all_done ; 'greater than'
@ -972,7 +1093,7 @@ keep_going:
all_done:
.endmacro
; 8.24: (-128 .. 127.9)
; 6.26: (-32 .. 31.9)
; zx = zx_2 - zy_2 + cx
sub32 zx, zx_2, zy_2
add32 zx, zx, cx
@ -983,9 +1104,9 @@ keep_going:
add32 zy, zy, cy
quick_exit zy, 2
; convert 8.24 -> 4.12: (-8 .. +7.9)
shift_round_16 zx, 4
shift_round_16 zy, 4
; convert 6.26 -> 3.13: (-4 .. +3.9)
shift_round_16 zx, 3
shift_round_16 zy, 3
; zx_2 = zx * zx
sqr16 zx_2, zx + 2
@ -1107,9 +1228,9 @@ enough:
.endmacro
.macro zoom_factor dest, src, aspect
; output: dest: fixed8.24
; input: src: fixed4.12
; aspect: fixed4.12
; output: dest: fixed6.26
; input: src: fixed3.13
; aspect: fixed3.13
; clobbers A, X, flags, etc
copy16 dest, src
scale_zoom dest
@ -1127,8 +1248,11 @@ enough:
; iter -> color
ldx iter
lda color_map,x
ldx fill_level
and pixel_masks,x
sta pixel_color
lda #(255 - 3)
lda pixel_masks,x
eor #$ff
sta pixel_mask
; sy -> line base address in temp
@ -1177,22 +1301,23 @@ point:
; pixel_mask <<= pixel_shift (shifting in ones)
and #3
sta pixel_shift
lda #3
sec
sbc pixel_shift
tax
shift_loop:
beq shift_done
asl pixel_color
asl pixel_color
lsr pixel_color
lsr pixel_color
sec
rol pixel_mask
ror pixel_mask
sec
rol pixel_mask
ror pixel_mask
dex
jmp shift_loop
shift_done:
ldy fill_level
ldx fill_masks,y
inx
; pixel_offset = temp >> 2
lda temp
lsr a
@ -1200,48 +1325,94 @@ shift_done:
sta pixel_offset
tay
draw_pixel:
; read, mask, or, write
lda (pixel_ptr),y
and pixel_mask
ora pixel_color
sta (pixel_ptr),y
dex
beq done
clc
lda #40
adc pixel_ptr
sta pixel_ptr
lda #0
adc pixel_ptr + 1
sta pixel_ptr + 1
jmp draw_pixel
done:
rts
.endproc
.macro draw_text_indirect col, len, strptr
; clobbers A, X
.local loop
.local done
ldx #0
loop:
cpx #len
beq done
txa
tay
lda (strptr),y
tay
lda char_map,y
sta textbuffer + col,x
inx
jmp loop
done:
.endmacro
; in/out: column in text_col
; in: row in text_row
; in: pointer to string in INBUFF
; clobbers x/y/a/temp
.proc draw_string
drawptr = temp
strptr = INBUFF
.macro draw_text col, len, cstr
; clobbers A, X
.local loop
.local done
ldx #0
clc
lda #.lobyte(textbuffer)
adc text_col
sta temp
lda #.hibyte(textbuffer)
adc #0
sta temp + 1
ldx text_row
beq done_rows
continue_rows:
clc
lda temp
adc #40
sta temp
lda temp + 1
adc #0
sta temp + 1
dex
bne continue_rows
done_rows:
ldy #0
loop:
cpx #len
lda (strptr),y
; if char's null, terminate c-style
beq done
ldy cstr,x
lda char_map,y
sta textbuffer + col,x
inx
; save the char for terminator check
pha
; strip the high bit (terminator)
and #$7f
tax
lda char_map,x
sta (drawptr),y
iny
pla
; _last_ char has high bit set in atari rom routines
bmi done
jmp loop
done:
; move the text column pointer
tya
clc
adc text_col
sta text_col
rts
.endproc
.macro draw_string_const str
lda #.lobyte(str)
sta INBUFF
lda #.hibyte(str)
sta INBUFF + 1
jsr draw_string
.endmacro
.proc vblank_handler
@ -1367,7 +1538,7 @@ skip_char:
plus:
lda zoom
cmp #8
cmp #7
bpl skip_char
inc zoom
jmp done
@ -1378,16 +1549,20 @@ minus:
dec zoom
jmp done
up:
sub32 oy, oy, temp
add32 oy, oy, temp
jsr display_coords
jmp done
down:
add32 oy, oy, temp
sub32 oy, oy, temp
jsr display_coords
jmp done
left:
sub32 ox, ox, temp
jsr display_coords
jmp done
right:
add32 ox, ox, temp
jsr display_coords
jmp done
number_keys:
@ -1399,6 +1574,10 @@ number_keys:
beq three
cpy #KEY_4
beq four
cpy #KEY_5
beq five
cpy #KEY_6
beq six
jmp skip_char
one:
@ -1412,6 +1591,12 @@ three:
jmp load_key_viewport
four:
ldx #3
jmp load_key_viewport
five:
ldx #4
jmp load_key_viewport
six:
ldx #5
; fall through
load_key_viewport:
jsr load_viewport
@ -1447,12 +1632,63 @@ zero_byte_loop:
.proc status_bar
; Status bar
draw_text 0, str_self_len, str_self
draw_text 40 - str_run_len, str_run_len, str_run
lda #0
sta text_col
lda #0
sta text_row
draw_string_const str_self
lda #(40 - str_run_len)
sta text_col
draw_string_const str_run
rts
.endproc
.proc display_coords
lda #1
sta text_row
lda #col_x
sta text_col
draw_string_const str_x
copy32 FR0, ox
shift_round_16 FR0, 3
copy16 FR0, FR0 + 2
jsr fixed3_13_to_float
jsr FASC
jsr draw_string
lda #col_y
sta text_col
draw_string_const str_y
copy32 FR0, oy
shift_round_16 FR0, 3
copy16 FR0, FR0 + 2
jsr fixed3_13_to_float
jsr FASC
jsr draw_string
lda #col_zoom
sta text_col
draw_string_const str_zoom
lda zoom
clc
adc #0
sta FR0
lda #0
sta FR0 + 1
jsr IFP
jsr FASC
jsr draw_string
rts
.endproc
; input: viewport selector in x
; clobbers: a, x
.proc load_viewport
@ -1504,6 +1740,7 @@ zero_byte_loop:
sta DMACTL
jsr clear_screen
jsr display_coords
; Copy the display list into properly aligned memory
; Can't cross 1024-byte boundaries :D
@ -1542,19 +1779,24 @@ copy_byte_loop:
jsr SETVBV
main_loop:
; count_frames = 0; count_pixels = 0
; count_frames = 0; count_iters = 0
lda #0
sta count_frames
sta count_pixels
sta count_iters
sta count_iters + 1
; total_ms = 0.0; total_pixels = 0.0
ldx #total_ms
jsr ZF1
ldx #total_pixels
jsr ZF1
; total_sec = 0.0; total_iters = 0.0
jsr ZFR0
ldx #.lobyte(total_sec)
ldy #.hibyte(total_sec)
jsr FST0R
ldx #.lobyte(total_iters)
ldy #.hibyte(total_iters)
jsr FST0R
jsr clear_screen
jsr status_bar
jsr display_coords
lda #0
sta fill_level
@ -1612,6 +1854,7 @@ not_skipped_mask:
zoom_factor cx, sx, aspect_x
add32 cx, cx, ox
zoom_factor cy, sy, aspect_y
neg32 cy
add32 cy, cy, oy
jsr mandelbrot
jsr pset
@ -1623,38 +1866,32 @@ not_skipped_mask:
no_key:
; check if we should update the counters
;
; count_pixels >= width? update!
inc count_pixels
lda count_pixels
cmp #width
bmi update_status
; count_frames >= 120? update!
lda count_frames
cmp #120 ; >= 2 seconds
bmi skipped
bpl update_status
jmp skipped
update_status:
; FR0 = (float)count_pixels & clear count_pixels
lda count_pixels
sta FR0
lda #0
sta FR0 + 1
sta count_pixels
; FR0 = (float)count_iters & clear count_iters
copy16 FR0, count_iters
jsr IFP
lda #0
sta count_iters
sta count_iters + 1
; FR1 = total_pixels
ldx #.lobyte(total_pixels)
ldy #.hibyte(total_pixels)
; FR1 = total_iters
ldx #.lobyte(total_iters)
ldy #.hibyte(total_iters)
jsr FLD1R
; FR0 += FR1
jsr FADD
; total_pixels = FR0
ldx #.lobyte(total_pixels)
ldy #.hibyte(total_pixels)
; total_iters = FR0
ldx #.lobyte(total_iters)
ldy #.hibyte(total_iters)
jsr FST0R
@ -1667,44 +1904,66 @@ update_status:
sta count_frames
jsr IFP
; FR0 *= ms_per_frame
ldx #.lobyte(ms_per_frame)
ldy #.hibyte(ms_per_frame)
; FR0 *= sec_per_frame
ldx #.lobyte(sec_per_frame)
ldy #.hibyte(sec_per_frame)
jsr FLD1R
jsr FMUL
; FR0 += total_ms
ldx #total_ms
ldy #0
; FR0 += total_sec
ldx #.lobyte(total_sec)
ldy #.hibyte(total_sec)
jsr FLD1R
jsr FADD
; total_ms = FR0
ldx #total_ms
ldy #0
; total_sec = FR0
ldx #.lobyte(total_sec)
ldy #.hibyte(total_sec)
jsr FST0R
; FR0 /= total_pixels
ldx #total_pixels
ldy #0
; FR0 /= total_iters
ldx #.lobyte(total_iters)
ldy #.hibyte(total_iters)
jsr FLD1R
jsr FDIV
; convert to ASCII in INBUFF
; FR0 *= us_per_sec
ldx #.lobyte(us_per_sec)
ldy #.hibyte(us_per_sec)
jsr FLD1R
jsr FMUL
; round (down) to integer
jsr FPI
clc
jsr IFP
lda #speed_start
sta text_col
lda #0
sta text_row
draw_string_const str_speed
lda text_col
pha
draw_string_const str_padding
pla
sta text_col
; convert to ASCII in INBUFF and print
jsr FASC
; print the first 6 digits
draw_text_indirect speed_start, speed_precision, INBUFF
draw_text speed_start + speed_precision, str_speed_len, str_speed
jsr draw_string
skipped:
; sx += fill_masks[fill_level] + 1
ldx fill_level
lda fill_masks,x
clc
lda sx
adc #1
adc #1 ; will never carry
adc sx
sta sx
lda sx + 1
adc #0
lda #0
adc sx + 1
sta sx + 1
lda sx
@ -1714,12 +1973,15 @@ skipped:
loop_sx_done:
; sy += fill_masks[fill_level] + 1
ldx fill_level
lda fill_masks,x
clc
lda sy
adc #1
adc #1 ; will never carry
adc sy
sta sy
lda sy + 1
adc #0
lda #0
adc sy + 1
sta sy + 1
lda sy
@ -1738,7 +2000,13 @@ fill_loop_done:
loop:
; finished
draw_text 40 - str_done_len, str_done_len, str_done
lda #(40 - str_done_len)
sta text_col
lda #0
sta text_row
draw_string_const str_done
jsr keycheck
beq loop
jmp main_loop

View file

@ -18,7 +18,7 @@ Enjoy! I'll probably work on this off and on for the next few weeks until I've g
## Current state
Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 4 preset viewports via the number keys.
Basic rendering is functional, with interactive zoom/pan (+/-/arrows) and 6 preset viewports via the number keys.
The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 32-bit output in the zero page, using the Atari OS ROM's floating point registers as workspaces. Inputs are clobbered.
@ -27,7 +27,7 @@ The 16-bit signed integer multiplication takes two 16-bit inputs and emits one 3
* when expanded RAM is available as on 130XE, a 64KB 8-bit multiplication table accelerates the remaining multiplications
* without expanded RAM, a table of half-squares is used to implement the algorithm from https://everything2.com/title/Fast+6502+multiplication
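
As a rough illustration of how a half-squares table can replace a multiply, here is a host-side Python sketch. It is not a transcription of the linked algorithm or of the code in `mandel.s`; the identity used (`a*b = h(a+b) - h(a) - h(b)` with `h(x) = x*x/2`), the names `HALF_SQ` and `mul8`, and the odd/odd correction are assumptions made for the example.

```python
# Hypothetical model of table-driven 8-bit multiplication.
# HALF_SQ[x] = floor(x*x / 2) for every possible sum a+b (0..510).
HALF_SQ = [x * x // 2 for x in range(511)]


def mul8(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16 bit multiply using only lookups, adds and subtracts."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    product = HALF_SQ[a + b] - HALF_SQ[a] - HALF_SQ[b]
    # floor() drops 1/2 from every odd square. When both inputs are odd,
    # the two dropped halves add up to 1, so take it back out.
    return product - (a & b & 1)
```

On real hardware the table would be split into low-byte and high-byte pages so each lookup is a single indexed load; the sketch only demonstrates the arithmetic.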
The mandelbrot calculations are done using 4.12-precision fixed point numbers with 8.24-precision intermediates. It may be possible to squish this down to 3.13/6.26.
The mandelbrot calculations are done using 3.13-precision fixed point numbers with 6.26-precision intermediates.
Iterations are capped at 255.
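
The relationship between the two formats can be sketched in Python. This is a minimal model under stated assumptions, not the project's code: the helper names (`to_fixed3_13`, `mul_3_13`, `round_6_26_to_3_13`, `top_byte_6_26`) are hypothetical, and the rounding shown (add half an output bit, then shift) is one plausible choice.

```python
FRAC_3_13 = 13  # fractional bits in a 16-bit signed 3.13 value
FRAC_6_26 = 26  # fractional bits in a 32-bit signed 6.26 value


def to_fixed3_13(x: float) -> int:
    """Scale a real number to 3.13; the representable range is -4 .. +3.9998."""
    v = round(x * (1 << FRAC_3_13))
    assert -(1 << 15) <= v < (1 << 15), "value does not fit in 3.13"
    return v


def mul_3_13(a: int, b: int) -> int:
    """Multiplying two 3.13 values yields a 6.26 value (13 + 13 fraction bits)."""
    return a * b


def round_6_26_to_3_13(v: int) -> int:
    """Drop 13 fraction bits, rounding to nearest by adding half an output bit."""
    shift = FRAC_6_26 - FRAC_3_13
    return (v + (1 << (shift - 1))) >> shift


def top_byte_6_26(v: int) -> int:
    """Top byte of a 32-bit 6.26 value: 6 integer bits plus 2 fraction bits."""
    return (v >> 24) & 0xFF
```

Because the top byte of a 6.26 value carries two fraction bits, an integer limit such as 2 appears there as `2 << 2`, which is why the escape check in the diff compares against `max << 2`.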

todo.md (14 lines changed)

@ -1,19 +1,17 @@
things to try:
* skip add on the top-byte multiply in sqr8/mul8
* should save a few cycles, suggestion by jamey
* fix status bar to show elapsed time, per-iter time, per-pixel iter count
* 'turbo' mode disabling graphics in full or part
* patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
* try 3.13 fixed point instead of 4.12 for more precision
* can we get away without the extra bit?
* since exit compare space would be 6.26 i think so
* maybe clean up the load/layout of the big mul table
* consider alternate lookup tables in the top 16KB under ROM
* y-axis mirror optimization
* 'wide pixels' 2x and 4x for a fuller initial image in the tiered rendering
* maybe redo tiering to just 4x4, 2x2, 1x1?
* extract viewport for display & re-input via keyboard
* fujinet screenshot/viewport uploader