apply jamey's suggestion of skipping add for high byte muls
rather than saving 0 into the high bytes, then adding the high-byte
multiplication later, write it directly in place. this saves a few
cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
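
A rough sketch of the idea described in the commit message, in Python rather than the project's own 6502 code. It assumes an unsigned 16x16-bit multiply built from four 8x8 partial products; the real routine may be signed and laid out differently. The names `add_at`, `split`, `mul16_zero_then_add` and `mul16_store_direct` are hypothetical, for illustration only.

```python
def add_at(result, offset, value):
    """Add a 16-bit value into the little-endian byte list `result`,
    starting at byte `offset`, rippling the carry into higher bytes."""
    carry = 0
    for i in range(offset, len(result)):
        byte = (value >> (8 * (i - offset))) & 0xFF
        total = result[i] + byte + carry
        result[i] = total & 0xFF
        carry = total >> 8


def split(x):
    """Return (low byte, high byte) of a 16-bit value."""
    return x & 0xFF, (x >> 8) & 0xFF


def mul16_zero_then_add(a, b):
    """Before (assumed shape): zero the result, then add every partial product."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    result = [0, 0, 0, 0]
    add_at(result, 0, a_lo * b_lo)
    # Bytes 2-3 are still zero here, so this add does no useful carry work.
    add_at(result, 2, a_hi * b_hi)
    add_at(result, 1, a_lo * b_hi)
    add_at(result, 1, a_hi * b_lo)
    return int.from_bytes(bytes(result), "little")


def mul16_store_direct(a, b):
    """After (assumed shape): store the high-byte product in place, skip its add."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo = a_lo * b_lo
    hi = a_hi * b_hi
    # The low and high products occupy disjoint bytes, so plain stores suffice.
    result = [lo & 0xFF, lo >> 8, hi & 0xFF, hi >> 8]
    # Only the two cross products still need real additions with carry.
    add_at(result, 1, a_lo * b_hi)
    add_at(result, 1, a_hi * b_lo)
    return int.from_bytes(bytes(result), "little")
```

Both functions return the same product; the second simply replaces a zero-fill plus an add with a direct store, which is where the per-iteration cycle savings would come from on the 6502.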
parent d157fe1306
commit 582ddf497f
2 changed files with 4 additions and 28 deletions
3
todo.md
@@ -1,8 +1,5 @@
 things to try:
 
-* skip add on the top-byte multiply in sqr8/mul8
-* should save a few cycles, suggestion by jamey
-
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D
 
 * y-axis mirror optimization