apply jamey's suggestion of skipping add for high byte muls

rather than saving 0 into the high bytes, then adding the high-byte multiplication later, write it directly in place. this saves a few cycles on every iteration, and it adds up nicely.

View 1 overview render times:
130XE: 10.050 ms/px - 4m56s
800XL: 10.906 ms/px - 5m21s
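
a rough model of the idea in this commit message, written in Python rather than the project's own assembly. it is a sketch under assumptions, not the actual sqr8/mul8 code: it assumes an unsigned 16x16 multiply built from four 8x8 partial products, and the names `add_at`, `mul16_zero_then_add` and `mul16_store_in_place` are hypothetical. the high x high partial product lands in result bytes 2-3, which nothing else has written yet, so it can be stored there directly instead of being added onto zeroes.

```python
# illustrative model only; not the project's 6502 routine.
# assumption: unsigned 16-bit inputs, 32-bit result held as 4 little-endian bytes.

def add_at(result, offset, value):
    """add a 16-bit value into the byte list at a byte offset, propagating carry."""
    carry = 0
    i = offset
    while i < len(result) and (value or carry):
        total = result[i] + (value & 0xFF) + carry
        result[i] = total & 0xFF
        carry = total >> 8
        value >>= 8
        i += 1


def split(x):
    """return (low byte, high byte) of a 16-bit value."""
    return x & 0xFF, (x >> 8) & 0xFF


def mul16_zero_then_add(a, b):
    """before: clear the high bytes, then add the high-byte product later."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo = a_lo * b_lo
    result = [lo & 0xFF, lo >> 8, 0, 0]   # high bytes explicitly zeroed
    add_at(result, 1, a_lo * b_hi)        # cross terms overlap bytes 1-2
    add_at(result, 1, a_hi * b_lo)
    add_at(result, 2, a_hi * b_hi)        # extra add pass for the high product
    return int.from_bytes(bytes(result), "little")


def mul16_store_in_place(a, b):
    """after: write the high-byte product straight into bytes 2-3."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo = a_lo * b_lo
    hi = a_hi * b_hi
    result = [lo & 0xFF, lo >> 8, hi & 0xFF, hi >> 8]  # no zeroing, no add
    add_at(result, 1, a_lo * b_hi)        # only the cross terms need adding
    add_at(result, 1, a_hi * b_lo)
    return int.from_bytes(bytes(result), "little")
```

both functions return the same product. the second one skips the zero stores and one add pass, which is the kind of per-iteration saving the message describes. the actual cycle counts depend on the real routine and are not modeled here.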
2025-01-04 10:53:51 -08:00
commit 582ddf497f
parent d157fe1306
2 changed files with 4 additions and 28 deletions
--- a/todo.md
+++ b/todo.md
@@ -1,8 +1,5 @@
 things to try:

-* skip add on the top-byte multiply in sqr8/mul8
-  * should save a few cycles, suggestion by jamey
-
 * patch the entire expanded-ram imul8xe on top of imul8 to avoid the 3-cycle thunk penalty :D

 * y-axis mirror optimization