pretty pictures, slowly

2021/11/23

pretty pictures, slowly

I've hit a milestone with my implementation of the Ray Tracer Challenge, I can render the cover image of the book!

This was my primary goal when I first started this book, so I am pretty stoked to have made it here. However, I am having major performance issues. The above image took 17 minutes to render, which is not great. It is significantly slowed down by using a ray bounce depth of 5, so every ray that hits a reflective surface has the potential to 5x the computation, which adds up. I've applied various optimizations already like caching inverse matrices, reducing allocations by mutation, and rewriting the core tuple and matrix code to be more efficient. These all provided a significant speed up, but it is not quite enough. I've experimented with multithreading, and saw a ~2x speed up, but I want to keep this single threaded.

Here's what the profiler output while generating the above image (using snakeviz for visualization). python profile graph screenshot

As we can see, there is nothing unexpected. We spend the vast majority of the time multiplying matrices together. I've focused on this code, and can't squeeze much more performance out of it. I experimented a bit with numpy and numba here, but didn't see much improvement (and don't want to use them in this project). The problem seems to be that we are wasting a lot of work. I've read ahead about using a Bounded Volume Hierarchy to only check for intersections on objects within a bounding box, and I look forward to the potential speedup. As it is now, the above image required 172,535,400 calls to the matrix_mul_tuple function to be rendered in 300px by 300px. The book notes about an order of magnitude reduction in the amount of intersection calls when using BVH on an appropriate scene, which I hope to see as well.

I plan to implement BVH then crunch through the rest of the book pretty quickly, given that I've done this all before. I'll explore further optimizations at the end, and then look into doing another implementation with performance in mind. I've been particularly motivated by a site I found on the book's forum that demos a C++ implementation compiled to WASM. If I can get on the same order of magnitude of performance, I'll be happy.

I've been enjoying the process lately, which is great! While this ray tracer is not the most performant (as admitted by the author), it does produce some of the better images that I have seen when compared to other hobbyist/educational ray tracers. I am happy with the image quality now, but I want to focus on performance before doing more post-book additions like depth-of-field and anti-aliasing. For now, here's two scenes from the recent chapters: $reflections and refractions image$ table cube example image