The plot below shows the render times for one sample per pixel averaged over 10 runs, measured from a build compiled with
-O3 -mfpmath=sse -march=native -flto. Measurements were taken using std::chrono::high_resolution_clock and only include
time to render, i.e. the time to load the scene and write the images to disk is ignored. Now that the renderer
has a bit more work to do, we can start to see thread contention hurting the render time when running on significantly
more threads than the number of hardware threads.
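As a rough sketch of how a measurement like this can be taken: the helper below times only the render call with std::chrono::high_resolution_clock and averages over a number of runs. The name average_render_ms and the render callable are hypothetical stand-ins for illustration, not the renderer's actual code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the renderer): calls render() `runs` times
// and returns the mean wall-clock time per run in milliseconds.
// Only the render call sits between the two clock reads, so scene loading
// and image writing done elsewhere are not counted.
template <typename Render>
double average_render_ms(Render render, int runs = 10) {
    assert(runs > 0);
    using clock = std::chrono::high_resolution_clock;
    std::chrono::duration<double, std::milli> total{0.0};
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        render();
        total += clock::now() - start;
    }
    return total.count() / runs;
}
```

It would be called with whatever kicks off a render, for example a lambda wrapping the render call, after the scene is loaded and before any images are written.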
Desktop
CPU: Intel i5-2500K @ 4.0GHz, 4 hardware threads
RAM: 8GB 1600MHz DDR3
Compiler: gcc 4.8.0 (MinGW on Windows)
Chart (made using C3.js): render time against num_threads
blocks, hand them off and relax. This new method
chops the image up into a specified number of blocks, shuffles them, and then hands them off to the threads as they render.
This does a slightly better job of distributing the workload over the threads and is also more fun to watch. Below is a recording
of the rendering, slowed down significantly by inserting some short sleeps into the worker threads.
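A minimal sketch of this kind of block scheduling follows. It assumes the block count is given as a grid of columns and rows, and that threads claim blocks through a shared atomic counter; Block, make_shuffled_blocks, render_blocks and the render_block callback are hypothetical names, and the renderer's real hand-off mechanism may differ.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical block type: the pixel rectangle [x, x + w) by [y, y + h).
struct Block { int x, y, w, h; };

// Chop a width x height image into cols x rows blocks, then shuffle them.
std::vector<Block> make_shuffled_blocks(int width, int height,
                                        int cols, int rows, unsigned seed) {
    assert(cols > 0 && rows > 0);
    std::vector<Block> blocks;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Integer edges so the blocks tile the image exactly, even when
            // the image size is not divisible by the block counts.
            const int x0 = c * width / cols, x1 = (c + 1) * width / cols;
            const int y0 = r * height / rows, y1 = (r + 1) * height / rows;
            blocks.push_back(Block{x0, y0, x1 - x0, y1 - y0});
        }
    }
    std::mt19937 rng(seed);
    std::shuffle(blocks.begin(), blocks.end(), rng);
    return blocks;
}

// Each worker repeatedly claims the next unclaimed block until none remain,
// so a thread that finishes a cheap block immediately picks up another.
template <typename RenderBlock>
void render_blocks(const std::vector<Block>& blocks, unsigned num_threads,
                   RenderBlock render_block) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= blocks.size()) break;
            render_block(blocks[i]);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}
```

Because the blocks do not overlap, each worker writes to its own pixels and the only shared state is the counter. Shuffling spreads expensive regions of the image across the claim order rather than leaving them clustered together.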