PERF: Optimize compare image
Speed-up the performances on
On a test case, a speed-up of roughly 3x has been observed.
At the end of automated tests, comparing the produced images against their baselines is not as efficient as it could be.
The improvements have been made along two axes.
Hidden costs of vector images
DifferenceImageFilter is often used on
VectorImages, in particular when doing
VariableLengthVector pixels at each iteration, or casting them will automatically end up in the construction and destruction of VLV pixels, and thus in the allocation and release of heap memory. This is inefficient.
The first refactoring aims at minimizing the number of allocations by factoring out everything that can be (pixel size, typical max value, typical zero value). Resetting a value is done by assignment, which has the good property of not inducing any allocation on VLV variables.
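The idea can be illustrated with a minimal sketch. It does not use the real VariableLengthVector: `Pixel`, `g_allocations`, `TotalDifferenceNaive` and `TotalDifferenceHoisted` are hypothetical names introduced only for this example, and `Pixel` is assumed to reuse its buffer when assigned from a pixel of the same size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical counter of heap allocations made by Pixel (illustration only).
std::size_t g_allocations = 0;

// Hypothetical stand-in for a variable-length pixel owning a heap buffer.
class Pixel
{
public:
  Pixel(std::size_t size, double value) : m_Size(size), m_Data(new double[size])
  {
    ++g_allocations;
    std::fill(m_Data, m_Data + m_Size, value);
  }
  Pixel(const Pixel& other) : m_Size(other.m_Size), m_Data(new double[other.m_Size])
  {
    ++g_allocations;
    std::copy(other.m_Data, other.m_Data + m_Size, m_Data);
  }
  Pixel& operator=(const Pixel& other)
  {
    if (this != &other)
    {
      if (m_Size != other.m_Size) // reallocate only when the size changes
      {
        double* fresh = new double[other.m_Size];
        ++g_allocations;
        delete[] m_Data;
        m_Data = fresh;
        m_Size = other.m_Size;
      }
      std::copy(other.m_Data, other.m_Data + m_Size, m_Data);
    }
    return *this;
  }
  ~Pixel() { delete[] m_Data; }

  std::size_t Size() const { return m_Size; }
  double& operator[](std::size_t i) { return m_Data[i]; }
  double operator[](std::size_t i) const { return m_Data[i]; }

private:
  std::size_t m_Size;
  double*     m_Data;
};

// Naive version: a temporary pixel is constructed at every iteration.
double TotalDifferenceNaive(const std::vector<Pixel>& test, const std::vector<Pixel>& baseline)
{
  assert(test.size() == baseline.size());
  double total = 0.0;
  for (std::size_t p = 0; p < test.size(); ++p)
  {
    Pixel diff(test[p].Size(), 0.0); // one heap allocation per pixel
    for (std::size_t c = 0; c < diff.Size(); ++c)
    {
      diff[c] = std::abs(test[p][c] - baseline[p][c]);
      total += diff[c];
    }
  }
  return total;
}

// Refactored version: pixel size and zero value are computed once,
// and the working pixel is reset by assignment inside the loop.
// Assumes every pixel has the same number of components.
double TotalDifferenceHoisted(const std::vector<Pixel>& test, const std::vector<Pixel>& baseline)
{
  assert(test.size() == baseline.size());
  if (test.empty())
  {
    return 0.0;
  }
  const std::size_t pixelSize = test.front().Size();
  const Pixel       zero(pixelSize, 0.0); // allocated once
  Pixel             diff(zero);           // allocated once
  double            total = 0.0;
  for (std::size_t p = 0; p < test.size(); ++p)
  {
    diff = zero; // same size: the buffer is reused, no allocation
    for (std::size_t c = 0; c < pixelSize; ++c)
    {
      diff[c] = std::abs(test[p][c] - baseline[p][c]);
      total += diff[c];
    }
  }
  return total;
}
```

In this sketch the naive loop allocates once per pixel, while the refactored loop allocates twice in total, whatever the image size.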
The reduction in DifferenceImageFilter was performed directly on a per-thread member, m_ThreadAccumulatedCounter[threadId]. This has two drawbacks:
- First, the compiler cannot know whether this function is the only one working on that value, so it cannot keep it in a register or apply similar optimizations.
- Moreover, false sharing occurs, which prevents further optimizations.
As a consequence, the counters are now local variables in the ThreadedGenerateData() function, and they are committed to the shared member variables at the end of the execution.
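The change of pattern can be sketched as follows. This is not the OTB code: `DifferenceCounter`, `ProcessDirect`, `ProcessLocal`, `m_ThreadCounter` and the `tolerance` parameter are hypothetical names chosen for this illustration, and each call stands for the work one thread would do on its own chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical filter-like class with one counter slot per thread.
struct DifferenceCounter
{
  explicit DifferenceCounter(std::size_t threads) : m_ThreadCounter(threads, 0) {}

  // Before: every increment goes through the shared member array.
  void ProcessDirect(const std::vector<double>& diffs, std::size_t begin, std::size_t end,
                     double tolerance, std::size_t threadId)
  {
    assert(threadId < m_ThreadCounter.size() && begin <= end && end <= diffs.size());
    for (std::size_t i = begin; i < end; ++i)
    {
      if (diffs[i] > tolerance)
      {
        ++m_ThreadCounter[threadId]; // write to shared storage at each hit
      }
    }
  }

  // After: accumulate in a local variable, commit once at the end.
  void ProcessLocal(const std::vector<double>& diffs, std::size_t begin, std::size_t end,
                    double tolerance, std::size_t threadId)
  {
    assert(threadId < m_ThreadCounter.size() && begin <= end && end <= diffs.size());
    std::size_t count = 0; // local: the compiler is free to keep it in a register
    for (std::size_t i = begin; i < end; ++i)
    {
      if (diffs[i] > tolerance)
      {
        ++count;
      }
    }
    m_ThreadCounter[threadId] += count; // single write to shared storage
  }

  // Final reduction over the per-thread slots.
  std::size_t Total() const
  {
    std::size_t total = 0;
    for (std::size_t c : m_ThreadCounter)
    {
      total += c;
    }
    return total;
  }

  std::vector<std::size_t> m_ThreadCounter;
};
```

Both variants produce the same counts; the second one only touches the shared array once per chunk, which limits the window in which neighbouring slots can contend for the same cache line.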
Note: this anti-pattern appears in other places, for instance StreamingCompareImageFilter. This performance-related refactoring should be extended to those cases.
The copyright owner is CNES and has signed the ORFEO ToolBox Contributor License Agreement.
Check before merging:
- All discussions are resolved
- At least 2 👍 votes from core developers, no 👎 vote.
- The feature branch is (reasonably) up-to-date with the base branch
- Dashboard is green
- Copyright owner has signed the ORFEO ToolBox Contributor License Agreement