Commit 9f8f9494 authored by Julien Michel

PERF: Avoid memory allocations in MeanShiftSmoothingImageFilter

Patch kindly provided by Laurentiu Nicola <lnicola@c-s.ro>

There are a couple of issues in the current implementation of
MeanShiftSmoothingImageFilter.

1. CalculateMeanShiftVector() uses
MeanShift::FastImageRegionConstIterator, which is meant to avoid
allocating VariableLengthVector instances on Get() calls. However, the
current code calls Get() instead of GetPixelPointer(), so the
allocations still happen.

2. Actually, FastImageRegionConstIterator is not really needed in this
case, because VariableLengthVector can reuse an external buffer instead
of allocating one and the default Get() implementation uses that as an
optimization. The problem with the current code is that the
jointNeighbor variable was declared above the while loop and the result
of Get() was assigned to it. On assignment, the external buffer cannot
be reused, so a new one must be allocated. Changing the variable to be
defined inside the loop allows the compiler to do a very good job at
optimizing the code: the vector goes away completely.

3. A similar change was made in ThreadedGenerateData(), with the
difference that the vector there cannot be const. The
VariableLengthVector assignment operator drops and reallocates the data
buffer (although this was recently fixed in ITK, see
http://itk.org/gitweb?p=ITK.git;a=commit;h=25393 ). We'll use component
assignment instead.

4. CalculateMeanShiftVector() was modified to compute the
(per-component) reciprocal of its bandwidth argument once, so that the
loop multiplies instead of dividing. Precomputing it arguably reduces
precision (in general x/y != x*(1/y) in floating-point), but allows for
better code to be generated.
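
As an illustration of points 2 and 4, here is a minimal, self-contained
sketch. It does not use ITK or OTB: ToyPixel, g_allocations and every
function name below are hypothetical stand-ins, and the assumption that
assignment always deep-copies only mimics the behaviour described above.

```cpp
#include <cstddef>
#include <vector>

// Counts deep copies made by the toy pixel type below.
static std::size_t g_allocations = 0;

// Hypothetical stand-in for a variable-length pixel; NOT the ITK class.
class ToyPixel {
public:
    ToyPixel() = default;
    // Borrow an external buffer: no allocation.
    ToyPixel(const double* buffer, std::size_t n) : m_view(buffer), m_size(n) {}
    // Copying a borrowing pixel keeps borrowing; an owning one is deep-copied.
    ToyPixel(const ToyPixel& other) : m_view(other.m_view), m_size(other.m_size) {
        if (!other.m_owned.empty()) {
            m_owned = other.m_owned;
            m_view = m_owned.data();
            ++g_allocations;
        }
    }
    // Toy assumption: assignment drops the old storage and deep-copies.
    ToyPixel& operator=(const ToyPixel& other) {
        m_owned = std::vector<double>(other.m_view, other.m_view + other.m_size);
        m_view = m_owned.data();
        m_size = other.m_size;
        ++g_allocations;
        return *this;
    }
    double operator[](std::size_t i) const { return m_view[i]; }
    std::size_t size() const { return m_size; }

private:
    const double* m_view = nullptr;
    std::size_t m_size = 0;
    std::vector<double> m_owned;
};

// "Before": the pixel is declared above the loop and assigned to, so every
// iteration deep-copies. Assumes components > 0.
double sumAssignOutside(const std::vector<double>& image, std::size_t components) {
    ToyPixel neighbor;
    double sum = 0.0;
    const std::size_t pixels = image.size() / components;
    for (std::size_t p = 0; p < pixels; ++p) {
        neighbor = ToyPixel(&image[p * components], components);
        for (std::size_t i = 0; i < neighbor.size(); ++i) sum += neighbor[i];
    }
    return sum;
}

// "After": the pixel is constructed inside the loop and only borrows the
// image buffer, so nothing is allocated. Assumes components > 0.
double sumDeclareInside(const std::vector<double>& image, std::size_t components) {
    double sum = 0.0;
    const std::size_t pixels = image.size() / components;
    for (std::size_t p = 0; p < pixels; ++p) {
        const ToyPixel neighbor(&image[p * components], components);
        for (std::size_t i = 0; i < neighbor.size(); ++i) sum += neighbor[i];
    }
    return sum;
}

// Point 4: compute the per-component reciprocal once, outside the hot loop.
std::vector<double> reciprocal(const std::vector<double>& bandwidth) {
    std::vector<double> inv(bandwidth.size());
    for (std::size_t i = 0; i < bandwidth.size(); ++i) inv[i] = 1.0 / bandwidth[i];
    return inv;
}

// Squared norm of diff scaled by the precomputed reciprocals: the loop
// multiplies instead of dividing.
double scaledSquaredNorm(const double* diff, const std::vector<double>& invBandwidth) {
    double norm2 = 0.0;
    for (std::size_t i = 0; i < invBandwidth.size(); ++i) {
        const double d = diff[i] * invBandwidth[i];
        norm2 += d * d;
    }
    return norm2;
}
```

In this toy, sumAssignOutside() performs one deep copy per pixel while
sumDeclareInside() performs none, which is the effect point 2 relies on.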

Tested by running

    otbcli_MeanShiftSmoothing -in qb_RoadExtract2.tif \
                              -fout MeanShift_FilterOutput.tif \
                              -modesearch 0

Before applying the patch:

162.99 user 0.46 system 0:21.68 elapsed 753% CPU

  29.40%  otbapp_MeanShiftSmoothing.so            [.] MeanShiftSmoothingImageFilter::CalculateMeanShiftVector
  26.17%  libc-2.17.so                            [.] malloc
  16.96%  libc-2.17.so                            [.] _int_free
  10.00%  libc-2.17.so                            [.] _int_malloc
   5.40%  otbapp_MeanShiftSmoothing.so            [.] VariableLengthVector::operator=
   1.89%  libc-2.17.so                            [.] free
   1.55%  libstdc++.so.6.0.19                     [.] operator new
   1.46%  libOTBApplicationEngine-5.0.so.1        [.] ImageRegionConstIterator::Increment
   1.27%  libOTBApplicationEngine-5.0.so.1        [.] VariableLengthVector::AllocateElements

After the patch:

45.63 user 0.49 system 0:07.22 elapsed 638% CPU

  76.17%  otbapp_MeanShiftSmoothing.so            [.] MeanShiftSmoothingImageFilter::CalculateMeanShiftVector
   7.12%  libOTBApplicationEngine-5.0.so.1        [.] ImageRegionConstIterator::Increment
   2.53%  libc-2.17.so                            [.] malloc
   1.80%  libc-2.17.so                            [.] _int_free
   1.46%  ld-2.17.so                              [.] strcmp
   1.17%  libc-2.17.so                            [.] _int_malloc
   1.10%  ld-2.17.so                              [.] do_lookup_x

About two seconds of the elapsed time go to loading the application; a
larger input would make that overhead matter less, but
qb_RoadExtract2.tif seemed to be the largest image in the sample files.

Most of the remaining memory allocations come from the return values of
SpatialRangeJointDomainTransform. This could be dropped as it's simple,
but it's not much of a bottleneck yet.

Another issue is that the filter doesn't seem to use more than 2-3
threads, but I don't know why it behaves like that.
parent a214c494