  1. Nov 03, 2015
  2. Nov 02, 2015
  3. Oct 30, 2015
  4. Oct 29, 2015
  5. Oct 28, 2015
  6. Oct 27, 2015
  7. Oct 26, 2015
    • PERF: Avoid memory allocations in MeanShiftSmoothingImageFilter · 9f8f9494
      Julien Michel authored
      Patch kindly provided by Laurentiu Nicola <lnicola@c-s.ro>
      
      There are a couple of issues in the current implementation of
      MeanShiftSmoothingImageFilter.
      
      1. CalculateMeanShiftVector() uses
      MeanShift::FastImageRegionConstIterator, which is meant to avoid
      allocating VariableLengthVector instances on Get() calls. However, the
      current code calls Get() instead of the GetPixelPointer() method.
      
      2. Actually, FastImageRegionConstIterator is not really needed in this
      case, because VariableLengthVector can reuse an external buffer instead
      of allocating one and the default Get() implementation uses that as an
      optimization. The problem with the current code is that the
      jointNeighbor variable was declared above the while loop and the result
      of Get() was assigned to it. On assignment, the external buffer cannot
      be reused, so a new one must be allocated. Changing the variable to be
      defined inside the loop allows the compiler to do a very good job at
      optimizing the code: the vector goes away completely.
      
      3. A similar change was made in ThreadedGenerateData(), with the
      difference that the vector there cannot be const. The
      VariableLengthVector assignment operator drops and reallocates the data
      buffer (although this was recently fixed in ITK, see
      http://itk.org/gitweb?p=ITK.git;a=commit;h=25393 ). We'll use component
      assignment instead.
      
      4. CalculateMeanShiftVector() was modified to compute the
      (per-component) reciprocal of its bandwidth argument to avoid performing
      the divisions in that loop. Precomputing it arguably reduces precision
      (as x/y != x*(1/y) in general when working in floating-point), but
      allows for better code to be generated.
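
      The trade-off in item 4 can be sketched as follows. This is a
      minimal illustration, not the patched OTB code: the helper names
      (ScaleByDivision, Reciprocals, ScaleByReciprocal,
      MaxRelativeDifference) are hypothetical, and the data is assumed to
      be held in plain std::vector<double> instead of the filter's own
      pixel types.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not OTB code: one division per component
// inside the loop.
std::vector<double> ScaleByDivision(const std::vector<double>& values,
                                    const std::vector<double>& bandwidth)
{
  assert(values.size() == bandwidth.size());
  std::vector<double> out(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    out[i] = values[i] / bandwidth[i];
  }
  return out;
}

// Hypothetical helper: compute 1/bandwidth once, ahead of the hot loop.
std::vector<double> Reciprocals(const std::vector<double>& bandwidth)
{
  std::vector<double> inv(bandwidth.size());
  for (std::size_t i = 0; i < bandwidth.size(); ++i)
  {
    inv[i] = 1.0 / bandwidth[i];
  }
  return inv;
}

// Hypothetical helper: the loop body now only multiplies.
std::vector<double> ScaleByReciprocal(const std::vector<double>& values,
                                      const std::vector<double>& invBandwidth)
{
  assert(values.size() == invBandwidth.size());
  std::vector<double> out(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    out[i] = values[i] * invBandwidth[i];
  }
  return out;
}

// Largest relative difference between two results of equal length.
double MaxRelativeDifference(const std::vector<double>& a,
                             const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
    if (scale > 0.0)
    {
      worst = std::max(worst, std::fabs(a[i] - b[i]) / scale);
    }
  }
  return worst;
}
```

      The two variants may differ in the last bits because the
      reciprocal is rounded once before the multiplication is rounded
      again. When the bandwidth is a power of two the reciprocal is
      exact, so both give the same result.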
      
      Tested by running
      
          otbcli_MeanShiftSmoothing -in qb_RoadExtract2.tif
                                    -fout MeanShift_FilterOutput.tif
                                    -modesearch 0
      
      Before applying the patch:
      
      162.99 user 0.46 system 0:21.68 elapsed 753% CPU
      
        29.40%  otbapp_MeanShiftSmoothing.so            [.] MeanShiftSmoothingImageFilter::CalculateMeanShiftVector
        26.17%  libc-2.17.so                            [.] malloc
        16.96%  libc-2.17.so                            [.] _int_free
        10.00%  libc-2.17.so                            [.] _int_malloc
         5.40%  otbapp_MeanShiftSmoothing.so            [.] VariableLengthVector::operator=
         1.89%  libc-2.17.so                            [.] free
         1.55%  libstdc++.so.6.0.19                     [.] operator new
         1.46%  libOTBApplicationEngine-5.0.so.1        [.] ImageRegionConstIterator::Increment
         1.27%  libOTBApplicationEngine-5.0.so.1        [.] VariableLengthVector::AllocateElements
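
      The malloc/free and operator= entries above are consistent with
      the pattern described in item 2. The sketch below reproduces that
      pattern with a toy type. BorrowingVector and ToyImage are
      hypothetical stand-ins written for this illustration; they are not
      itk::VariableLengthVector or any OTB iterator, and only mimic the
      behaviour the message describes (Get() wraps an external buffer,
      assignment into an existing object allocates).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector that either owns its data or borrows a buffer.
class BorrowingVector
{
public:
  static int allocations; // heap allocations made by this class

  BorrowingVector() = default;

  // Borrow an external buffer: no allocation.
  BorrowingVector(float* external, std::size_t size)
    : m_Data(external), m_Size(size), m_Owns(false) {}

  BorrowingVector(const BorrowingVector& other) { CopyFrom(other); }

  // Moving hands over the pointer, so returning by value stays cheap
  // even if the compiler does not elide the copy.
  BorrowingVector(BorrowingVector&& other) noexcept
    : m_Data(other.m_Data), m_Size(other.m_Size), m_Owns(other.m_Owns)
  {
    other.m_Data = nullptr;
    other.m_Size = 0;
    other.m_Owns = false;
  }

  // Assignment cannot keep borrowing: it drops the old buffer and
  // allocates a fresh one, as the commit message describes.
  BorrowingVector& operator=(const BorrowingVector& other)
  {
    if (this != &other)
    {
      Release();
      CopyFrom(other);
    }
    return *this;
  }

  ~BorrowingVector() { Release(); }

  float operator[](std::size_t i) const { return m_Data[i]; }
  std::size_t Size() const { return m_Size; }

private:
  void CopyFrom(const BorrowingVector& other)
  {
    m_Data = new float[other.m_Size];
    ++allocations;
    std::copy(other.m_Data, other.m_Data + other.m_Size, m_Data);
    m_Size = other.m_Size;
    m_Owns = true;
  }

  void Release()
  {
    if (m_Owns) { delete[] m_Data; }
    m_Data = nullptr;
    m_Size = 0;
    m_Owns = false;
  }

  float* m_Data = nullptr;
  std::size_t m_Size = 0;
  bool m_Owns = false;
};

int BorrowingVector::allocations = 0;

// Hypothetical image with interleaved components per pixel.
struct ToyImage
{
  std::vector<float> samples;
  std::size_t components;

  std::size_t PixelCount() const { return samples.size() / components; }

  BorrowingVector Get(std::size_t pixel)
  {
    assert(pixel < PixelCount());
    return BorrowingVector(samples.data() + pixel * components, components);
  }
};

// Variable declared above the loop: one allocation per pixel.
float SumWithHoistedVariable(ToyImage& image)
{
  float sum = 0.0f;
  BorrowingVector neighbor;
  for (std::size_t p = 0; p < image.PixelCount(); ++p)
  {
    neighbor = image.Get(p);
    for (std::size_t c = 0; c < neighbor.Size(); ++c) { sum += neighbor[c]; }
  }
  return sum;
}

// Variable defined inside the loop: the borrowed buffer is used directly.
float SumWithLocalVariable(ToyImage& image)
{
  float sum = 0.0f;
  for (std::size_t p = 0; p < image.PixelCount(); ++p)
  {
    const BorrowingVector neighbor = image.Get(p);
    for (std::size_t c = 0; c < neighbor.Size(); ++c) { sum += neighbor[c]; }
  }
  return sum;
}
```

      Both functions compute the same sum. In this toy model only the
      first one increments BorrowingVector::allocations, once per pixel
      visited.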
      
      After the patch:
      
      45.63 user 0.49 system 0:07.22 elapsed 638% CPU
      
        76.17%  otbapp_MeanShiftSmoothing.so            [.] MeanShiftSmoothingImageFilter::CalculateMeanShiftVector
         7.12%  libOTBApplicationEngine-5.0.so.1        [.] ImageRegionConstIterator::Increment
         2.53%  libc-2.17.so                            [.] malloc
         1.80%  libc-2.17.so                            [.] _int_free
         1.46%  ld-2.17.so                              [.] strcmp
         1.17%  libc-2.17.so                            [.] _int_malloc
         1.10%  ld-2.17.so                              [.] do_lookup_x
      
      About two seconds of that time is spent loading the application; a
      larger input would make this overhead less significant, but
      qb_RoadExtract2.tif seemed to be the largest image in the sample
      files.
      
      Most of the remaining memory allocations come from the return values of
      SpatialRangeJointDomainTransform. This could be dropped as it's simple,
      but it's not much of a bottleneck yet.
      
      Another issue is that the filter doesn't seem to use more than 2-3
      threads, but I don't know why it behaves like that.
  8. Oct 23, 2015