Draft: Reduce memory footprint when ImageFileWriter can't stream its input pipeline to the output file
Closes #2310
This MR reduces the memory footprint when writing output images to file formats that do not support streamed writes.
Benchmarks
All benchmarks were performed on a laptop with 32 GB of RAM, an SSD, Ubuntu 20.04, and many browser tabs open (leaving roughly 20 GB of RAM available). We left OTB with the default OTB_MAX_RAM_HINT, which I believe is 256 MB.
Our goal is to perform the pansharpening of a Spot-7 (or Pléiades, or PNeo) image into an output image format for which the GDAL driver only supports GDALDriver::CreateCopy(), which is intended to create a copy of an existing dataset (hence the whole dataset must already exist, either in memory or in another raster file).
In the following, we use the GDAL Cloud Optimized GeoTIFF (COG) driver to write the output image.
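As a minimal illustration (a Python/GDAL sketch; the driver capability check is standard GDAL, the file paths are placeholders), the COG driver advertises CreateCopy() but not Create(), so the full source dataset has to exist before it can be written:

```python
from osgeo import gdal  # requires GDAL >= 3.1 for the COG driver

gdal.UseExceptions()

cog = gdal.GetDriverByName("COG")
# The COG driver only advertises CreateCopy(), not a streamed Create():
print(cog.GetMetadataItem(gdal.DCAP_CREATE))      # None
print(cog.GetMetadataItem(gdal.DCAP_CREATECOPY))  # "YES"

# The whole source dataset must already exist (in memory or on disk)
# before it can be copied into a COG file.
src = gdal.Open("/data/pxs.tif")                  # placeholder path
cog.CreateCopy("/data/pxs_cog.tif", src)
```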
We use the following command to perform the processing:
otbcli_BundleToPerfectSensor -inp $dim_pan -inxs $dim_xs -out "/data/pxs.tif" int16
We used a Spot-7 image, but we expect the same kind of behavior for Pléiades and PNeo.
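The same processing can also be driven from the OTB Python API; below is a minimal sketch assuming the OTB_FORCE_STREAMING environment variable introduced by this MR is honoured at write time (input paths are placeholders):

```python
import os
import otbApplication as otb

# Assumption: the OTB_FORCE_STREAMING variable added by this MR is read
# from the environment when the writer runs.
os.environ["OTB_FORCE_STREAMING"] = "1"

app = otb.Registry.CreateApplication("BundleToPerfectSensor")
app.SetParameterString("inp", "/data/pan.tif")    # placeholder panchromatic input
app.SetParameterString("inxs", "/data/xs.tif")    # placeholder multispectral input
app.SetParameterString("out", "/data/pxs.tif")
app.SetParameterOutputImagePixelType("out", otb.ImagePixelType_int16)
app.ExecuteAndWriteOutput()
```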
Measurements
We measured the processing time of the BundleToPerfectSensor application.
10k x 10k subset:
When everything is fine and the memory budget is sufficient.
- BundleToPerfectSensor (cog): 52s
- BundleToPerfectSensor (cog, OTB_FORCE_STREAMING=1): 53s
20k x 20k subset:
When the image is big and the processing without streaming requires extra memory.
Original approach (trigger the entire pipeline, everything in memory)

Proposed approach (when OTB_FORCE_STREAMING is set to 1)
- BundleToPerfectSensor (cog, OTB_FORCE_STREAMING=1): 3m37s

Comparison with original approach + gdal_translate
- BundleToPerfectSensor (gtiff) + gdal_translate (cog): 1m24s + 2m09s = 3m33s
Conclusion
PROS
- The MR makes it possible to save memory when writing to output image formats for which the GDAL driver only supports GDALDriver::CreateCopy().
CONS
- Saving the output image in a stream-capable raster format and then using gdal_translate for the final conversion is just as fast and uses a far smaller memory footprint (because the whole output image is never stored in memory).
- Ultimately, there will always be an image size for which the memory budget won't be enough, even with the OTB_FORCE_STREAMING approach. In that case, the user will fall back to the OTB + gdal_translate approach (sketched below).
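For reference, a sketch of that fallback, driven from Python with gdal.Translate instead of the gdal_translate CLI (paths are placeholders; the GeoTIFF is assumed to be the streamed OTB output):

```python
from osgeo import gdal

gdal.UseExceptions()

# Equivalent to: gdal_translate -of COG /data/pxs.tif /data/pxs_cog.tif
# /data/pxs.tif is the GeoTIFF produced by the streamed OTB write (placeholder paths).
gdal.Translate("/data/pxs_cog.tif", "/data/pxs.tif", format="COG")
```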
Discussion
Maybe this MR should be generalized to the Python API, so that the output can be retrieved as a NumPy array with OTB_FORCE_STREAMING (for now, the entire pipeline is triggered over the largest possible image region).
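For context, a sketch of the current in-memory NumPy workflow (the accessor names follow the OTB cookbook's in-memory examples and are an assumption here; exact names may differ between OTB versions):

```python
import otbApplication as otb

app = otb.Registry.CreateApplication("BundleToPerfectSensor")
app.SetParameterString("inp", "/data/pan.tif")    # placeholder inputs
app.SetParameterString("inxs", "/data/xs.tif")

# Execute() keeps the output in memory: today the whole pipeline is
# triggered over the largest possible image region at this point.
app.Execute()

# Assumed accessor (per the OTB cookbook): returns a dict holding the
# pixel buffer as a NumPy array plus geo-metadata.
out = app.ExportImage("out")
print(out["array"].shape)
```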