Rémi Cresson requested to merge 2310-reduce_memory_footprint_nostreamwrite into develop Sep 29, 2022

Closes #2310

This MR reduces the memory footprint of writing output images to file formats that do not support streamed writing.

Benchmarks

All benchmarks were run on a laptop with 32 GB of RAM, an SSD, and Ubuntu 20.04, with a lot of web browser tabs open (which leaves roughly 20 GB of RAM available). OTB was left with the default OTB_MAX_RAM_HINT, which I believe is 256 MB. Our goal is to pansharpen a Spot-7 (or Pléiades, or PNeo) image and write the result in an output image format whose GDAL driver only supports GDALDriver::CreateCopy(). CreateCopy() is intended to create a copy of an existing dataset, hence the whole dataset must already exist, either in memory or in another raster file. In the following, we use the GDAL Cloud Optimized GeoTIFF (COG) driver to write the output image.

We use the following command to perform the processing:

otbcli_BundleToPerfectSensor -inp $dim_pan -inxs $dim_xs -out "/data/pxs.tif" int16

We used a Spot-7 image, but we expect the same kind of behavior for Pléiades and PNeo.

Measurements

We measured the processing time of the BundleToPerfectSensor application.

10k x 10k subset:

This is the case where everything is fine and the memory budget is sufficient.

BundleToPerfectSensor (cog): 52s

BundleToPerfectSensor (cog, OTB_FORCE_STREAMING=1): 53s

20k x 20k subset:

This is the case where the image is large and processing without streaming requires extra memory.
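As a rough order of magnitude, the sketch below estimates the size of the full output raster when it has to be held in memory at once. The band count (4, as for a Spot-7 multispectral product) and the reading that the subset sizes refer to the output raster are assumptions that are not stated in this MR; the 2 bytes per sample come from the int16 pixel type in the command. The result is only a lower bound, since the actual pipeline may also hold intermediate buffers.

```python
def raster_bytes(width, height, bands, bytes_per_sample):
    """Size in bytes of an uncompressed raster held entirely in memory."""
    return width * height * bands * bytes_per_sample


# Assumptions (not from the MR): 4 bands, subset size == output raster size.
# int16 (2 bytes per sample) matches the output pixel type of the command.
small = raster_bytes(10_000, 10_000, bands=4, bytes_per_sample=2)
large = raster_bytes(20_000, 20_000, bands=4, bytes_per_sample=2)

print(f"10k x 10k: {small / 1e9:.1f} GB")
print(f"20k x 20k: {large / 1e9:.1f} GB")
```

Doubling each side of the subset multiplies the in-memory size by four, which is why the larger subset is the interesting case here.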

Original approach (trigger the entire pipeline, everything is in-memory)

Proposed approach (when OTB_FORCE_STREAMING is set to 1)

Comparison with original approach + gdal_translate

BundleToPerfectSensor (cog, OTB_FORCE_STREAMING=1): 3m37s
BundleToPerfectSensor (gtiff) + gdal_translate (cog): 1m24s + 2m09s = 3m33s
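The comparison of the two timings above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name to_seconds is made up for the illustration.

```python
def to_seconds(duration):
    """Convert a duration written like '3m37s' or '52s' to seconds."""
    minutes, _, rest = duration.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s"))


# Timings reported above for the 20k x 20k subset.
streaming = to_seconds("3m37s")                       # cog, OTB_FORCE_STREAMING=1
two_step = to_seconds("1m24s") + to_seconds("2m09s")  # gtiff + gdal_translate (cog)

print(f"streaming: {streaming}s")
print(f"two-step:  {two_step}s ({two_step // 60}m{two_step % 60:02d}s)")
print(f"difference: {streaming - two_step}s")
```

The two approaches differ by only a few seconds over a run of more than three and a half minutes.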

Conclusion

PROS

The MR saves memory when writing to an output image format whose GDAL driver only supports GDALDriver::CreateCopy().

CONS

Saving the output image in a raster format that supports streamed writing, then using gdal_translate for the final conversion, is just as fast and uses a far smaller memory footprint (because the whole output image is never held in memory).
Ultimately, there will always be an image size for which the memory budget is not enough, even with the OTB_FORCE_STREAMING approach. In that case, the user has to fall back to the OTB + gdal_translate approach.
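The OTB + gdal_translate fallback could be scripted along the following lines. This is a minimal sketch: the file names are hypothetical placeholders, and it assumes a GDAL version that ships the COG driver so that gdal_translate accepts -of COG. The commands are only built and printed here, not executed.

```python
import shlex


def fallback_commands(pan, xs, tmp_tif, out_cog):
    """Build the two commands of the OTB + gdal_translate approach."""
    # Step 1: pansharpening written to a plain GeoTIFF, which OTB can stream.
    otb = ["otbcli_BundleToPerfectSensor",
           "-inp", pan, "-inxs", xs, "-out", tmp_tif, "int16"]
    # Step 2: conversion of the intermediate GeoTIFF to a COG.
    gdal = ["gdal_translate", "-of", "COG", tmp_tif, out_cog]
    return otb, gdal


# Hypothetical paths, for illustration only.
otb_cmd, gdal_cmd = fallback_commands(
    "pan.dim", "xs.dim", "/data/pxs_tmp.tif", "/data/pxs_cog.tif")

for cmd in (otb_cmd, gdal_cmd):
    print(shlex.join(cmd))
```

To actually run the steps, each list can be passed in order to subprocess.run(cmd, check=True), so that the conversion only starts if the pansharpening succeeded.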

Discussion

Maybe this MR should be generalized to the Python API, so that the output can be retrieved as a NumPy array with OTB_FORCE_STREAMING (for now, the entire pipeline is triggered over the largest possible image region).

Edited Oct 03, 2022 by Rémi Cresson

Draft: Reduce memory footprint when ImageFileWriter can't stream its input pipeline to the output file