A 13 GB array was loaded and manipulated in memory during the SOD, SMOD and NSP calculation, which caused memory usage to explode. Chunk processing enables better memory management (with rasterio we can read an image window by window).
With a chunk of 1000x1000, we keep the same time performance and the maximum memory is now 15 GB during the SOD, SMOD and NSP calculation.
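As a minimal sketch of the chunking described above (not the actual LIS code), the 1000x1000 windows can be enumerated like this. The helper name `iter_windows` is hypothetical, and the 5490x5490 size is just the tile size mentioned elsewhere in this thread.

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height), same argument order as rasterio.windows.Window
Window = Tuple[int, int, int, int]


def iter_windows(width: int, height: int, chunk: int = 1000) -> Iterator[Window]:
    """Yield non-overlapping windows covering a width x height raster.

    Hypothetical helper: windows on the right and bottom edges are clipped
    so that they never extend past the raster.
    """
    for row_off in range(0, height, chunk):
        for col_off in range(0, width, chunk):
            yield (
                col_off,
                row_off,
                min(chunk, width - col_off),
                min(chunk, height - row_off),
            )


windows = list(iter_windows(5490, 5490, chunk=1000))
print(len(windows))   # 36 windows (6 x 6)
print(windows[-1])    # (5000, 5000, 490, 490)
```

With rasterio, each tuple could be turned into `rasterio.windows.Window(col_off, row_off, width, height)` and passed to `src.read(window=...)`, so that only one chunk of the time series is held in memory at a time.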
This test also has a very long calculation time, between 45 and 50 minutes. After the gap filling, calculating the binary mask for each date takes around 30 minutes. Initially, BandMathX was used. Using gdal_calc.py instead greatly improves calculation time.
gdal_calc.py has an execution time proportional to the number of bands (1.2 s per band for 5490x5490 tiles), but for unknown reasons, in the case of the muscate test, where the synthesis is carried out over a year (366 bands), the calculation takes an extremely long time. However, if we carry out the calculation on intervals of fewer bands (100 for example), we get acceptable times (around 8 minutes).
For an analysis with a temporal step of 100 bands and a chunk of 1000x1000, the muscate test takes 24 minutes instead of 47 minutes. Here is the memory usage:
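To illustrate the splitting into band intervals, here is a small sketch. It only computes the intervals and does not show how gdal_calc.py is invoked on each of them. The function name `band_intervals` is hypothetical, and band indices are 1-based as in GDAL.

```python
from typing import List, Tuple


def band_intervals(n_bands: int, step: int = 100) -> List[Tuple[int, int]]:
    """Split bands 1..n_bands into consecutive inclusive intervals of at most `step` bands."""
    return [
        (first, min(first + step - 1, n_bands))
        for first in range(1, n_bands + 1, step)
    ]


intervals = band_intervals(366, step=100)
print(intervals)  # [(1, 100), (101, 200), (201, 300), (301, 366)]
```

For the 366-band muscate case this gives four intervals, the last one being shorter than the others.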
Test the binary mask creation using BandMathX with OTB 8 to see if there is some performance improvement. Also, OTB should optimize memory usage; is there something we are doing wrong?
If the OTB algorithm cannot achieve better performance or a smaller memory footprint, just try a plain rasterio implementation with some windowing.
Configure the window according to the file format on disk: either read band by band or, if the bands are stored tiled, use a 3D window.
Memory usage: no improvement with OTB 8, but a lower footprint with OTB 9 (7 GB maximum memory instead of 17 GB).
Execution time: no major differences between OTB 7, OTB 8 and OTB 9 (always between 25 and 30 min). But I realized I had only used 1 CPU. I ran some tests with 4 CPUs and got better results: 14 min for OTB 7 and OTB 8, 8 min for OTB 9 (only 5 min if we simplify the BandMathX equation).
So, in the end, keeping BandMathX seems worthwhile if we can use several CPUs.
Moreover, I will verify that we are using the BandMathX function correctly.
My point of view here is to prefer simpler code over better performance. LIS is fast enough in any case. If memory and performance improve in the next OTB versions, I would say it's better to keep the code as it is.
It's fine by me. I will keep BandMathX and simplify the equation (1 operation instead of 2 is a better use of it), check the latest results and then open the merge request.
After a closer look, I noticed that the number of threads given in the config file was not used in the synthesis script. I've added it (based on what was done for FSC snow detection).
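For reference, one common way to pass a thread count to OTB is the `ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS` environment variable. The sketch below assumes the configuration is available as a dictionary with an `nb_threads` key. Both the key name and the helper `set_otb_threads` are hypothetical and may differ from what the synthesis script actually does.

```python
import os


def set_otb_threads(config: dict, default: int = 1) -> int:
    """Export the configured thread count so that OTB/ITK filters pick it up.

    Assumption: `config` has an optional "nb_threads" entry (hypothetical name).
    This should be called before any OTB application is executed.
    """
    nb_threads = int(config.get("nb_threads", default))
    if nb_threads < 1:
        raise ValueError("nb_threads must be a positive integer")
    os.environ["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"] = str(nb_threads)
    return nb_threads


used = set_otb_threads({"nb_threads": 4})
print(used, os.environ["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"])
```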
OTB calculations are now correctly parallelized according to the config file and calculation node parameters. However, the final calculations (SOD, SMOD, NSP...) are not. It could be interesting in the future to parallelize this part, with eoscale for example, to make efficient use of all the requested CPUs.
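As a rough illustration of what parallelizing the per-chunk part could look like, here is a sketch using the standard library `concurrent.futures` rather than eoscale, whose API is not shown here. `process_chunk` is a hypothetical stand-in for the real SOD/SMOD/NSP computation on one window and only returns the pixel count of that window.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def process_chunk(window: Window) -> int:
    """Hypothetical stand-in for the per-chunk SOD/SMOD/NSP computation."""
    _col_off, _row_off, width, height = window
    return width * height  # placeholder result: number of pixels in the chunk


def run_parallel(windows: List[Window], nb_workers: int) -> List[int]:
    """Process all chunks with a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=nb_workers) as pool:
        return list(pool.map(process_chunk, windows))


# 1000x1000 chunks over a 5490x5490 tile, clipped at the edges
size, chunk = 5490, 1000
windows = [
    (c, r, min(chunk, size - c), min(chunk, size - r))
    for r in range(0, size, chunk)
    for c in range(0, size, chunk)
]
results = run_parallel(windows, nb_workers=4)
print(len(results), sum(results))
```

A thread pool only helps if the per-chunk work releases the Python GIL, which many NumPy and GDAL operations do. For pure-Python work, a process pool would be the alternative to consider.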