ENH: optimize input reading in `SampleAugmentation`

Laurențiu Nicola requested to merge sample-augmentation-reading into develop

SampleAugmentation's SMOTE implementation is a bit inefficient. While looking into its memory usage, I noticed that the app was slow even when using only 6700 rows out of 11M.

There are roughly two changes here:

  • using SetAttributeFilter to select the features of the chosen class, instead of iterating through every feature and filtering them one by one
  • doing fewer field type checks
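To illustrate the idea behind the two changes, here is a minimal Python sketch. It is not the OTB or GDAL code: `ToyLayer`, `read_slow` and `read_fast` are hypothetical names, and the toy filter still loops in Python, whereas a real data source may be able to evaluate an attribute filter much more cheaply. The point is only the shape of the change: let the layer select the rows of the requested class, and check the field types once instead of once per feature.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a vector layer (not the GDAL/OTB API)."""
    field_types: dict      # field name -> Python type
    rows: list             # one dict per feature
    type_checks: int = 0   # counts how often a field type was inspected

    def field_is_numeric(self, name):
        self.type_checks += 1
        return self.field_types[name] in (int, float)

    def filtered(self, name, value):
        # Stand-in for an attribute filter applied by the data source.
        return [row for row in self.rows if row[name] == value]


def read_slow(layer, class_field, label, fields):
    """Visit every feature and re-check field types for each match."""
    samples = []
    for row in layer.rows:
        if row[class_field] != label:
            continue
        samples.append(
            [float(row[f]) for f in fields if layer.field_is_numeric(f)]
        )
    return samples


def read_fast(layer, class_field, label, fields):
    """Check field types once, then read only the filtered features."""
    numeric = [f for f in fields if layer.field_is_numeric(f)]
    return [
        [float(row[f]) for f in numeric]
        for row in layer.filtered(class_field, label)
    ]
```

Both functions return the same samples; the difference is that `read_fast` performs one type check per field, while `read_slow` performs one per field for every matching feature.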

In my case, where the input is a merged VRT (which is not ideal for GDAL), the app takes about:

  • 2125 seconds, originally
  • 173 seconds, using SetAttributeFilter
  • 168 seconds, with both changes
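
For reference, the relative speedups implied by these timings can be worked out with a few lines of Python. The numbers are the ones listed above; the variable names are only illustrative.

```python
# Wall-clock timings from the list above, in seconds.
timings_s = {
    "original": 2125,
    "SetAttributeFilter": 173,
    "both changes": 168,
}

baseline = timings_s["original"]
# Speedup of each variant relative to the original run.
speedups = {name: baseline / t for name, t in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {timings_s[name]} s, speedup {factor:.1f}x")
```

Almost all of the gain (roughly 12x) comes from the attribute filter; the reduced type checking saves a few more seconds on top of that.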
Edited by Laurențiu Nicola