ENH: optimize input reading in `SampleAugmentation`
`SampleAugmentation`'s SMOTE implementation is a bit inefficient, but while I was looking into the memory usage, I noticed it was slow even when using only 6700 rows out of 11M.
There are roughly two changes here:
- using `SetAttributeFilter` instead of iterating through every feature to filter by the selected class
- doing fewer field type checks
In my case, where the input is a merged VRT (not ideal for GDAL), the app takes about:
- 2125 seconds, originally
- 173 seconds, using `SetAttributeFilter`
- 168 seconds, with both changes
Edited by Laurențiu Nicola