Implement masked autoencoding training for ScalarSitsPerceiver
- Decide whether this needs a new LightningModule or whether it can be implemented as an option of the current one
- List masking strategies to implement and define their parameters:
  - Temporal: drop dates in SITS
  - Spectral: drop spectral bands (either the same for all dates or different bands for different dates)
  - Spatial: drop pixels or subpatches (either the same for all dates or different regions for different dates)
  - Modality: drop data sources (again, all dates or only some of them)
  - Drop tokens
  - A combination of the above
- Decide when to perform the masking:
  - At the dataset level: generate a new cached dataset containing both the masked and unmasked versions of each sample. This does not seem easy to implement for spatial masking.
  - During tokenization: do not generate the tokens that should be masked.
  - After tokenization: remove the masked tokens. This option seems the easiest if we do it before positional embedding, since the subtokens (date, band, location, etc.) contain the filtering criteria.
- Implement the masking
- Implement the masked reconstruction loss computation
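
To make the "after tokenization" option more concrete, here is a minimal sketch of how a temporal/spectral mask could be derived from the subtokens and then used to restrict the reconstruction loss to the masked tokens. Everything in it is an assumption for illustration: the function names `token_mask` and `masked_mse`, the flat per-token integer arrays `dates` and `bands`, and the use of NumPy instead of torch tensors. It does not reflect the actual ScalarSitsPerceiver tokenizer or its API.

```python
import numpy as np


def token_mask(dates, bands, masked_dates=(), masked_bands=()):
    """Return a boolean array that is True for tokens to mask.

    `dates` and `bands` are hypothetical per-token integer subtokens
    (date index and band index), one entry per token.
    """
    dates = np.asarray(dates)
    bands = np.asarray(bands)
    drop_dates = np.asarray(list(masked_dates), dtype=dates.dtype)
    drop_bands = np.asarray(list(masked_bands), dtype=bands.dtype)
    # A token is masked if its date OR its band is in a dropped set,
    # i.e. a combination of temporal and spectral masking.
    return np.isin(dates, drop_dates) | np.isin(bands, drop_bands)


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked tokens only."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # nothing was masked, so there is nothing to reconstruct
    diff = np.asarray(pred)[mask] - np.asarray(target)[mask]
    return float(np.mean(diff ** 2))


# Toy sample: 2 dates x 3 bands = 6 tokens, one scalar value per token.
dates = np.array([0, 0, 0, 1, 1, 1])
bands = np.array([0, 1, 2, 0, 1, 2])
values = np.arange(6, dtype=float).reshape(6, 1)

# Drop every token of date 1 and, for all dates, every token of band 2.
mask = token_mask(dates, bands, masked_dates=[1], masked_bands=[2])

visible = values[~mask]          # tokens that would be fed to the encoder
pred = np.zeros_like(values)     # stand-in for the model reconstruction
loss = masked_mse(pred, values, mask)
```

In this sketch the filtering happens before any positional embedding, so the mask only needs the raw subtokens. Spatial and modality masking could follow the same pattern by adding further criteria to `token_mask`, assuming the tokenizer exposes the corresponding subtokens.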