Sentinel2 masks for no-data

It seems that the masks (CLM, CLP, dataMask) that are available in the Terrascope openEO SENTINEL2_L2A collection but missing from the CDSE SENTINEL2_L2A collection are in fact produced by Sinergise using s2cloudless. They are not part of the standard Sen2Cor L2A product, so it is unlikely that CDSE will add them in the future.
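
To check which of these bands a given backend exposes, the collection description can be inspected. The snippet below is a minimal sketch: the helper names `band_names` and `missing_bands` are hypothetical, and the `example` dictionary is illustrative metadata written by hand, not an actual backend response. It assumes the description follows the STAC datacube layout, where bands are listed under a `cube:dimensions` entry of type `bands`.

```python
from __future__ import annotations

from typing import Iterable


def band_names(collection_metadata: dict) -> list[str]:
    """Return the band names found in a collection description (hypothetical helper)."""
    dims = collection_metadata.get("cube:dimensions", {})
    for dim in dims.values():
        # The band dimension is identified by its type, not by its name
        if dim.get("type") == "bands":
            return list(dim.get("values", []))
    return []


def missing_bands(collection_metadata: dict, wanted: Iterable[str]) -> list[str]:
    """List the wanted bands that the collection does not provide."""
    available = set(band_names(collection_metadata))
    return [b for b in wanted if b not in available]


# Hand-written, illustrative metadata (NOT a real backend response)
example = {
    "cube:dimensions": {
        "bands": {"type": "bands", "values": ["B02", "B03", "B04", "SCL"]},
    }
}

print(missing_bands(example, ["SCL", "CLM", "CLP", "dataMask"]))
```

With the openeo Python client, the dictionary would typically come from `connection.describe_collection("SENTINEL2_L2A")` on each backend.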

To assess the impact of using only the bare SCL mask from CDSE, I ran the code that generated the LS2S2 dataset twice.

The first run uses the reference code, which combines SCL, CLM, CLP and dataMask:

    # Build no-data mask from Scene classification layer
    no_data_mask = s2_arr.SCL.astype(np.uint8).isin([0, 1, 2, 3, 7, 8, 9, 10])
    no_data_mask = np.logical_or(no_data_mask, s2_arr["dataMask"] == 0)
    no_data_mask = np.logical_or(no_data_mask, s2_arr["CLM"] > 0)
    no_data_mask = np.logical_or(no_data_mask, s2_arr["CLP"] > 150)

    mask_stack = []
    # Dilate the mask, since Sen2Cor masks are very tight
    for t in s2_arr.t:
        current_mask = no_data_mask.sel(t=t).values
        mask_stack.append(
            mask_processing(current_mask, min_object_size=10, dilation=25)
        )

    mask_stack = np.stack(mask_stack)

    nan_mask = np.isnan(s2_arr["B02"].values)
    for b in (
        "B03",
        "B04",
        "B08",
        "B05",
        "B06",
        "B07",
        "B8A",
        "B11",
        "B12",
    ):
        nan_mask = np.logical_or(nan_mask, np.isnan(s2_arr[b].values))

    # Add the NaN mask only now, since it must not be dilated
    s2_arr["no_data"] = ("t", "y", "x"), np.logical_or(nan_mask, mask_stack)

[Image: 32TPT_12_sentinel2_synopsis_ref]
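
The `mask_processing` helper is not shown in this post. For readers who want to reproduce the behaviour, here is a rough sketch of what such a step could look like, under the assumption that `min_object_size` is a minimum connected-component size in pixels and `dilation` is a number of binary dilation iterations. The actual implementation used for LS2S2 may differ; the name `mask_processing_sketch` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def mask_processing_sketch(mask, min_object_size=10, dilation=25):
    """Drop small masked objects, then grow the remaining ones.

    Assumed semantics (not confirmed by the original code):
    - min_object_size: connected components with fewer pixels are removed
    - dilation: number of binary dilation iterations (cross-shaped element)
    """
    mask = np.asarray(mask, dtype=bool)

    # Label connected components (4-connectivity by default)
    labels, n_labels = ndimage.label(mask)
    if n_labels == 0:
        return mask.copy()

    # Pixel count per label; index 0 is the background
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_object_size
    keep[0] = False
    cleaned = keep[labels]

    if dilation > 0:
        cleaned = ndimage.binary_dilation(cleaned, iterations=dilation)
    return cleaned


# Small demonstration on a synthetic mask
demo = np.zeros((100, 100), dtype=bool)
demo[50:54, 50:54] = True  # 16-pixel object, large enough to be kept
demo[5, 5] = True          # isolated pixel, removed before dilation
out = mask_processing_sketch(demo, min_object_size=10, dilation=2)
```

Removing small objects before dilating avoids turning isolated misclassified pixels into large masked patches.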

The second run uses the same code with all logic related to CLM, CLP and dataMask removed:

    # Build no-data mask from Scene classification layer
    no_data_mask = s2_arr.SCL.astype(np.uint8).isin([0, 1, 2, 3, 7, 8, 9, 10])

    mask_stack = []
    # Dilate the mask, since Sen2Cor masks are very tight
    for t in s2_arr.t:
        current_mask = no_data_mask.sel(t=t).values
        mask_stack.append(
            mask_processing(current_mask, min_object_size=10, dilation=25)
        )

    mask_stack = np.stack(mask_stack)

    nan_mask = np.isnan(s2_arr["B02"].values)
    for b in (
        "B03",
        "B04",
        "B08",
        "B05",
        "B06",
        "B07",
        "B8A",
        "B11",
        "B12",
    ):
        nan_mask = np.logical_or(nan_mask, np.isnan(s2_arr[b].values))

    # Add the NaN mask only now, since it must not be dilated
    s2_arr["no_data"] = ("t", "y", "x"), np.logical_or(nan_mask, mask_stack)

[Image: 32TPT_12_sentinel2_synopsis_scl_only]

As expected, cloud masking is not as good with the bare SCL mask, especially for small clouds and at the edges of large clouds.
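
To put a number on this visual impression, one option is to compute, for each date, the fraction of pixels masked by the reference run that the SCL-only run leaves unmasked. The snippet below is only a sketch on synthetic arrays: the function name `missed_fraction_per_date` is hypothetical, and both inputs are assumed to be boolean `(t, y, x)` arrays such as the `no_data` variable from each run.

```python
import numpy as np


def missed_fraction_per_date(ref_mask, scl_only_mask):
    """Per-date fraction of reference-masked pixels not masked by SCL only."""
    ref = np.asarray(ref_mask, dtype=bool)
    scl = np.asarray(scl_only_mask, dtype=bool)

    # Pixels flagged by the reference mask but missed by the SCL-only mask
    missed = np.logical_and(ref, ~scl).sum(axis=(1, 2))
    ref_count = ref.sum(axis=(1, 2))

    # Dates with an empty reference mask get a fraction of 0
    return np.divide(
        missed,
        ref_count,
        out=np.zeros(len(ref_count), dtype=float),
        where=ref_count > 0,
    )


# Synthetic example: 2 dates on a 4 x 4 grid
ref = np.zeros((2, 4, 4), dtype=bool)
scl = np.zeros((2, 4, 4), dtype=bool)
ref[0, :2, :] = True  # reference masks 8 pixels on the first date
scl[0, :1, :] = True  # SCL-only masks 4 of them

fractions = missed_fraction_per_date(ref, scl)
print(fractions)  # 0.5 for the first date, 0.0 for the second
```

A per-date figure like this would make it easier to decide whether the loss concentrates on a few cloudy dates or affects the whole time series.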

Is this good enough for RELEO?