
Strip extended filenames or URL parameters from the output of pyotb.summarize()

The idea is to let pyotb.summarize() modify the paths of resources so that everything after the "?" character in a resource path is stripped.
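A minimal sketch of the stripping itself, assuming resource paths are plain strings. `strip_path` is a hypothetical helper, not part of pyotb; it cuts at the first "?", which removes both a URL query string (such as an SAS token) and an OTB extended filename suffix (`?&key=value`). The example URL is illustrative.

```python
def strip_path(path: str) -> str:
    """Return `path` without anything from the first "?" onward.

    Hypothetical helper (not a pyotb function).
    """
    head, _sep, _tail = path.partition("?")
    return head


print(strip_path("/vsicurl/https://example.com/data/iw-vv.tiff?st=2023&sig=abc"))
# /vsicurl/https://example.com/data/iw-vv.tiff
print(strip_path("image.tif?&box=0:0:100:100"))
# image.tif
```

Strings without a "?" are returned unchanged, since `str.partition` then yields the whole string as `head`.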

Rationale:

import planetary_computer
import pyotb

some_pc_asset = (
    "/vsicurl/https://sentinel1euwest.blob.core.windows.net/s1-"
    "grd/GRD/2020/12/27/IW/DV/S1B_IW_GRDH_1SDV_20201227T060759_"
    "20201227T060824_024884_02F5F6_847A/measurement/iw-vv.tiff"
)

signed_asset = planetary_computer.sign_inplace(some_pc_asset)
# signed_asset now holds the URL followed by an SAS token:
#  "/vsicurl/https://sentinel1euwest.blob.core.windows.net/s1-"
#  "grd/GRD/2020/12/27/IW/DV/S1B_IW_GRDH_1SDV_20201227T060759_"
#  "20201227T060824_024884_02F5F6_847A/measurement/iw-vv.tiff?"
#  "st=2023-05-20T20%3A14%3A07Z&se=2023-05-21T20%3A59%3A07Z&sp"
#  "=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0"
#  "f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=202"
#  "3-05-20T17%3A22%3A17Z&ske=2023-05-27T17%3A22%3A17Z&sks=b&s"
#  "kv=2021-06-08&sig=ww1ZfySpWebi3x6NJNJxdkqPBBHvPw%2B2qIqGp1"
#  "UeGX4%3D"

# Other application parameters are omitted here
app = pyotb.SomeApplication({"in": signed_asset})

summary1 = pyotb.summarize(app)

Now, summary1 includes the whole Planetary Computer signed URL, SAS token included. This is of little use, since the token has a short time to live and will soon expire. Worse, if the summary is meant to be reused, the SAS token would have to be removed manually before the original URL could be signed again:

{
    "name": "SomeApplication",
    "parameters": {
        "in": "/vsicurl/https://sentinel1euwest.blob.core.windows.net/s1-grd/GRD/2020/12/27/IW/DV/S1B_IW_GRDH_1SDV_20201227T060759_20201227T060824_024884_02F5F6_847A/measurement/iw-vv.tiff?st=2023-05-20T20%3A14%3A07Z&se=2023-05-21T20%3A59%3A07Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-05-20T17%3A22%3A17Z&ske=2023-05-27T17%3A22%3A17Z&sks=b&skv=2021-06-08&sig=ww1ZfySpWebi3x6NJNJxdkqPBBHvPw%2B2qIqGp1UeGX4%3D",
        ...
    }
}
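The short lifetime of the signed URL can be checked by decoding its query string with the Python standard library. This sketch uses an abridged copy of the URL above (the `skoid`, `sktid`, `skt`, `ske`, `sks` and `skv` fields are left out); in an SAS token, `st` and `se` are the start and expiry times.

```python
from urllib.parse import parse_qs, urlsplit

# Abridged copy of the signed URL above: some SAS fields are omitted.
signed_asset = (
    "/vsicurl/https://sentinel1euwest.blob.core.windows.net/s1-grd/GRD/2020/12/27/"
    "IW/DV/S1B_IW_GRDH_1SDV_20201227T060759_20201227T060824_024884_02F5F6_847A/"
    "measurement/iw-vv.tiff?st=2023-05-20T20%3A14%3A07Z&se=2023-05-21T20%3A59%3A07Z"
    "&sp=rl&sv=2021-06-08&sr=c&sig=ww1ZfySpWebi3x6NJNJxdkqPBBHvPw%2B2qIqGp1UeGX4%3D"
)

# Everything after "?" is the query string; parse_qs also percent-decodes it.
params = parse_qs(urlsplit(signed_asset).query)
print("valid from", params["st"][0], "until", params["se"][0])
# valid from 2023-05-20T20:14:07Z until 2023-05-21T20:59:07Z
```

A summary saved with this URL therefore stops being usable about a day after it was produced.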

Proposed change

We could add an option to summarize() that strips URL parameters, like this:

...
summary2 = pyotb.summarize(app, strip=True)

This would result in:

{
    "name": "SomeApplication",
    "parameters": {
        "in": "/vsicurl/https://sentinel1euwest.blob.core.windows.net/s1-grd/GRD/2020/12/27/IW/DV/S1B_IW_GRDH_1SDV_20201227T060759_20201227T060824_024884_02F5F6_847A/measurement/iw-vv.tiff",
        ...
    }
}
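One way the proposed `strip=True` behaviour could work is as a post-processing pass over the returned dict. This is a sketch only: `strip_summary` is a hypothetical function, and the only assumption about the summary is that it is built from nested dicts, lists and scalars (the sample values are illustrative).

```python
from typing import Any


def strip_summary(obj: Any) -> Any:
    """Return a copy of a summary with every string cut at its first "?".

    Hypothetical post-processing, not part of pyotb.
    """
    if isinstance(obj, dict):
        return {key: strip_summary(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type (list or tuple)
        return type(obj)(strip_summary(value) for value in obj)
    if isinstance(obj, str):
        return obj.partition("?")[0]
    return obj  # numbers, booleans and None are left untouched


summary = {
    "name": "SomeApplication",
    "parameters": {
        "in": "/vsicurl/https://example.com/iw-vv.tiff?st=2023&sig=abc",
        "out": "result.tif?&gdal:co:COMPRESS=DEFLATE",
    },
}
print(strip_summary(summary))
# {'name': 'SomeApplication', 'parameters': {'in': '/vsicurl/https://example.com/iw-vv.tiff', 'out': 'result.tif'}}
```

Because the pass recurses into dicts and lists, it would also reach nested entries if a summary contains any.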

It could also help remove extended filenames that we forgot to remove from the applications' inputs. In any case, the strip option would be optional, so users could decide whether to enable it.
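Cutting every string at "?" could damage parameters that are not paths, for example an OTB BandMath expression using muParser's ternary operator (`"im1b1 > 0 ? 1 : 0"`). A hedged sketch of a narrower rule, assuming a string is treated as a resource only if it is a URL, a GDAL `/vsi...` path, or carries an OTB extended filename (`?&...`); both function names are hypothetical.

```python
def looks_like_resource(value: str) -> bool:
    # Hypothetical heuristic: only URLs, GDAL virtual file system paths,
    # or strings with an OTB extended filename ("?&key=value") qualify.
    head, sep, tail = value.partition("?")
    if not sep:
        return False
    return "://" in head or head.startswith("/vsi") or tail.startswith("&")


def strip_if_resource(value: str) -> str:
    # Strip after "?" only when the string looks like a resource path.
    return value.partition("?")[0] if looks_like_resource(value) else value


print(strip_if_resource("/vsicurl/https://example.com/a.tif?sig=x"))
# /vsicurl/https://example.com/a.tif
print(strip_if_resource("image.tif?&box=0:0:10:10"))
# image.tif
print(strip_if_resource("im1b1 > 0 ? 1 : 0"))
# im1b1 > 0 ? 1 : 0
```

A heuristic like this trades completeness for safety: an unusual local path with a plain "?" suffix would be left as-is rather than risk corrupting an expression.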

Edited by Rémi Cresson