Regression refactoring: TrainVectorRegression
Summary
This MR introduces a new application, TrainVectorRegression, for training a regression machine learning model from vector data, in the same fashion as TrainVectorClassifier.
Rationale
See #1799 (closed)
This MR includes two major changes:
- The application TrainVectorBase (the base class for TrainVectorClassifier) is now templated on TInputValue (features) and TOutputValues (class). Before this MR it only worked with float as the feature type and int as the class type (classification case); it can now also be used with other types, such as float, float (regression case).
- A new application TrainVectorRegression, deriving from TrainVectorBase.
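To illustrate the first change, here is a minimal, self-contained sketch of a base class templated on the feature and target types, with one classification and one regression specialization. It is not the OTB code: the class names with the `Sketch` suffix, `AddSample`, `GetNumberOfSamples`, `GetTargets` and `GetMode` are hypothetical and only show the shape of the design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the templated base class described
// above. It does not use or reproduce the OTB implementation.
template <class TInputValue, class TOutputValue>
class TrainVectorBaseSketch
{
public:
  using InputSample = std::vector<TInputValue>;

  virtual ~TrainVectorBaseSketch() = default;

  // Store one sample: a feature vector and its target (class label or value).
  void AddSample(const InputSample& features, TOutputValue target)
  {
    m_Features.push_back(features);
    m_Targets.push_back(target);
  }

  std::size_t GetNumberOfSamples() const { return m_Targets.size(); }

  const std::vector<TOutputValue>& GetTargets() const { return m_Targets; }

  // Each derived application reports which learning problem it addresses.
  virtual std::string GetMode() const = 0;

private:
  std::vector<InputSample>  m_Features;
  std::vector<TOutputValue> m_Targets;
};

// Classification case: float features, int class labels.
class TrainVectorClassifierSketch : public TrainVectorBaseSketch<float, int>
{
public:
  std::string GetMode() const override { return "classification"; }
};

// Regression case: float features, float target values.
class TrainVectorRegressionSketch : public TrainVectorBaseSketch<float, float>
{
public:
  std::string GetMode() const override { return "regression"; }
};
```

With this layout the sample handling lives once in the base class, and each application only fixes the pair of types it needs.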
Tests
A test has been added for the new application, using a random forest (rf) model as the regression algorithm. Eventually all regression algorithms should be tested, but I think we can do that in the (future) TrainImagesRegression application, to keep the same testing strategy as for classification.
Additional notes
This is not exactly the workflow described in the issue, because I don't think the first step (removing sampling from TrainRegression) is relevant, as TrainRegression will be deprecated at the end of the refactoring.
The next step of the refactoring is to create a TrainImagesRegression application. It could be a composite application that chains ImageEnvelope to create a polygon on the extent of the image, SampleSelection to select random points over this polygon, SampleExtraction to extract feature and predictor values over two input images, and finally TrainVectorRegression to train the model (this is the workflow used in the KMeansClassification composite application). What do you think? In any case, I think this is out of the scope of this MR.
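The proposed chain can be written down as data to make the hand-off between steps explicit. This is only a sketch: `ChainStep`, `DescribeTrainImagesRegressionChain` and `IsChainConsistent` are hypothetical names, the input and output labels are simplified descriptions (SampleExtraction would also read the two input images), and nothing here calls OTB.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One step of the hypothetical chain: which application runs, the main thing
// it consumes and the main thing it produces.
struct ChainStep
{
  std::string application;
  std::string input;
  std::string output;
};

// The four steps named in the proposal, in execution order.
std::vector<ChainStep> DescribeTrainImagesRegressionChain()
{
  return {
      {"ImageEnvelope", "feature image", "envelope polygon"},
      {"SampleSelection", "envelope polygon", "sample positions"},
      {"SampleExtraction", "sample positions", "sample table"},
      {"TrainVectorRegression", "sample table", "regression model"},
  };
}

// A chain is consistent when every step consumes what the previous one produced.
bool IsChainConsistent(const std::vector<ChainStep>& chain)
{
  for (std::size_t i = 1; i < chain.size(); ++i)
  {
    if (chain[i].input != chain[i - 1].output)
    {
      return false;
    }
  }
  return true;
}
```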
In the issue we talked about CSV input compatibility. It is hard to add to TrainVectorBase because of the design of the application. The best way (given the design of the learning applications) would probably be to create a new application TrainCSVBase, inheriting from LearningApplicationBase and doing the CSV input reading, then create a TrainCSVRegression from it, and maybe also a TrainCSVClassifier. But is there really a need for such functionality?
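For scale, the CSV reading part of such an application could look roughly like the sketch below. It assumes a layout that the issue does not specify: no header row, comma separators, no quoting, numeric cells only, and the target value in the last column. `RegressionSamples` and `ParseCsvSamples` are hypothetical names, not OTB classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the parsed samples.
struct RegressionSamples
{
  std::vector<std::vector<float>> features;
  std::vector<float>              targets;
};

// Parse rows of the form "f1,f2,...,fn,target".
// Assumptions: no header, no quoting, well-formed numeric cells
// (std::stof throws on a cell that is not a number).
RegressionSamples ParseCsvSamples(std::istream& in)
{
  RegressionSamples samples;
  std::string       line;
  while (std::getline(in, line))
  {
    if (line.empty())
    {
      continue; // skip blank lines
    }
    std::vector<float> values;
    std::stringstream  row(line);
    std::string        cell;
    while (std::getline(row, cell, ','))
    {
      values.push_back(std::stof(cell));
    }
    if (values.size() < 2)
    {
      continue; // need at least one feature and one target
    }
    // The last column is the target; the rest are features.
    samples.targets.push_back(values.back());
    values.pop_back();
    samples.features.push_back(values);
  }
  return samples;
}
```

The parsing itself is small; the real cost of the proposal would be in the application layer (parameters, documentation and tests for the new TrainCSV applications).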
Copyright
The copyright owner is CNES and has signed the ORFEO ToolBox Contributor License Agreement.
Check before merging:
- All discussions are resolved
- At least 2 positive votes from core developers, no negative vote
- The feature branch is (reasonably) up-to-date with the base branch
- Dashboard is green
- Copyright owner has signed the ORFEO ToolBox Contributor License Agreement
- Optionally, run git diff develop... -U0 --no-color | clang-format-diff.py -p1 -i on latest changes and commit
Activity
changed milestone to %7.0.0
added app refactoring labels
- Automatically resolved by Cédric Traizet
added 54 commits
- 8e06c83c...0df44b31 - 51 commits from branch develop
- b948c575 - Merge branch 'develop' into regression_refactoring
- 3e071bca - ENH: use mse instead of model as text baseline for regression
- 5b5c03d6 - BUG: add additional mse baseline for OpenCV 3
- Resolved by Cédric Traizet
added CNES backlog label
added 14 commits
- 1d3baef0...cb2f6d1b - 10 commits from branch develop
- b0e27774 - Merge branch 'develop' into regression_refactoring
- c90877f5 - COMP: remove SetDocName
- 12ec94ae - STY: run clang format
- 5e79e868 - COMP: resolve merge conflict
mentioned in commit 78d7f56c
So, as you see it, does the application take as input a shapefile containing the vector support, a predictor image and a list of feature images? Or a shapefile containing the vector support and the predictor as features, and a list of feature images?
There are several possibilities regarding the IOs of this application, and we should agree on which one to use before starting the implementation.
The idea behind the use of ImageEnvelope was to provide an application that works with feature and predictor images as input (and no vector data), which is how KMeansClassification works (this app could be called TrainVectorUnsupervised if we had more than one unsupervised algorithm), and is also the API of the current TrainRegression application.
I think predictor and feature shall remain images, but allowing the pixels used for regression to be restricted with vector data, instead of forcing the use of the image envelope, would be great.
I can imagine some use cases that would be very hard to implement without this, for instance if you want to train the regression algorithm only on a certain category of vegetation.
mentioned in merge request !496 (merged)
mentioned in issue #1682 (closed)