Regression refactoring : TrainVectorRegression

Merged Cédric Traizet requested to merge regression_refactoring into develop
All threads resolved!

Summary

This MR introduces a new application, TrainVectorRegression, for training a regression machine learning model from vector data, in the same fashion as TrainVectorClassifier.

Rationale

See #1799 (closed)

This MR includes two major changes:

  • The application TrainVectorBase (the base class of TrainVectorClassifier) is now templated on TInputValue (features) and TOutputValues (class). Before this MR it only worked with float as the feature type and int as the class type (classification case); it can now also be used with other types, such as float, float (regression case).

  • A new application TrainVectorRegression deriving from TrainVectorBase
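
To illustrate the first change, here is a minimal, self-contained sketch of a base class templated on the feature and target value types. It is not the OTB code: the class names with the `Sketch` suffix, `AddSample`, `GetNumberOfSamples` and `GetMode` are hypothetical names chosen for this example, and the real TrainVectorBase does far more (parameter handling, reading vector data, training).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for TrainVectorBase: the feature and target value
// types are template parameters instead of being fixed to float / int.
template <typename TInputValue, typename TOutputValue>
class TrainVectorBaseSketch
{
public:
  using InputValueType  = TInputValue;
  using OutputValueType = TOutputValue;

  struct Sample
  {
    std::vector<InputValueType> features;
    OutputValueType             target;
  };

  virtual ~TrainVectorBaseSketch() = default;

  // Store one training sample; every sample must have the same feature count.
  void AddSample(std::vector<InputValueType> features, OutputValueType target)
  {
    assert(m_Samples.empty() || features.size() == m_Samples.front().features.size());
    m_Samples.push_back(Sample{std::move(features), target});
  }

  std::size_t GetNumberOfSamples() const { return m_Samples.size(); }

  // Each derived application reports which learning mode it implements.
  virtual std::string GetMode() const = 0;

protected:
  std::vector<Sample> m_Samples;
};

// Classification case: float features, int class labels.
class TrainVectorClassifierSketch : public TrainVectorBaseSketch<float, int>
{
public:
  std::string GetMode() const override { return "classification"; }
};

// Regression case: float features, float target values.
class TrainVectorRegressionSketch : public TrainVectorBaseSketch<float, float>
{
public:
  std::string GetMode() const override { return "regression"; }
};

static_assert(std::is_same<TrainVectorClassifierSketch::OutputValueType, int>::value,
              "classification targets are integer labels");
static_assert(std::is_same<TrainVectorRegressionSketch::OutputValueType, float>::value,
              "regression targets are floating point values");
```

With this layout, code shared by both applications lives once in the base template, and each derived class only fixes the pair of value types and adds what is specific to its mode.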

Tests

A test has been added for the new application, using a random forest (rf) model as the regression algorithm. Eventually all regression algorithms should be tested, but I think we can do that in the (future) TrainImagesRegression, to keep the same testing strategy as for classification.
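
Commits on this branch (3e071bca, 5b5c03d6, 2fabfa17) mention using the mean squared error (mse) as the text baseline for the regression test and passing data by reference in its computation. As a rough illustration only, the sketch below computes an mse and compares it to a stored value. The function names and the tolerance check are assumptions made for this example, not the code added by the MR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error between predicted and reference values.
// Both vectors are taken by const reference, so nothing is copied.
double ComputeMeanSquaredError(const std::vector<float>& predicted,
                               const std::vector<float>& reference)
{
  assert(predicted.size() == reference.size());
  assert(!predicted.empty());

  double sumOfSquares = 0.0;
  for (std::size_t i = 0; i < predicted.size(); ++i)
  {
    const double diff = static_cast<double>(predicted[i]) - static_cast<double>(reference[i]);
    sumOfSquares += diff * diff;
  }
  return sumOfSquares / static_cast<double>(predicted.size());
}

// Hypothetical baseline check: the computed mse must match a stored value
// within an absolute tolerance.
bool MatchesBaseline(double mse, double baseline, double tolerance)
{
  return std::fabs(mse - baseline) <= tolerance;
}
```

Comparing a scalar error rather than a serialized model file is less sensitive to how a given library version writes its models, which may be why a separate baseline was needed for OpenCV 3.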

Additional notes

This is not exactly the workflow described in the issue, because I don't think the first step (removing sampling from TrainRegression) is relevant, as TrainRegression will be deprecated at the end of the refactoring.

The next step of the refactoring is to create a TrainImagesRegression application. It could be a composite application that chains ImageEnvelope to create a polygon on the extent of the image, SampleSelection to select random points over this polygon, SampleExtraction to extract feature and predictor values over two input images, and finally TrainVectorRegression to train the model (this is the workflow used in the KMeansClassification composite application). What do you think? In any case, I think this is out of the scope of this MR.
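
The four chained steps can be pictured with the toy sketch below. It does not call any OTB application: every type and function is a hypothetical stand-in, images are single-band arrays, the envelope is reduced to the image extent, and training is replaced by an ordinary least squares fit on one feature.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// All types and functions below are hypothetical stand-ins, not OTB classes.
struct Image
{
  std::size_t        width;
  std::size_t        height;
  std::vector<float> pixels; // row-major, one band

  float At(std::size_t col, std::size_t row) const { return pixels[row * width + col]; }
};

struct Extent      { std::size_t width; std::size_t height; };
struct PixelIndex  { std::size_t col; std::size_t row; };
struct Sample      { float feature; float target; };
struct LinearModel { double slope; double intercept; };

// Step 1 (ImageEnvelope stand-in): the polygon is reduced to the image extent.
Extent ComputeEnvelope(const Image& image) { return Extent{image.width, image.height}; }

// Step 2 (SampleSelection stand-in): draw random pixel positions in the extent.
std::vector<PixelIndex> SelectRandomPoints(const Extent& extent, std::size_t count, unsigned seed)
{
  assert(extent.width > 0 && extent.height > 0);
  std::mt19937 generator(seed);
  std::uniform_int_distribution<std::size_t> colDist(0, extent.width - 1);
  std::uniform_int_distribution<std::size_t> rowDist(0, extent.height - 1);
  std::vector<PixelIndex> points;
  points.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
  {
    const std::size_t col = colDist(generator);
    const std::size_t row = rowDist(generator);
    points.push_back(PixelIndex{col, row});
  }
  return points;
}

// Step 3 (SampleExtraction stand-in): read feature and predictor values at each point.
std::vector<Sample> ExtractSamples(const Image& features, const Image& predictor,
                                   const std::vector<PixelIndex>& points)
{
  assert(features.width == predictor.width && features.height == predictor.height);
  std::vector<Sample> samples;
  samples.reserve(points.size());
  for (const PixelIndex& p : points)
    samples.push_back(Sample{features.At(p.col, p.row), predictor.At(p.col, p.row)});
  return samples;
}

// Step 4 (TrainVectorRegression stand-in): ordinary least squares on one feature.
LinearModel TrainLinearRegression(const std::vector<Sample>& samples)
{
  assert(!samples.empty());
  const double n = static_cast<double>(samples.size());
  double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumXY = 0.0;
  for (const Sample& s : samples)
  {
    sumX += s.feature;
    sumY += s.target;
    sumXX += static_cast<double>(s.feature) * s.feature;
    sumXY += static_cast<double>(s.feature) * s.target;
  }
  const double denominator = n * sumXX - sumX * sumX;
  if (denominator == 0.0)
    return LinearModel{0.0, sumY / n}; // all features equal: predict the mean
  const double slope = (n * sumXY - sumX * sumY) / denominator;
  return LinearModel{slope, (sumY - slope * sumX) / n};
}

// The composite: each step consumes the output of the previous one.
LinearModel TrainImagesRegressionSketch(const Image& features, const Image& predictor,
                                        std::size_t sampleCount, unsigned seed)
{
  const Extent envelope = ComputeEnvelope(features);
  const std::vector<PixelIndex> points = SelectRandomPoints(envelope, sampleCount, seed);
  return TrainLinearRegression(ExtractSamples(features, predictor, points));
}
```

The point of the composite design is that the caller only provides images; the vector support is generated internally and never exposed.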

In the issue we talked about CSV input compatibility. It is hard to add to TrainVectorBase because of the design of the application. The best way (given the design of the learning applications) would probably be to create a new application TrainCSVBase inheriting from LearningApplicationBase that reads the CSV input, and then derive a TrainCSVRegression from it, and maybe also a TrainCSVClassifier. But is there really a need for such functionality?

Copyright

The copyright owner is CNES and has signed the ORFEO ToolBox Contributor License Agreement.


Check before merging:

  • All discussions are resolved
  • At least 2 :thumbsup: votes from core developers, no :thumbsdown: vote.
  • The feature branch is (reasonably) up-to-date with the base branch
  • Dashboard is green
  • Copyright owner has signed the ORFEO ToolBox Contributor License Agreement
  • Optionally, run git diff develop... -U0 --no-color | clang-format-diff.py -p1 -i on latest changes and commit
Edited by Cédric Traizet

Merge request reports

Pipeline #1385 passed

Pipeline passed for 5e79e868 on regression_refactoring

Merged by Cédric Traizet 5 years ago (May 9, 2019 9:18am UTC)

Merge details

Pipeline #1390 passed

Pipeline passed for 78d7f56c on develop

Activity

  • Cédric Traizet added 54 commits

    • 8e06c83c...0df44b31 - 51 commits from branch develop
    • b948c575 - Merge branch 'develop' into regression_refactoring
    • 3e071bca - ENH: use mse instead of model as text baseline for regression
    • 5b5c03d6 - BUG: add additional mse baseline for OpenCV 3

  • Cédric Traizet added 2 commits

    • 40e15141 - ENH: replace GetfieldAs... by GetValue<..>
    • 2fabfa17 - ENH: use reference instead of pointer in mse computation

  • Cédric Traizet changed title from Regression refactoring to Regression refactoring : TrainVectorRegression

  • Cédric Traizet changed the description

  • Status: need review and vote

    Also need opinions on the workflow for the next step of the refactoring (composite application for TrainImagesRegression, and no CSV application; see the additional notes of the Merge Request).

  • Cédric Traizet added 1 deleted label

  • Victor Poughon added 1 commit

    • 1d3baef0 - DOC: typo in TrainVectorRegression

  • pushed a minor typo fix

  • Cédric Traizet added 14 commits

  • Cédric Traizet resolved all discussions

  • Cédric Traizet mentioned in commit 78d7f56c

  • Just one comment about TrainImagesRegression: I think generating the vector support by sampling through ImageEnvelope would limit its usage, since one might want to perform regression only for a given class or subset of the image.

  • So, as you see it, the application takes as input a shapefile containing the vector support, a predictor image and a list of feature images? Or a shapefile containing the vector support and the predictor as features, and a list of feature images?

    There are several possibilities regarding the inputs and outputs of this application, and we should agree on which one to use before starting the implementation.

    The idea behind the use of ImageEnvelope was to provide an application that works with feature and predictor images as input (and no vector data), which is how KMeansClassification works (this app could be called TrainVectorUnsupervised if we had more than one unsupervised algorithm), and is also the API of the current TrainRegression application.

  • I think predictor and feature should remain images, but allowing the pixels used for regression to be restricted by vector data, instead of forcing the use of the image envelope, would be great.

    I can imagine some use cases that would be very hard to implement without this, for instance if you want to train the regression algorithm only on a certain category of vegetation.
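
A minimal sketch of such a restriction, assuming the vector data boils down to a single polygon in the same coordinates as the candidate sample positions: candidates are kept only if they fall inside the polygon, using a standard ray casting test. The names `Point`, `Contains` and `KeepSamplesInside` are hypothetical and none of this is OTB API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2-D position, in the same coordinates as the polygon vertices.
struct Point
{
  double x;
  double y;
};

// Ray casting point-in-polygon test: cast a horizontal ray from p towards +x
// and count how many polygon edges it crosses. An odd count means "inside".
bool Contains(const std::vector<Point>& polygon, const Point& p)
{
  bool inside = false;
  for (std::size_t i = 0, j = polygon.size() - 1; i < polygon.size(); j = i++)
  {
    const Point& a = polygon[i];
    const Point& b = polygon[j];
    // The edge (a, b) can only be crossed if its ends lie on either side of p.y.
    if ((a.y > p.y) != (b.y > p.y))
    {
      const double xCross = (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x;
      if (p.x < xCross)
        inside = !inside;
    }
  }
  return inside;
}

// Keep only the candidate sample positions that fall inside the polygon,
// for instance the outline of one vegetation category.
std::vector<Point> KeepSamplesInside(const std::vector<Point>& polygon,
                                     const std::vector<Point>& candidates)
{
  std::vector<Point> kept;
  for (const Point& candidate : candidates)
  {
    if (Contains(polygon, candidate))
      kept.push_back(candidate);
  }
  return kept;
}
```

Passing the image envelope as the polygon gives back the behaviour of the original proposal, so the two options are not mutually exclusive.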

  • Cédric Traizet mentioned in merge request !496 (merged)

  • mentioned in issue #1682 (closed)
