From 3a785cbff1e9e48aae1206d40465a889f516a3a3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Traizet?= <cedric.traizet@c-s.fr>
Date: Wed, 4 Sep 2019 14:53:14 +0200
Subject: [PATCH] DOC: update the classification recipe to use the new regression applications

---
 .../Cookbook/rst/recipes/pbclassif.rst | 174 +++++-------------
 1 file changed, 48 insertions(+), 126 deletions(-)

diff --git a/Documentation/Cookbook/rst/recipes/pbclassif.rst b/Documentation/Cookbook/rst/recipes/pbclassif.rst
index 4202066085..5ff4d77643 100644
--- a/Documentation/Cookbook/rst/recipes/pbclassif.rst
+++ b/Documentation/Cookbook/rst/recipes/pbclassif.rst
@@ -853,17 +853,6 @@ element is equal to 3 pixels, which corresponds to a ball included in a 7 x 7
 pixels square. Pixels with more than one majority class keep their original
 labels.
 
-
-
-Regression
-----------
-
-The machine learning models in OpenCV and LibSVM also support a
-regression mode: they can be used to predict a numeric value (i.e. not
-a class index) from an input predictor. The workflow is the same as
-classification. First, the regression model is trained, then it can be
-used to predict output values. The applications to do that are and .
-
 .. |image_61| image:: ../Art/MonteverdiImages/classification_chain_inputimage.jpg
 .. |image_62| image:: ../Art/MonteverdiImages/classification_chain_fancyclassif_CMR_input.png
 .. |image_63| image:: ../Art/MonteverdiImages/classification_chain_fancyclassif_CMR_3.png
@@ -877,144 +866,77 @@ used to predict output values. The applications to do that are and .
 
 Figure 6: From left to right: Original image, fancy colored classified image and regularized classification map with radius equal to 3 pixels.
 
-The input data set for training must have the following structure:
-
-- *n* components for the input predictors
-
-- one component for the corresponding output value
-
-The application supports 2 input formats:
-
-- An image list: each image should have components matching the
-  structure detailed earlier (*n* feature components + 1 output value)
-
-- A CSV file: the first *n* columns are the feature components and the
-  last one is the output value
-
-If you have separate images for predictors and output values, you can
-use the application.
-
-::
 
-    otbcli_ConcatenateImages -il features.tif output_value.tif
-                             -out training_set.tif
+Regression
+----------
 
-Statistics estimation
-~~~~~~~~~~~~~~~~~~~~~
+The machine learning models in OpenCV, LibSVM and SharkML also support a
+regression mode: they can be used to predict a numeric value (i.e. not
+a class index) from an input predictor. The workflow is the same as
+classification. First, the regression model is trained, then it can be
+used to predict output values.
 
-As in classification, a statistics estimation step can be performed
-before training. It allows to normalize the dynamic of the input
-predictors to a standard one: zero mean, unit standard deviation. The
-main difference with the classification case is that with regression,
-the dynamic of output values can also be reduced.
+Two applications are available for training:
 
-The statistics file format is identical to the output file from
-application, for instance:
+- `TrainVectorRegression` can be used to train a classifier with a set of geometries
+  containing a list of features (predictors) and the corresponding output value:
 
-::
+  ::
 
-    <?xml version="1.0" ?>
-    <FeatureStatistics>
-        <Statistic name="mean">
-            <StatisticVector value="198.796" />
-            <StatisticVector value="283.117" />
-            <StatisticVector value="169.878" />
-            <StatisticVector value="376.514" />
-        </Statistic>
-        <Statistic name="stddev">
-            <StatisticVector value="22.6234" />
-            <StatisticVector value="41.4086" />
-            <StatisticVector value="40.6766" />
-            <StatisticVector value="110.956" />
-        </Statistic>
-    </FeatureStatistics>
-
-In the application, normalization of input predictors and output values
-is optional. There are 3 options:
-
-- No statistic file: normalization disabled
-
-- Statistic file with *n* components: normalization enabled for input
-  predictors only
-
-- Statistic file with *n+1* components: normalization enabled for
-  input predictors and output values
-
-If you use an image list as training set, you can run application. It
-will produce a statistics file suitable for input and output
-normalization (third option).
+    otbcli_TrainVectorRegression -io.vd samples.sqlite
+                                 -cfield predicted
+                                 -io.out model.rf
+                                 -classifier rf
+                                 -feat perimeter area width
 
-::
+  The validation set `io.valid` is used to compute the mean square error between the original output value and the value
+  predicted by the computed model. If no validation set is provided the input training sample is used to compute the mean square error.
 
-    otbcli_ComputeImagesStatistics -il training_set.tif
-                                   -out stats.xml
 
-Training
-~~~~~~~~
 
-Initially, the machine learning models in OTB only used classification.
-But since they come from external libraries (OpenCV and LibSVM), the
-regression mode was already implemented in these external libraries. So
-the integration of these models in OTB has been improved in order to
-allow the usage of regression mode. As a consequence , the machine
-learning models have nearly the same set of parameters for
-classification and regression mode.
+- `TrainImagesRegression` can be used to train a classifier from multiple pairs of predictor images and label images.
+  There are two ways to use this application:
 
-.. |image11| image:: ../Art/MonteverdiImages/classification_chain_inputimage.jpg
-.. |image12| image:: ../Art/MonteverdiImages/QB_1_ortho_MV_C123456_CM.png
-.. |image13| image:: ../Art/MonteverdiImages/classification_chain_inputimage.jpg
-.. |image14| image:: ../Art/MonteverdiImages/QB_1_ortho_DS_V_P_C123456_CM.png
+  It is possible to provide for each input image a vector data file with geometries
+  corresponding to the input locations that will be used for training. This is achieved by using the `io.vd` parameter.
+  The `sample.nt` and `sample.nv` can be used to specify the number of samples extracted from the images, for training and
+  validation, respectively.
 
-.. |image15| image:: ../Art/MonteverdiImages/classification_chain_inputimage.jpg
-   :scale: 88%
+  ::
 
-- Decision Trees
+    otbcli_TrainImagesRegression -io.il inputPredictorImage.tif
+                                 -io.ip inputLabelImage.tif
+                                 -io.vd trainingData.shp
+                                 -classifier rf
+                                 -io.out model.txt
+                                 -sample.nt 1000
+                                 -sample.nv 500
 
-- Gradient Boosted Trees
 
-- Neural Network
 
-- Random Forests
+  Alternatively, if no input vector data is provided, the training samples will be
+  extracted from the full image extent.
 
-- K-Nearest Neighbors
+Two applications are also available for predictions:
 
-The behavior of application is very similar to . From the input data
-set, a portion of the samples is used for training, whereas the other
-part is used for validation. The user may also set the model to train
-and its parameters. Once the training is done, the model is stored in an
-output file.
+- `VectorRegression` uses a regression machine learning model to predict output values based on a
+  list of features:
 
 ::
 
-    otbcli_TrainRegression -io.il training_set.tif
-                           -io.imstat stats.xml
-                           -io.out model.txt
-                           -sample.vtr 0.5
-                           -classifier knn
-                           -classifier.knn.k 5
-                           -classifier.knn.rule median
-
-Prediction
-~~~~~~~~~~
-
-Once the model is trained, it can be used in application to perform the
-prediction on an entire image containing input predictors (i.e. an image
-with only *n* feature components). If the model was trained with
-normalization, the same statistic file must be used for prediction. The
-behavior of with respect to statistic file is identical to:
-
-- no statistic file: normalization off
-
-- *n* components: input only
-
-- *n+1* components: input and output
+    otbcli_VectorRegression
+      -in input_vector_data.shp
+      -feat perimeter area width
+      -model model.txt
+      -out predicted_vector_data.shp
 
-The model to use is read from file (the one produced during training).
+- Similarly, `ImageRegression` takes an image of predictors as input and computes the output image using a regression model:
 
 ::
 
-    otbcli_PredictRegression -in features_bis.tif
-                             -model model.txt
-                             -imstat stats.xml
-                             -out prediction.tif
+    otbcli_ImageRegression
+      -in input_image.tif
+      -model model.txt
+      -out predicted_image.tif
+
 
-- 
GitLab