KMeans input centroids

Cédric Traizet requested to merge kmean_centroids into develop

Summary

Add the option to provide user-defined centroids to initialize the KMeans algorithm in the KMeansClassification and TrainVectorClassifier applications.

See feature request #1820 (closed)

Rationale

The result of the KMeans algorithm depends on the initial centroids, but it is currently not possible to set them (the first k points of the training sample are used as initialization). This MR adds the possibility to provide the centroids in a text file.
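To illustrate why the initialization matters, here is a minimal pure-Python sketch of Lloyd's iterations started from caller-supplied centroids. It is not the Shark or OTB implementation; the function name `kmeans` and the toy data are hypothetical and only meant to show that two initializations can converge to different centroids.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Four corners of a wide rectangle (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Initialization with the first k points, as in the current behavior.
from_first_points = kmeans(points, points[:2])

# Initialization with user-defined centroids.
from_user_centroids = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
```

With the first two points as initialization, the iterations settle on a split along the short side of the rectangle, while the user-defined centroids keep the left and right pairs together.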

In the TrainVectorClassifier application, the following parameters have been added:

  • classifier.sharkkm.centroids : input centroid text file
  • classifier.sharkkm.centroidstats : a file containing statistics used to normalize the input centroids (optional)
  • classifier.sharkkm.outcentroids : a text file containing the output centroids

In the KMeansClassification composite application, the following parameters have been added:

  • incentroids.in : input centroid text file
  • incentroids.normalize : flag for centroid normalization (the stats are already computed in the app for data normalization)
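As a rough usage sketch, the new KMeansClassification parameters could be passed on the command line as follows. The helper below only builds the argument list and does not run anything. The file names are hypothetical, `-in`, `-out` and `-nc` are the usual KMeansClassification parameters, and the value `true` for the normalization flag is an assumption about how the boolean is written on the command line.

```python
def kmeans_classification_args(image, output, n_classes, centroids, normalize=False):
    """Build a command line for otbcli_KMeansClassification (sketch only)."""
    args = [
        "otbcli_KMeansClassification",
        "-in", image,
        "-out", output,
        "-nc", str(n_classes),
        # Parameter added by this MR: input centroid text file.
        "-incentroids.in", centroids,
    ]
    if normalize:
        # Parameter added by this MR; the "true" value is an assumption.
        args += ["-incentroids.normalize", "true"]
    return args


# Hypothetical file names, for illustration only.
command = kmeans_classification_args(
    "image.tif", "labels.tif", 3, "centroids.txt", normalize=True
)
```

The resulting list can be handed to `subprocess.run` once the paths point to real files.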

Implementation Details

Input centroid file reading is done using the Shark API (importCSV).
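For reference, the kind of file this expects can be sketched in Python. This does not call Shark; it assumes one centroid per line with whitespace-separated values (the separator is an assumption, it is not stated in this MR), and the function name `read_centroids` is hypothetical.

```python
import io


def read_centroids(stream, n_classes=None):
    """Parse centroids from a text stream, one centroid per line."""
    centroids = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        centroids.append([float(value) for value in fields])
    # Every centroid must have the same number of components.
    if centroids and any(len(c) != len(centroids[0]) for c in centroids):
        raise ValueError("all centroids must have the same number of components")
    # Optionally check that there is one centroid per class.
    if n_classes is not None and len(centroids) != n_classes:
        raise ValueError("expected %d centroids, got %d" % (n_classes, len(centroids)))
    return centroids


# In-memory stand-in for a centroid text file with 2 centroids of 3 components.
sample_file = io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n")
centroids = read_centroids(sample_file, n_classes=2)
```

The two checks reflect what a KMeans initialization needs: as many centroids as classes, and as many components per centroid as features.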

In SharkKMeansMachineLearningModel, the normalization option has been removed. Normalization was possible during training (Train()), using the Shark API to train a normalizer on the input list sample. This option was not used anywhere in OTB, and I removed it because the normalizer cannot be reused afterward during classification. Instead, data normalization should be done before training (as is done in the applications).
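A minimal sketch of normalizing before training, assuming the statistics are a per-feature mean and standard deviation (the exact content of the stats file is not described in this MR, and the names below are hypothetical). The point is that the samples and the input centroids must go through the same transform, otherwise the centroids end up in a different space than the data.

```python
def normalize(vectors, mean, stddev):
    """Center and scale each feature: (value - mean) / stddev."""
    return [
        [(value - m) / s for value, m, s in zip(vector, mean, stddev)]
        for vector in vectors
    ]


# Hypothetical per-feature statistics.
mean = [10.0, 100.0]
stddev = [2.0, 50.0]

samples = [[8.0, 50.0], [12.0, 150.0]]
centroids = [[10.0, 100.0], [14.0, 200.0]]

# Apply the same statistics to the training samples and to the input centroids.
normalized_samples = normalize(samples, mean, stddev)
normalized_centroids = normalize(centroids, mean, stddev)
```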

In SharkKMeansMachineLearningModel, I added a method to export the centroids as a text file (using Shark's exportCSV method). This can be used to obtain a human-readable version of the centroids (the serialized model file can be hard to read). The centroids can now be exported in the TrainVectorClassifier application, and KMeansClassification uses this method instead of creating the output centroid file from the serialized file (this was not working anyway: the output centroids were wrong).

This means that the output centroid text file from one KMeans application can be used as input to another KMeans application.
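The round trip can be sketched as follows. This is an illustration in plain Python, not the Shark exportCSV/importCSV calls, and it assumes the same whitespace-separated, one-centroid-per-line layout for both the output and the input file; the function names are hypothetical.

```python
import io


def write_centroids(stream, centroids):
    """Write one centroid per line, values separated by spaces."""
    for centroid in centroids:
        # repr() keeps enough digits for the floats to be read back unchanged.
        stream.write(" ".join(repr(value) for value in centroid) + "\n")


def read_centroids(stream):
    """Read the centroids back, skipping blank lines."""
    return [[float(value) for value in line.split()] for line in stream if line.strip()]


# Centroids produced by a first run (toy values).
first_run_output = [[0.25, 1.5], [3.0, 4.125]]

# In-memory stand-in for the exported centroid text file.
exported = io.StringIO()
write_centroids(exported, first_run_output)
exported.seek(0)

# The same file is then used to initialize a second run.
second_run_input = read_centroids(exported)
```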

Copyright

The copyright owner is CNES and has signed the ORFEO ToolBox Contributor License Agreement.


Check before merging:

  • All discussions are resolved
  • At least 2 👍 votes from core developers, no 👎 vote.
  • The feature branch is (reasonably) up-to-date with the base branch
  • Dashboard is green
  • Copyright owner has signed the ORFEO ToolBox Contributor License Agreement
Edited by Cédric Traizet
