KMeans input centroids

changed milestone to %7.0.0

added 1 deleted label

changed title from Kmean input centroids to KMeans input centroids

changed the description

Looks good :)

Just a few points about parameter names for UX:

In the TrainVectorClassifier application, the following parameters have been added:

classifier.sharkkm.centroids : input centroid text file

classifier.sharkkm.centroidstats : a file containing stats to normalize the input centroids (non mandatory)

classifier.sharkkm.outcentroids : a text file containing the output centroids

In the KMeansClassification composite application, the following parameters have been added:

incentroids.in : input centroid text file

incentroids.normalize : flag for centroid normalization (the stats are already computed in the app for data normalization)

Can we make it so that "input centroid text file" has the same key in both applications? Something like: classifier.sharkkm.centroids and in.centroids?

Is the parameter incentroids.normalize really useful? Are there cases where it makes sense to have it false?

Can you add a test for the new parameters?

If the input centroids file is the output of another KMeans run, the centroids will already be normalized. For similar images (same sensor, same kind of scene...), the output of KMeans on one image can be a good starting point for KMeans on other images. But maybe nobody will do that in practice, I don't know. Maybe this adds unnecessary complexity to the app. We could remove the option from the KMeansClassification application and keep it in TrainVectorClassifier (normalize if there is a stats file).

Yes, I agree, that would be simpler! Let's do that.
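To make the agreed behavior concrete ("normalize if there is a stats file"), here is a minimal sketch. It assumes the normalization is a per-band shift and scale, (value - mean) / stddev; the function name and the in-memory representation of the stats are hypothetical and are not taken from the merge request code.

```python
def normalize_centroids(centroids, mean=None, stddev=None):
    """Return centroids normalized per band, or unchanged copies if no stats are given.

    Assumption: normalization is (value - mean) / stddev for each band.
    """
    if mean is None or stddev is None:
        # No stats file supplied: the centroids are used as they are.
        return [list(c) for c in centroids]
    return [
        [(v - m) / s for v, m, s in zip(centroid, mean, stddev)]
        for centroid in centroids
    ]


# Two-band example: one centroid, with and without statistics.
raw = [[10.0, 20.0]]
with_stats = normalize_centroids(raw, mean=[10.0, 10.0], stddev=[2.0, 5.0])
without_stats = normalize_centroids(raw)
```

Centroids produced by a previous run on normalized data would be passed without statistics, which is the case discussed above.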

added 138 commits

69f175a3...44005a23 - 129 commits from branch develop
944792ee - ENH: remove the centroid normalization option in KMeansClassification
9e1f8174 - DOC: rename centroid parameters
63376264 - DOC: rename parameters in TrainVectorClassifier
1df9f3f7 - COMP: move include to ignore warnings from shark
ae7bd659 - Merge branch 'develop' into kmean_centroids
3025bf9d - ENH: added a test for KMeans with input centroids
f82e3682 - ENH: added a warning if the number of input of centroid is not the same as the number of classes
44e00450 - DOC: more kmeans doc
a5a1c34a - STY: clang format
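Commit f82e3682 adds a warning when the number of input centroids differs from the number of classes. The check can be sketched as below. The file layout (one centroid per line, values separated by whitespace) and both function names are assumptions made for illustration, not the application's actual code.

```python
import warnings


def parse_centroids(text):
    """Parse centroids from text, assuming one centroid per non-empty line."""
    return [
        [float(value) for value in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


def check_centroid_count(centroids, nb_classes):
    """Warn and return False if the centroid count differs from nb_classes."""
    if len(centroids) != nb_classes:
        warnings.warn(
            f"{len(centroids)} input centroids given for {nb_classes} classes"
        )
        return False
    return True


centroids = parse_centroids("1.0 2.0 3.0\n4.0 5.0 6.0\n")
matches = check_centroid_count(centroids, 2)
```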

Done.

The new parameter names are centroids.in and centroids.out in KMeansClassification, and classifier.sharkkm.centroids.in, classifier.sharkkm.centroids.stats and classifier.sharkkm.centroids.out in TrainVectorClassifier.
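As an illustration of the renamed keys, the sketch below builds a command line for KMeansClassification. Only centroids.in and centroids.out come from this merge request; the otbcli_KMeansClassification launcher name, the -in, -out and -nc options, and all file names are assumptions. The command is only assembled, not executed.

```python
import shlex


def kmeans_cli(image, labels, nb_classes, centroids_in=None, centroids_out=None):
    """Build an argument list for KMeansClassification (nothing is run here)."""
    args = [
        "otbcli_KMeansClassification",  # assumed launcher name
        "-in", image,
        "-out", labels,
        "-nc", str(nb_classes),
    ]
    if centroids_in is not None:
        args += ["-centroids.in", centroids_in]
    if centroids_out is not None:
        args += ["-centroids.out", centroids_out]
    return args


cmd = kmeans_cli(
    "image.tif", "labels.tif", 5,
    centroids_in="centroids.txt",
    centroids_out="centroids_out.txt",
)
print(shlex.join(cmd))
```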

I created a new test (+baselines) for KMeansClassification with input centroids. I don't think we need an additional test for TrainVectorClassifier, as this app is a part of the composite application KMeansClassification.

Did you compress the tif file with gdal?

hmm no ...

What command should I use? gdal_translate -co "COMPRESS=lzw" src_dataset dst_dataset?

Yup!

added 2 commits

91430a7f - ENH: compression baseline for the kmean with input centroid test
e89a91b2 - BUG: remove kmeans from regression algorithms

I also removed the Shark KMeans algorithm from the classifiers available for regression in LearningApplicationBase. It was listed, but the underlying MachineLearningModel does not support regression: before this fix, calling TrainRegression -classifier sharkkm ... resulted in 2019-04-23 17:55:21 (FATAL) TrainRegression: itk::ERROR: Regression mode not implemented.

added 1 commit

faa4a364 - ENH: update tests

Status: 1 failing test

added 63 commits

faa4a364...eb92becf - 61 commits from branch develop
e71da72b - ENH: update baselines
c5495bf9 - Merge branch 'develop' into kmean_centroids

In the Shark KMeans algorithm, to prevent the centroids from converging to the same points, a centroid that has no associated points in the input training data at the end of an iteration is reinitialized to a random point from the training data. This randomness makes the output of the algorithm platform dependent, and I don't know how we can control the random seed used (or whether that is possible at all).

With the input centroid file I used in the test, all the points of the training data set were associated with a single one of the five centroids, so the other centroids were reinitialized after this iteration. Changing the baseline to something more coherent seems to fix the test.
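The effect described above can be reproduced with a small pure-Python sketch. This is not Shark's implementation: it is a plain Lloyd iteration with the empty-cluster rule from the previous comment, and the seed parameter is a hypothetical addition to show that the result is only reproducible when the random generator is controlled.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def kmeans(points, centroids, max_iter=10, seed=None):
    """Lloyd iterations; a centroid with no points is reset to a random point."""
    rng = random.Random(seed)
    centroids = [list(c) for c in centroids]
    reinitialized = 0
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for k, members in enumerate(clusters):
            if members:
                # Regular update: mean of the points assigned to this centroid.
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
            else:
                # Empty cluster: restart from a randomly chosen training point.
                centroids[k] = list(rng.choice(points))
                reinitialized += 1
    return centroids, reinitialized


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
# Every point is closest to the first centroid, so the two others start empty.
start = [[0.0, 0.0], [100.0, 100.0], [200.0, 200.0]]
run_a = kmeans(points, start, seed=0)
run_b = kmeans(points, start, seed=0)
```

With poorly placed input centroids, the final result depends on the random draws, which is why a test baseline built on such a file is fragile.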

resolved all discussions

added 1 commit

97d8b279 - DOC: code review

merged

mentioned in commit 98a40235

Merged by Cédric Traizet 5 years ago (Apr 29, 2019 7:30am UTC)