otbcli_TrainImagesClassifier misses polygons from training shapefile
Mantis Issue 812, reported by giulioceriola, assigned to jmichel, created: 2013-11-05
I am running otbcli_TrainImagesClassifier on a single image (.tif), using as the training set a shapefile consisting of many small circles grouped into 8 classes. Each circle is about the size of a single image pixel. The circles are generated from point samples by buffering.
I noticed that when a circle is too small to cover a pixel completely, that pixel is ignored. So when I perform the buffering, I assign a diameter slightly greater than the pixel diagonal. This way each circle covers the whole pixel located at its center. The circle also partially covers the surrounding pixels, but since they are not completely covered they are not taken into account. The sample.edg parameter does not seem to help.
When I pass all the circles as a single shapefile to the otbcli_TrainImagesClassifier command, it misses some of them. My polygon shapefile contains 143 circles, but otbcli_TrainImagesClassifier "recognizes" only 110 of them. I tried to work out which circles are missing and whether this depends on their position relative to the image, but I found no pattern (e.g. if I rearrange the attribute table, the number of recognized circles doesn't change, but the positions of the missing ones do). This happens whether or not the circles belonging to the same class are merged into a multipart polygon.
BUT, if I split the polygon shapefile into 8 shapefiles, each one containing the circles of a single class, and pass all of them to otbcli_TrainImagesClassifier, it recognizes all the circles correctly. However, in this case I have to pass the input image 8 times.
Steps to reproduce:
Input data:
1) 03_MeanShiftID03_ObjectRaster.tif (.tfw and _stats.xml): raster to be classified.
2) 04_SampleObjPoints.shp (and ancillary files): original point shapefile with the training points.
3) 05_SampleObjPoints_Buff_Diss.shp: multipart polygon shapefile. A buffer of 0.71 (half the pixel diagonal) has been applied to each point, and polygons belonging to the same class have been merged into a multipart polygon.
4) 05_SingleSamples_x.shp (and ancillary files): the previous shapefile split into one shapefile for each of the 8 classes.
5) 05_SampleObjPoints_Buff_Diss_small.shp (and ancillary files): as item 3), but with a buffer of 0.3. With this buffer, ArcGIS and ERDAS Imagine correctly "acquire" the underlying pixel.
6) 05_SingleSamples_small_x.shp (and ancillary files): the shapefile from item 5), split following the same procedure as item 4).
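The two buffer radii listed above can be checked against the pixel geometry. This is only a sketch: it assumes square pixels of size 1.0 (inferred from 0.71 being roughly half the pixel diagonal) and circles centred exactly on pixel centres. The function names are illustrative and not part of OTB.

```python
import math

def covers_centre_pixel(radius, pixel_size=1.0):
    """True if a circle centred on a pixel centre contains all four pixel corners."""
    half_diagonal = pixel_size * math.sqrt(2) / 2
    return radius >= half_diagonal

def overlaps_neighbours(radius, pixel_size=1.0):
    """True if the circle extends past the pixel edge into adjacent pixels."""
    return radius > pixel_size / 2

# Buffer radii taken from the input data list (pixel size 1.0 is an assumption).
for radius in (0.71, 0.3):
    print(radius, covers_centre_pixel(radius), overlaps_neighbours(radius))
```

Under these assumptions the 0.71 buffer fully covers its own pixel and only partially covers the neighbours, while the 0.3 buffer covers no pixel completely, which would be consistent with the empty training set reported for the small buffers.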
Command lines, using OSGeo4W on Windows 7 and assuming all the files are in the C:/temp3 folder:
This is the working command line:
otbcli_TrainImagesClassifier -io.il C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif -io.vd C:/temp3/05_SingleSamples_1.shp C:/temp3/05_SingleSamples_2.shp C:/temp3/05_SingleSamples_3.shp C:/temp3/05_SingleSamples_4.shp C:/temp3/05_SingleSamples_5.shp C:/temp3/05_SingleSamples_6.shp C:/temp3/05_SingleSamples_7.shp C:/temp3/05_SingleSamples_8.shp -sample.vfn CLASS -sample.vtr 0.0 -io.imstat C:/temp3/06_RasterToClassify_stats.xml -classifier svm -io.out C:/temp3/07_model.svm -io.confmatout C:/temp3/07_Confusion.csv
It yields a training set of 143 pixels.
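Since the workaround needs the same image path repeated once per class shapefile, the argument list can be generated rather than typed. The sketch below only rebuilds the working command line above, using the same options and paths; build_train_command is a hypothetical helper, not part of OTB.

```python
def build_train_command(image, shapefiles, stats, model, confusion):
    """Return the argument list for the split-by-class workaround."""
    return (
        ["otbcli_TrainImagesClassifier", "-io.il"]
        + [image] * len(shapefiles)          # one copy of the image per shapefile
        + ["-io.vd"] + list(shapefiles)
        + ["-sample.vfn", "CLASS", "-sample.vtr", "0.0",
           "-io.imstat", stats, "-classifier", "svm",
           "-io.out", model, "-io.confmatout", confusion]
    )

folder = "C:/temp3"
shapefiles = [f"{folder}/05_SingleSamples_{i}.shp" for i in range(1, 9)]
args = build_train_command(
    f"{folder}/06_RasterToClassify.tif",
    shapefiles,
    f"{folder}/06_RasterToClassify_stats.xml",
    f"{folder}/07_model.svm",
    f"{folder}/07_Confusion.csv",
)
print(" ".join(args))
```

The resulting list could be passed to subprocess.run(args) from an OSGeo4W shell; that step is left out here so the sketch runs without OTB installed.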
The failing command lines:
otbcli_TrainImagesClassifier -io.il C:/temp3/06_RasterToClassify.tif -io.vd C:/temp3/05_SampleObjPoints_Buff_Diss.shp -sample.vfn CLASS -sample.vtr 0.0 -io.imstat C:/temp3/06_RasterToClassify_stats.xml -classifier svm -io.out C:/temp3/07_model.svm -io.confmatout C:/temp3/07_Confusion.csv -sample.edg true
It yields a training set of 102 pixels instead of 143.
otbcli_TrainImagesClassifier -io.il C:/temp3/06_RasterToClassify.tif -io.vd C:/temp3/05_SampleObjPoints_Buff_Diss_small.shp -sample.vfn CLASS -sample.vtr 0.0 -io.imstat C:/temp3/06_RasterToClassify_stats.xml -classifier svm -io.out C:/temp3/07_model.svm -io.confmatout C:/temp3/07_Confusion.csv -sample.edg true
otbcli_TrainImagesClassifier -io.il C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif C:/temp3/06_RasterToClassify.tif -io.vd C:/temp3/05_SingleSamples_small_1.shp C:/temp3/05_SingleSamples_small_2.shp C:/temp3/05_SingleSamples_small_3.shp C:/temp3/05_SingleSamples_small_4.shp C:/temp3/05_SingleSamples_small_5.shp C:/temp3/05_SingleSamples_small_6.shp C:/temp3/05_SingleSamples_small_7.shp C:/temp3/05_SingleSamples_small_8.shp -sample.vfn CLASS -sample.vtr 0.0 -io.imstat C:/temp3/06_RasterToClassify_stats.xml -classifier svm -io.out C:/temp3/07_model.svm -io.confmatout C:/temp3/07_Confusion.csv -sample.edg true
Both fail with an error, resulting in an empty training set.
Additional information:
The raster to be classified is a "virtual raster", where each pixel represents a polygon obtained from the segmentation of a multispectral image. The first band contains the DN of the corresponding polygon. Each of the following bands holds a feature value (e.g. mean for each band, Haralick textures, etc.). This raster has a "fake" georeferencing, so as to create a raster space that can be used as a reference for the training shapefile.
The training shapefile has been obtained by visually identifying training objects (via QGIS). For each object a point feature is created, covering the corresponding pixel in the raster space described above, and a class value is assigned.
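To check which pixel a given training point falls on in this raster space, the six world-file (.tfw) parameters can be inverted. This is a sketch of the standard world-file affine mapping only; the parameter values below are hypothetical (1-unit pixels, upper-left pixel centre at 0.5, 99.5) and are not taken from the attached .tfw.

```python
def world_to_pixel(x, y, tfw):
    """Map a map coordinate to (col, row) using world-file parameters.

    tfw is given in world-file line order: A, D, B, E, C, F, where
    x = A*col + B*row + C and y = D*col + E*row + F, and (C, F) is the
    centre of the upper-left pixel.
    """
    a, d, b, e, c, f = tfw
    det = a * e - b * d
    col = (e * (x - c) - b * (y - f)) / det
    row = (a * (y - f) - d * (x - c)) / det
    return round(col), round(row)

# Hypothetical world file: 1-unit pixels, no rotation, north-up.
example_tfw = (1.0, 0.0, 0.0, -1.0, 0.5, 99.5)
print(world_to_pixel(10.5, 89.5, example_tfw))
print(world_to_pixel(10.8, 89.3, example_tfw))
```

Two sample points that map to the same (col, row) would share a pixel, which is one thing that could be checked when comparing the number of circles with the number of training pixels.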
2015-11-10 15:00 - cpalmann: Could reproduce the issue