Zonalstatistics
Summary
Add a new application for zonal statistics.
Rationale
There is currently no such application in the Orfeo ToolBox. This application computes statistics (min, max, mean, standard deviation) of objects in a vector image.
Implementation Details
Input objects can be described by a vector data or by a label image:
- mode vector: the `VectorData` is rasterized at the same origin/spacing/size as the input image. Polygons are numbered starting from 0. The statistics are computed using a `StreamingStatisticsMapFromLabelImageFilter`.
- mode label image: the statistics are directly computed using a `StreamingStatisticsMapFromLabelImageFilter`, from the input image and the label image.
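To make the label image mode concrete, here is a minimal standalone sketch of per-zone accumulation. It is not the code of `StreamingStatisticsMapFromLabelImageFilter`: the names `ZoneStats` and `computeZonalStats` are hypothetical, the image is reduced to a single band stored as a flat buffer, and the population standard deviation (division by N) is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Running accumulator for one zone (one label value). Hypothetical type.
struct ZoneStats {
  std::size_t count = 0;
  double sum = 0.0;
  double sumSq = 0.0;
  double min = std::numeric_limits<double>::max();
  double max = std::numeric_limits<double>::lowest();

  void add(double v) {
    ++count;
    sum += v;
    sumSq += v * v;
    min = std::min(min, v);
    max = std::max(max, v);
  }

  double mean() const { return count ? sum / count : 0.0; }

  // Population standard deviation (divides by N): an assumption of this sketch.
  double stddev() const {
    if (count == 0) return 0.0;
    const double m = mean();
    return std::sqrt(std::max(0.0, sumSq / count - m * m));
  }
};

// pixels and labels are single-band rasters of identical size (row-major).
// Pixels whose label equals noDataLabel are skipped.
std::map<int, ZoneStats> computeZonalStats(const std::vector<double>& pixels,
                                           const std::vector<int>& labels,
                                           int noDataLabel) {
  assert(pixels.size() == labels.size());
  std::map<int, ZoneStats> stats;
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    if (labels[i] == noDataLabel) continue;
    stats[labels[i]].add(pixels[i]);
  }
  return stats;
}
```

In vector mode the same loop would apply once the polygons have been rasterized into a label buffer aligned with the input image.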
The application can write the output statistics in several forms:
- as XML (using an XMLFileWriter, as the ComputeImagesStatistics application does),
- as VectorData (stats are written in the features of each polygon),
- or as Raster (stats are stored in the bands of the output image)
Classes and files
- A new application in `AppClassification`: `ZonalStatistics` (otbZonalStatistics.cxx)
- 5 tests were added in `AppClassification/test`
Applications
New: `ZonalStatistics` application (otbZonalStatistics.cxx)
Tests
- Input vector / Output vector
- Input raster / Output XML (without nodata-value for the label image)
- Input raster / Output XML (with nodata-value for the label image)
- Input vector / Output raster
- Input raster / Output raster
I've pushed a new branch in otb-data called "zonalstatistics" which contains new files in baseline and input:
- Baseline/apTvClVectorData_QB1_ter_with_stats.sqlite
- Input/Classification/VectorData_QB1_ter.tif
- Baseline/apTvClVectorData_QB1_ter_stats.xml
- Baseline/apTvClVectorData_QB1_ter_stats_nodata.xml
Additional notes
Please feel free to discuss this feature.
Copyright
The copyright owner is IRSTEA and has signed the ORFEO ToolBox Contributor License Agreement.
***Check before merging:*** - All discussions are resolved - At least 2


Activity
Hi,
This is closely related to what I was discussing here https://groups.google.com/forum/#!topic/otb-users/i9AS_Y6FOUs. Therefore, I can throw away the code I was writing. Thank you ;)
I will do a thorough review of what you have done (use the application with some real data). After a quick inspection, it seems great. I am only missing the possibility of producing an image with the stats values for every pixel. This would be very useful for classification purposes (the vector mode is useful for the training step). Can't this be done by rasterizing the vector output inside the application? Since we don't have in memory connection between applications for vector outputs, this would allow not writing the vector output if not needed.
Another issue is that the statistics filter used is limited in terms of what can be computed. Things like median or other rank stats would be useful. But they can't easily be computed in the streaming case (a polygon split into different streams would need to be recomputed as a whole and not only updated with partial stats).
Edited by Jordi Inglada
- Automatically resolved by Rémi Cresson
@jinglada yes you are right: since the approach is to compute the statistics in streaming, there are many metrics that cannot (at least not easily!) be computed. For the median and many others, we should have a different approach (object after object). Maybe it could be great to have both in the future: one simple, light and fast application for basic statistics (like this one) and one heavy-duty application to compute more complete statistics.
In `ZonalStatistics` we can add new "streamable" statistics computations in the `StreamingStatisticsMapFromLabelImageFilter` class, it's quite simple.
Regarding the output statistics image, you propose to write an image with the statistics values for each polygon?
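As an illustration of what "streamable" means in this thread, the sketch below shows that count, sum, sum of squares, min and max computed on two stream regions can be merged into exactly the values obtained on the whole zone, so mean and standard deviation follow. No such merge exists for the median, which needs all the values of the zone at once. The names `PartialStats`, `summarize` and `mergeStats` are hypothetical and do not come from OTB.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Partial statistics of one zone inside one stream region. Hypothetical type.
struct PartialStats {
  std::size_t count = 0;
  double sum = 0.0;
  double sumSq = 0.0;
  double min = std::numeric_limits<double>::max();
  double max = std::numeric_limits<double>::lowest();
};

// Accumulate the values of a zone that fall inside one stream region.
PartialStats summarize(const std::vector<double>& values) {
  PartialStats s;
  for (double v : values) {
    ++s.count;
    s.sum += v;
    s.sumSq += v * v;
    s.min = std::min(s.min, v);
    s.max = std::max(s.max, v);
  }
  return s;
}

// Combine the partial statistics of the same zone from two stream regions.
// Every field is a sum, a min or a max, so the merge is order-independent.
PartialStats mergeStats(const PartialStats& a, const PartialStats& b) {
  PartialStats m;
  m.count = a.count + b.count;
  m.sum = a.sum + b.sum;
  m.sumSq = a.sumSq + b.sumSq;
  m.min = std::min(a.min, b.min);
  m.max = std::max(a.max, b.max);
  return m;
}
```

A new statistic fits this scheme as long as it can be expressed through fields that merge this way.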
@remicress About streaming: in the ZonalStatistics that I have started to code, I was going to do as the streaming filter you use, but the polygons which are not completely inside a stream region are dealt with afterwards. My idea (not coded yet) was to store their id into a list and, after the synthetise step, read them by batches (read a number of geometries up to a ram limit estimated from the image resolution and the number of bands) and then process in parallel inside every batch (read the pixels, compute the stats).
If for each statistic that we compute we can flag it as "streamable" or not, the filter should be able to choose the computation mode so this is transparent for the user.
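The batching idea described above (not coded at the time of this comment) could look roughly like the following greedy grouping. Everything here is an assumption for illustration: the `DeferredZone` type, the `makeBatches` function, and the memory estimate as pixel count × bands × bytes per sample.

```cpp
#include <cstddef>
#include <vector>

// A polygon that was not fully contained in a single stream region.
// Hypothetical type: pixelCount is the number of pixels to read for the zone.
struct DeferredZone {
  int id;
  std::size_t pixelCount;
};

// Greedy grouping: zones are appended to the current batch until the
// estimated memory would exceed ramLimitBytes. A zone that is larger than
// the limit on its own still gets a batch containing only itself.
std::vector<std::vector<int>> makeBatches(const std::vector<DeferredZone>& zones,
                                          std::size_t bands,
                                          std::size_t bytesPerSample,
                                          std::size_t ramLimitBytes) {
  std::vector<std::vector<int>> batches;
  std::vector<int> current;
  std::size_t used = 0;
  for (const DeferredZone& z : zones) {
    const std::size_t cost = z.pixelCount * bands * bytesPerSample;
    if (!current.empty() && used + cost > ramLimitBytes) {
      batches.push_back(current);
      current.clear();
      used = 0;
    }
    current.push_back(z.id);
    used += cost;
  }
  if (!current.empty()) batches.push_back(current);
  return batches;
}
```

Each batch could then be read and processed in parallel, zone by zone, which is where non-streamable statistics such as the median would be computed.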
About the output stats image: yes this would be a vector image where every pixel belonging to one of the input polygons has the value of the different stats in each of its bands. It would be a rasterization of the vector output that you have implemented, but the advantage would be that this raster output can be connected in memory as the input of another application (an image classifier, for instance). Using the otb::OGRDataSourceToLabelImageFilter like in the Rasterization application should do the trick. Alternatively, one could write the vector output and then use the Rasterization application to continue the pipeline, but doing the rasterization inside the application avoids writing the vector file.
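The proposed raster output can be pictured with a small standalone sketch: given a label raster and the statistics of each zone, every pixel receives the statistics of its zone, one statistic per band. This does not use `otb::OGRDataSourceToLabelImageFilter` or any OTB class; the `burnStats` function, the pixel-interleaved buffer layout and the `background` parameter are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// labels: single-band label raster (row-major).
// zoneStats: for each label, one value per output band (e.g. mean, stddev).
// Returns a pixel-interleaved buffer of labels.size() * bandCount values.
// Pixels whose label has no entry in zoneStats keep the background value.
std::vector<double> burnStats(const std::vector<int>& labels,
                              const std::map<int, std::vector<double>>& zoneStats,
                              std::size_t bandCount,
                              double background) {
  std::vector<double> out(labels.size() * bandCount, background);
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const auto it = zoneStats.find(labels[i]);
    if (it == zoneStats.end()) continue;
    assert(it->second.size() == bandCount);
    for (std::size_t b = 0; b < bandCount; ++b) {
      out[i * bandCount + b] = it->second[b];
    }
  }
  return out;
}
```

Because the result is an ordinary multi-band image, it is the kind of output that could be passed in memory to a downstream image application.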
@remicress By the way, my comments/suggestions above, if pertinent, could be implemented later on without delaying this merge request.
mentioned in issue #1714 (closed)
added 1 commit
- 6f27a726 - REFAC: factorize the intNoData use in vector and raster zone definition modes
It might be nice to leverage the new no data support in the statistics filter (!223 (merged)), if that's not too much of a hassle.
added 26 commits
- 6f27a726...31c5dbb5 - 23 commits from branch `develop`
- e904ab71 - COMP: update test. TODO: fix .dbf comparison failing test
- 7e9340f6 - WIP: add background value for stats computation
- a1dbdc59 - Merge branch 'develop' of gitlab.orfeo-toolbox.org:orfeotoolbox/otb into zonalstatistics
marked as a Work In Progress from 7e9340f6
added 1 commit
- d3bf419b - FIX: use inbv for background value for all input modes
added 1 commit
- 3898b5e6 - ENH: output vector data supported for input zone description of type label image
added 1 commit
- 17c2b87b - COMP: first test compares .sqlite file instead of .dbf of shapefile
- Automatically resolved by Rémi Cresson