When I started digging deeper, I discovered that ArcMap seems to get quantiles wrong when distributions are highly skewed. That leaves me wondering whether Esri is aware that its quantile algorithm just isn't cutting it for the many scientific applications in which skewed data is common, if not the norm.
The image to the right is the result of four Brownian Bridge movement models, one for each of four individual deer, combined into one average. Purple indicates high values: areas where the biologists are nearly 100% certain that the deer traveled through. Yellow shows less probable paths.
When I ran a quantile classification after removing zeros and isolated the top 10% of values, I noticed that the class held 2,676 cells out of 474,135 total. That is less than 1% (about 0.56%), not even close to 10%.
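The arithmetic behind that check is easy to reproduce. Here is a minimal sketch in plain Python using the two cell counts reported above; the variable names are mine and are only for illustration.

```python
# Cell counts reported above for ArcMap's "top 10%" quantile class
top_class_cells = 2676
total_cells = 474135

# Share of all nonzero cells that actually landed in the top class
share = top_class_cells / total_cells
print(f"{share:.2%}")  # 0.56%

# A true top decile should hold roughly this many cells
expected_cells = round(0.10 * total_cells)
print(expected_cells)
```

If the class really held the top 10% of cells, the first number would be close to 10%.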
The remedy for this situation is to take the raster data to vector. You can use either points or polygons. I tend to use polygons. The workflow goes something like this:
1. Remove zero values using SetNull
2. Multiply by 10,000,000
3. Convert to integer
4. Convert from raster to polygon or point
5. Use the sort tool to sort descending by the grid_code
6. Calculate the cumulative values using the following Python code block in the field calculator:
total = 0  # running total, kept between rows
def cumsum(inc):
    # add this row's value to the running total and return it
    global total
    total += inc
    return total
7. Select the top 10%, 25%, 50%, 90%, etc.
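For readers who want to see the logic outside ArcMap, the steps above can be sketched in plain Python on a list of cell values. This is only an illustration, not the ArcGIS workflow itself: the function name `top_fraction_by_rank` and the sample data are hypothetical, and I am assuming the running total counts one per cell, so that "top 10%" means the top 10% of cells by rank. If you accumulate the grid_code values instead, the selection is by share of total value.

```python
def top_fraction_by_rank(values, fraction):
    """Return the scaled nonzero values in the top `fraction` of cells by rank."""
    nonzero = [v for v in values if v != 0]          # step 1: drop zeros (SetNull)
    scaled = [int(v * 10_000_000) for v in nonzero]  # steps 2-3: scale, truncate to integer
    ranked = sorted(scaled, reverse=True)            # step 5: sort descending

    limit = fraction * len(ranked)  # number of cells allowed in the selection
    total = 0                       # step 6: running total (assumed: one count per cell)
    selected = []
    for v in ranked:
        total += 1
        if total > limit:           # step 7: stop once the fraction is filled
            break
        selected.append(v)
    return selected


# Hypothetical, highly skewed sample: 5 zero cells plus 20 rapidly decaying values
cells = [0.0] * 5 + [2 ** -k for k in range(1, 21)]
top10 = top_fraction_by_rank(cells, 0.10)
print(top10)       # [5000000, 2500000]
print(len(top10))  # 2 of the 20 nonzero cells
```

Because the selection is made by rank after sorting, the share of cells selected matches the requested fraction no matter how skewed the values are.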
The resulting map (left) illustrates the difference. Black areas show the total movement paths, red is the top 10% using the vector-based approach, and yellow (barely visible) is ArcMap's quantiles classification. The new cell count is 47,413, which is the top 9.99% of cells. This is pretty good in my opinion!
The lesson here is to beware of quantiles when your data is highly skewed, and to always double-check your work. Thanks to Marcus Blum for providing the data for this example.
Ran into the same problem. Monumental waste of time. I managed to deal with it in R using the raster package.
z <- raster(testrast)
r <- quantile(z, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1))
Hi Michelle,
Thanks for providing the R solution. Much cleaner with only two lines of code. Did you find my solution to be a waste of time, or was it the fact that quantiles in ArcGIS provided the wrong answer that led to the monumental waste of time?