Tuesday, April 21, 2015

Better flow accumulation with positive and negative weights

Most GIS users are probably familiar with calculating flow accumulation and many are probably also familiar with weighted flow accumulation. ArcMap, however, will not allow negative weights and this can be a severe limitation in some cases. Fortunately,  back in 2008 Bill Huber provided us with a simple yet elegant solution to this problem.  However, I've noticed that this problem tends to resurface from time to time on the forums. A while back I put together a geoprocessing model  to deal with this problem using Bill's solution which is available for download.

The map on the right illustrates how a tool, such as this one, might be used. I first performed the two operations that precede flow accumulation: filling sinks and calculating flow direction. In this case I used a 30 meter USGS NED DEM. Then using my new tool I used a weight raster that consisted of rain + snowmelt minus evapotranspiration. In areas with positive values water is being released into the environment. In contrast, negative values highlight areas where evaporation is  removing water from streams. In a typical flow accumulation operation flow accumulation only increases as one moves downstream. However, in arid environments we know that isn't true. Flows increase generally as one moves downstream due to multiple streams flowing into one another, but at some point flows begin to decrease because evaporation exceeds water inputs. The map above illustrates this in several places. Several streams flow out of the Ruby Mountains but many end when they flow into areas of high evaporation. Others flow off the map but  are diminished before they leave the map.

This example also demonstrates that streams could be treated as dynamic entities dependent upon both precipitation and temperature. The data for the map above was derived using the Climatic Water Deficit Toolbox for the month of June. Later in the summer months flows may be diminished and connectivity for fish species may be impacted. In contrast, during the wetter months streams may become more connected. Finally, I should note that for the one instance where I compared a result like this to an actual hydrograph it didn't stack up so well. The streams went dry mid-summer in the flow accumulation model yet actual hydrograph data showed greatly diminished but still existent flows due inputs from groundwater. Nonetheless, these sorts of dynamic flow accumulation models are likely to be better than standard static flow accumulation models.

Wednesday, April 1, 2015

On why ArcMap gets quantiles wrong

Recently, I discovered that ArcMap gets quantiles wrong - really wrong.  I'm sure that there are many instances where ArcMap gets quantiles right, but recently it seemed like nearly every one that I've encountered had it wrong.  I think that it is important that every Esri user be aware of this limitation and the potential remedy.

When I started digging in deeper I discovered that ArcMap seems to get it wrong when there are highly skewed distributions, which leaves me wondering if Esri is aware that their quantiles algorithm just isn't cutting it for most scientific applications in which skewed data is common, if not the norm.

Below I show an example from a colleague of mine's.
The image to the right is the result of four Brownian Bridge movement models for four individual deer combined into one average. Purple indicates high values - areas where the biologists are nearly 100% that the deer traveled through. Yellow shows less probably paths.

When I did a quantiles classification after removing zero and isolated the top 10% of values I noticed that there were 2676 cells out of 474,135 total.  That is less than 1%, not even close to 10%.

The remedy for this situation is to take the raster data to vector.  You can use either points or polygons.  I tend to use polygons.  The workflow goes something like this:

1.  Remove zero values using SetNull
2.  Multiply by 10,000,000
3.  Convert to integer
4.  Convert from raster to polygon or point
5.  Use the sort tool to sort descending by the grid_code
6.  Calculate the cumulative values using the following python cod block in the field calculator:
total = 0
def cumsum(inc):
 global total
 return total
7.  Select the top 10%, 25%, 50%, 90%, etc.

The resulting map (left) illustrates the difference. Black areas show the total movement paths, red is the top 10% using the vector-based approach, and yellow (barely visible) is ArcMaps quantiles classification.  The new cell count is  47,413 which is the top 9.99% of cells.  This is pretty good in my opinion!

The lesson here is beware of quantiles when your data is highly skewed and always double check your work.  Thanks Marcus Blum for providing the data for this example.