--- title: "Using abularaster" author: "Michael D. Sumner" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using tabularaster} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # tabularaster The `raster` package is extremely powerful in the R ecosystem for spatial data. It can be used very efficiently to drive data extraction and summary tools using its consistent cell-index and comprehensive helper functions for converting between cell values and less abstract raster grid properties. Tabularaster provides some more helpers for working with cells and tries to fill some of the (very few!) gaps in raster functionality. When raster returns cell values of hierarchical objects it returns a hierarchical (list) of cells to match the input query. Tabularaster provides: - `cellnumbers()`: extraction of cells as a simple data frame with "object ID" and "cell index" - `as_tibble()`: for raster data, with options for value column and cell, dimension and date indexing - `decimate()`: fast resolution-reduction without care - `index_extent()`: build an index extent, basically a raster Extent in row/column form The raster function `extentFromCells()` started life in tabularaster. There were some sf-features in early versions of tabularaster, but the raster package now supports that format (despite there being absolutely zero community development between the two worlds). ## Examples Extract the cell numbers of raster `r` that are co-located with object `q`. (The argument names are `x` and `query`). ```{r, eval=FALSE} cellnumbers(r, q) ``` In the above example, `r` is any *raster* object and `q` is another spatial object, used as a query. Cell numbers can be extracted from any raster object, any of a `raster::raster`, `raster::stack` or `raster::brick`. It's not really relevant what that object contains, as only the *dimensions* (number of cells in x and y) and the *extent* (geographic range in x and y) determine the result. The `r` object can actually not contain any data - this is a very powerful but seemingly under-used feature of the `raster` package. The object `q` may be any of `sf`, `sp` layer types or a matrix of raw coordinates (x-y). Tabularaster follows the basic principles of tidy data and *hypertidy data* and aspires to keep the software design out of your way and simply help to get the task done. # Simple examples In straightforward usage, `cellnumbers` returns a tibble with `object_` to identify the spatial object by number, and `cell_` which is specific to the raster object, a function of its `extent`, `dim`ensions and `projection` (crs - coordinate reference system). ```{r} library(raster) library(tabularaster) (r <- raster(volcano)) (cell <- cellnumbers(r, cbind(0.5, 0.5))) ``` This cell number query can be then be used to drive other raster functions, like `extract` and `xyFromCell` and many others. ```{r} xyFromCell(r, cell$cell_) raster::extract(r, cell$cell_) ``` This is an extremely efficient way to drive extractions from raster objects, for performing the same query from multiple layers at different times. It's also very useful for using `dplyr` to derive summaries, rather than juggling lists of extracted values, or different parts of raster objects. ## as tibble There is an `as_tibble` method with options for cell, dimension, and date. ```{r} library(dplyr) as_tibble(r) b <- brick(r, r*2) as_tibble(b) as_tibble(b, cell = FALSE) %>% arrange(desc(dimindex)) ## leave out the cell index ``` The date or date-time is used as the dimension index if present. ```{r} btime <- raster::setZ(b, Sys.time() + c(1, 10)) as_tibble(btime) %>% group_by(dimindex) %>% summarize(n = n()) as_tibble(btime, split_date = TRUE) ``` # Warnings 1. I tend to end up using `tidyr::extract` and `raster::extract`, `dplyr::select` and `raster::select` as I always use these packages together. 2. `cellnumbers` doesn't currently reproject the second argument `query`, even when would make sense to do so like `extract` does. This is purely to reduce the required dependencies. 3. There's no formal link between the cell number values and the raster object itself. I use this "loose coupling" so extensively that I have developed habits that tend to make it pretty robust. Please use with caution, you can easily get incorrect answers by asking a different raster a question based on the wrong cell numbers. If you find that things don't work, first check if it's a namespace problem, there are a few function name overlaps in the `tidyverse` and `raster`, and in R generally. There is no way to fix this properly atm. Tabularaster doesn't reproject on the fly, but might use the reproj package in future. Ultimately the cell index vector should probably be a formal class, with knowledge of its extent and grain. I'd love this to be formalized, but I seem to not have the design expertise required to get the system right. It's something that `ggplot2` needs, but there aren't any existing examples in R anywhere as far as I can tell. The [stars project](https://github.com/r-spatial/stars) is a good place to see what else is happening in this space in R. Other examples are the unfinshed `tbl_cube` in `dplyr`, the R6 objects in `velox`, and the mesh indexing used by packages `rgl`, `Vcg`, `icosa`, `dggridR`, `deldir`, `geometry`, `RTriangle`, `TBA`, (and there are many others). If you are interested in these issues please get in touch, use the [Issues tab](https://github.com/hypertidy/tabularaster/issues) or [discuss at r-spatial](https://github.com/r-spatial/discuss), get on [twitter #rstats](https://twitter.com/hashtag/rstats) or contact me directly. # Applied example This example uses extracted data per polygon and uses base R to lapply across the list of values extracted per polygon. Here we show a more dplyr-ish version after extracting the cell numbers with tabularaster. ```{r} library(tabularaster) ## https://gis.stackexchange.com/questions/102870/step-by-step-how-do-i-extract-raster-values-from-polygon-overlay-with-q-gis-or library(raster) # Create integer class raster r <- raster(ncol=36, nrow=18) r[] <- round(runif(ncell(r),1,10),digits=0) # Create two polygons cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20)) cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0)) polys <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1), Polygons(list(Polygon(cds2)), 2))),data.frame(ID=c(1,2))) ## do extraction in abstract terms (cn <- cellnumbers(r, polys)) library(dplyr) ## now perform extraction for real ## and pipe into grouping by polygon (object_) and value, and ## calculate class percentage from class counts per polygon cn %>% mutate(v = raster::extract(r, cell_)) %>% group_by(object_, v) %>% summarize(count = n()) %>% mutate(v.pct = count / sum(count)) ## here is the traditional code used in the stackoverflow example # Extract raster values to polygons #( v <- extract(r, polys) ) # Get class counts for each polygon #v.counts <- lapply(v,table) # Calculate class percentages for each polygon #( v.pct <- lapply(v.counts, FUN=function(x){ x / sum(x) } ) ) ``` # Extract cell numbers ```{r} library(tabularaster) data("ghrsst") ## a RasterLayer data("sst_regions") ## a polygon layer, contiguous with ghrsst gcells <- cellnumbers(ghrsst, sst_regions) %>% mutate(object_ = as.integer(object_)) result <- gcells %>% mutate(sst = raster::extract(ghrsst, cell_)) %>% group_by(object_) %>% summarize_at(vars(sst), funs(mean(., na.rm = TRUE), sd(., na.rm = TRUE), length)) ``` # Extract cells from rasters ```{r} library(tabularaster) library(raster) library(dplyr) data("rastercano") data("polycano") cells <- cellnumbers(rastercano, polycano[4:5, ]) cellnumbers(rastercano, as(polycano[4:5, ], "SpatialLinesDataFrame")) cellnumbers(rastercano, as(as(polycano[4:5, ], "SpatialLinesDataFrame"), "SpatialPointsDataFrame")) ```