| Title: | Delayed Read for 'GDAL' Vector Data Sources |
|---|---|
| Description: | Lazy read for drawings. A 'dplyr' back end for data sources supported by 'GDAL' vector drivers, that allows working with local or remote sources as if they are in-memory data frames. Basic features work with any drawing format ('GDAL vector data source') supported by the 'gdalraster' package. |
| Authors: | Michael Sumner [aut, cre] (ORCID: <https://orcid.org/0000-0002-2471-7511>) |
| Maintainer: | Michael Sumner <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.0.9009 |
| Built: | 2026-05-10 08:55:34 UTC |
| Source: | https://github.com/hypertidy/lazysf |
Convert lazysf to an in-memory data frame or sf object
collect
| x | output of |
|---|---|
| ... | passed to |
An object of class function of length 1.
collect() retrieves data into a local table, preserving grouping and ordering.
st_as_sf() retrieves data into a local sf data frame. Requires the sf
package to be installed, and will succeed only if the result contains a
geometry column (WKB or WKT via geom_format). The method is registered
when sf is loaded.
a data frame from collect(), sf data frame from st_as_sf()
(only if it contains geometry)
lazysf
    f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
    lsf <- lazysf(f)
    dplyr::collect(lsf)
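As a hedged sketch of the two retrieval paths described above, the following collects the bundled `nc.gpkg` layer into a plain data frame and, only when sf is installed, into an sf data frame. The `requireNamespace()` guard is an addition for illustration and is not part of the package's own example.

```r
library(lazysf)

# nc.gpkg ships with lazysf; its layer "nc" has a geometry column named geom
nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)
lsf <- lazysf(nc, layer = "nc")

# collect() forces the query and returns an ordinary in-memory data frame
df <- dplyr::collect(lsf)

# st_as_sf() needs the sf package; the guard keeps the sketch runnable without it
if (requireNamespace("sf", quietly = TRUE)) {
  sf_df <- sf::st_as_sf(lsf)
}
```

If the query drops the geometry column, `collect()` still works but `st_as_sf()` is expected to fail, as noted above.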
dbConnect for vector data sources readable by GDAL
    ## S4 method for signature 'GDALVectorDriver'
    dbConnect(
      drv,
      DSN = "",
      readonly = TRUE,
      geom_format = getOption("lazysf.geom_format", "WKB"),
      dialect = getOption("lazysf.dialect", "SQLITE"),
      ...
    )
| drv | GDALVectorDriver created by |
|---|---|
| DSN | data source name, may be a file, or folder path, database connection string, or URL |
| readonly | open in readonly mode ( |
| geom_format | geometry output format: |
| dialect | SQL dialect: |
| ... | ignored |
The available SQL dialects, including 'OGRSQL', are documented by GDAL: https://gdal.org/en/stable/user/ogr_sql_sqlite_dialect.html
    f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
    db <- dbConnect(GDALSQL(), f)
    dbListTables(db)
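A minimal sketch of a slightly longer DBI session may help here. It opens the bundled `nc.gpkg` with WKT geometry output, lists the layers, and runs one query. It assumes that the generic `DBI::dbGetQuery()` and `DBI::dbDisconnect()` work on this connection; the example above only shows `dbConnect()` and `dbListTables()`, so treat those two calls as assumptions.

```r
f <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

# Request WKT geometry instead of the default WKB
db <- DBI::dbConnect(lazysf::GDALSQL(), f, geom_format = "WKT")

# Layer names in the data source
tables <- DBI::dbListTables(db)

# Assumption: dbGetQuery() is supported through the standard DBI generics
res <- DBI::dbGetQuery(db, "SELECT AREA, FIPS, geom FROM nc WHERE AREA < 0.1")

# Assumption: the connection can be closed in the usual DBI way
DBI::dbDisconnect(db)
```

The SQL string is taken from the `lazysf()` examples, so it should be valid under the default 'SQLITE' dialect.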
GDALSQL driver, use with DBI::dbConnect() to open a data source readable by GDAL
    GDALSQL()
lazysf dbConnect
    GDALSQL()
A lazy data frame for GDAL drawings ('vector data sources'). lazysf is DBI compatible and designed to work with dplyr. It should work with any data source (file, URL, connection string) readable by GDAL via the gdalraster package.
    lazysf(x, layer, ...)

    ## S3 method for class 'character'
    lazysf(
      x,
      layer,
      ...,
      query = NA,
      geom_format = getOption("lazysf.geom_format", "WKB"),
      dialect = getOption("lazysf.dialect", "SQLITE")
    )

    ## S3 method for class 'GDALVectorConnection'
    lazysf(x, layer, ..., query = NA)
| x | the data source name (file path, url, or database connection string |
|---|---|
| layer | layer name; defaults to the first layer |
| ... | ignored |
| query | SQL query to pass in directly |
| geom_format | geometry output format, passed to |
| dialect | SQL dialect, passed to |
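Because `geom_format` defaults to `getOption("lazysf.geom_format", "WKB")`, the default can be changed for a session with `options()`. The following is a minimal sketch of that pattern using the bundled `nc.gpkg`; the variable names are illustrative only.

```r
library(lazysf)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

# Set the option and keep the previous value so it can be restored afterwards
old <- options(lazysf.geom_format = "WKT")

# No geom_format given, so the default is read from the option ("WKT")
lsf_wkt <- lazysf(nc)

# An explicit argument takes precedence over the option
lsf_wkb <- lazysf(nc, geom_format = "WKB")

# Restore the previous setting
options(old)
```

The same approach applies to `dialect` through the `lazysf.dialect` option.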
Lazy means that the usual behaviour of reading the entirety of a data source into memory is avoided. Printing the output results in a preview query being run and displayed (the top few rows of data).
The output of lazysf() is a 'tbl_GDALVectorConnection' that extends 'tbl_dbi' and
may be used with functions and workflows in the normal DBI way; see GDALSQL() for
the lazysf DBI support.
The kind of query that may be run will depend on the type of format, see the list on the GDAL vector drivers page. For some details see the GDALSQL vignette.
When dplyr is attached the lazy data frame can be used with the usual
verbs (filter, select, distinct, mutate, transmute, arrange, left_join, pull,
collect etc.). To see the result as a SQL query rather than a data frame
preview use dplyr::show_query().
To obtain an in memory data frame use an explicit collect().
If the sf package is installed, st_as_sf() will collect and convert to an
sf data frame. A result may not contain a geometry column, in which case
st_as_sf() will fail.
As well as collect() it's also possible to use tibble::as_tibble() or
as.data.frame() or pull() which all force computation and retrieve the
result.
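The workflow described above can be sketched as follows: build a lazy query with dplyr verbs, inspect the SQL with `show_query()`, then force computation. The layer and column names come from the bundled `nc.gpkg` used in the examples; the object names are illustrative.

```r
library(dplyr)
library(lazysf)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

# Still lazy: only a query description is built here
small <- lazysf(nc, layer = "nc") |>
  select(AREA, FIPS, geom) |>
  filter(AREA < 0.1)

# Show the SQL that would be sent, rather than a data preview
show_query(small)

# Either of these forces computation and returns in-memory results
df   <- collect(small)      # a data frame
fips <- pull(small, FIPS)   # a single column as a vector
```

Filtering before `collect()` keeps the work in the data source, so only the matching rows are read into memory.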
a 'tbl_GDALVectorConnection', extending 'tbl_lazy' (something that works
with dplyr verbs, and only shows a preview until you commit the result via
collect()); see Details
    ## a multi-layer file
    f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
    lazysf(f)

    ## Geopackage (an actual database, so with SELECT we must be explicit re geom-column)
    nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)
    lazysf(nc)
    lazysf(nc, query = "SELECT AREA, FIPS, geom FROM nc WHERE AREA < 0.1")

    lazysf(nc, layer = "nc") |>
      dplyr::select(AREA, FIPS, geom) |>
      dplyr::filter(AREA < 0.1)

    ## the famous ESRI Shapefile (not an actual database)
    shdb <- system.file("extdata/nc.shp", package = "lazysf", mustWork = TRUE)
    shp <- lazysf(shdb)
    library(dplyr)
    shp |>
      filter(NAME %LIKE% 'A%') |>
      mutate(abc = 1.3) |>
      select(abc, NAME, `_ogr_geometry_`) |>
      arrange(desc(NAME))