Package 'lazysf'

Title: Delayed Read for 'GDAL' Vector Data Sources
Description: Lazy read for drawings. A 'dplyr' back end for data sources supported by 'GDAL' vector drivers, that allows working with local or remote sources as if they are in-memory data frames. Basic features work with any drawing format ('GDAL vector data source') supported by the 'gdalraster' package.
Authors: Michael Sumner [aut, cre] (ORCID: <https://orcid.org/0000-0002-2471-7511>)
Maintainer: Michael Sumner <[email protected]>
License: GPL-3
Version: 0.3.0.9009
Built: 2026-05-10 08:55:34 UTC
Source: https://github.com/hypertidy/lazysf

Help Index


Force computation of a GDAL query

Description

Convert a lazysf object to an in-memory data frame or sf object

Usage

collect

Arguments

x

output of lazysf()

...

passed to collect()

Format

An object of class function of length 1.

Details

collect() retrieves data into a local table, preserving grouping and ordering.

st_as_sf() retrieves data into a local sf data frame. Requires the sf package to be installed, and will succeed only if the result contains a geometry column (WKB or WKT via geom_format). The method is registered when sf is loaded.

Value

a data frame from collect(); an sf data frame from st_as_sf() (only if the result contains a geometry column)

See Also

lazysf

Examples

f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
lsf <- lazysf(f)
dplyr::collect(lsf)
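
The st_as_sf() route described in Details can be sketched in the same way. This is a minimal sketch, assuming the sf package is installed and that the bundled nc.gpkg file (the one used in the lazysf() examples) yields a geometry column in its result; the object names lsf, df and sfx are illustrative only.

```r
library(lazysf)

## bundled GeoPackage, also used in the lazysf() examples
nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)
lsf <- lazysf(nc)

## local data frame; geometry arrives in the configured geom_format (WKB by default)
df <- dplyr::collect(lsf)

## sf data frame, attempted only when sf is available
## (assumption: the result contains a geometry column)
if (requireNamespace("sf", quietly = TRUE)) {
  sfx <- sf::st_as_sf(lsf)
  print(class(sfx))
}
```

If a query drops the geometry column, st_as_sf() is expected to fail, so collect() is the safer choice for attribute-only results.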

dbConnect

Description

dbConnect for vector data sources readable by GDAL

Usage

## S4 method for signature 'GDALVectorDriver'
dbConnect(
  drv,
  DSN = "",
  readonly = TRUE,
  geom_format = getOption("lazysf.geom_format", "WKB"),
  dialect = getOption("lazysf.dialect", "SQLITE"),
  ...
)

Arguments

drv

GDALVectorDriver created by GDALSQL()

DSN

data source name; may be a file or folder path, a database connection string, or a URL

readonly

open in readonly mode (TRUE is the only option currently)

geom_format

geometry output format: "WKB" (default), "WKT", "NONE", or "BBOX" (alias "RCT"). Case-insensitive.

dialect

SQL dialect: "SQLITE" (default), "OGRSQL", "INDIRECT_SQLITE", or "" (let GDAL choose). SQLITE is recommended as it supports subqueries (required for dbplyr) and spatial SQL functions.

...

ignored

Details

The available SQL dialects (OGRSQL and SQLITE) are documented by GDAL: https://gdal.org/en/stable/user/ogr_sql_sqlite_dialect.html

Examples

f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
## DBI generics are namespaced here in case DBI is not attached
db <- DBI::dbConnect(GDALSQL(), f)
DBI::dbListTables(db)
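
The geom_format and dialect arguments can also be set explicitly on the connection. The following is a minimal sketch, assuming the connection works with the generic DBI::dbGetQuery() (which DBI builds on dbSendQuery() and dbFetch()); the query reuses the nc layer and column names from the lazysf() examples, and the names db and res are illustrative.

```r
library(lazysf)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

## explicit arguments instead of the lazysf.geom_format / lazysf.dialect options
db <- DBI::dbConnect(GDALSQL(), nc, geom_format = "WKT", dialect = "SQLITE")

## run SQL directly on the connection
## (assumption: standard DBI::dbGetQuery() support)
res <- DBI::dbGetQuery(db, "SELECT AREA, FIPS, geom FROM nc WHERE AREA < 0.1")
str(res)
```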

GDALSQL

Description

GDALSQL driver; use with DBI::dbConnect() to open a data source readable by GDAL

Usage

GDALSQL()

See Also

lazysf, dbConnect

Examples

GDALSQL()

Delayed (lazy) read for GDAL vector

Description

A lazy data frame for GDAL drawings ('vector data sources'). lazysf is DBI compatible and designed to work with dplyr. It should work with any data source (file, url, connection string) readable by GDAL via the gdalraster package.

Usage

lazysf(x, layer, ...)

## S3 method for class 'character'
lazysf(
  x,
  layer,
  ...,
  query = NA,
  geom_format = getOption("lazysf.geom_format", "WKB"),
  dialect = getOption("lazysf.dialect", "SQLITE")
)

## S3 method for class 'GDALVectorConnection'
lazysf(x, layer, ..., query = NA)

Arguments

x

the data source name (file path, URL, or database connection string, analogous to a GDAL DSN) or a GDALVectorConnection

layer

layer name; defaults to the first layer

...

ignored

query

SQL query to pass in directly

geom_format

geometry output format, passed to DBI::dbConnect()

dialect

SQL dialect, passed to DBI::dbConnect()
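
Because geom_format and dialect default to getOption() lookups (see Usage), they can be set either per session or per call. A minimal sketch, assuming the bundled nc.gpkg file; the names old, lsf_wkt and lsf_wkb are illustrative.

```r
library(lazysf)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

## session-wide default: later lazysf() calls pick up WKT geometry
old <- options(lazysf.geom_format = "WKT")
lsf_wkt <- lazysf(nc)

## restore the previous option values
options(old)

## per-call override, independent of the option
lsf_wkb <- lazysf(nc, geom_format = "WKB")
```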

Details

Lazy means that the usual behaviour of reading the entirety of a data source into memory is avoided. Printing the output results in a preview query being run and displayed (the top few rows of data).

The output of lazysf() is a 'tbl_GDALVectorConnection' that extends 'tbl_dbi' and may be used with functions and workflows in the normal DBI way; see GDALSQL() for the lazysf DBI support.

The kind of query that may be run depends on the format; see the list on the GDAL vector drivers page. For some details see the GDALSQL vignette.

When dplyr is attached the lazy data frame can be used with the usual verbs (filter, select, distinct, mutate, transmute, arrange, left_join, pull, collect etc.). To see the result as a SQL query rather than a data frame preview use dplyr::show_query().
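
A minimal sketch of the verbs and show_query() together, assuming the bundled nc.gpkg file with its nc layer and the AREA and FIPS columns used in the Examples; the name q is illustrative.

```r
library(lazysf)
library(dplyr)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

## build the query lazily; the full layer is not read into memory
q <- lazysf(nc, layer = "nc") |>
  filter(AREA < 0.1) |>
  select(AREA, FIPS)

## print the generated SQL instead of a data preview
show_query(q)
```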

To obtain an in-memory data frame use an explicit collect(). If the sf package is installed, st_as_sf() will collect and convert to an sf data frame. A result may not contain a geometry column, in which case st_as_sf() will fail.

Besides collect(), it is possible to use tibble::as_tibble(), as.data.frame() or pull(), all of which force computation and retrieve the result.
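
The different ways of forcing computation can be compared side by side. This is a minimal sketch, assuming the bundled nc.gpkg file and the AREA and FIPS columns used in the Examples; the names small, d1, d2, d3 and fips are illustrative.

```r
library(lazysf)
library(dplyr)

nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)

## a lazy query without a geometry column
small <- lazysf(nc, layer = "nc") |>
  filter(AREA < 0.1) |>
  select(AREA, FIPS)

d1 <- collect(small)             # local data frame
d2 <- as.data.frame(small)       # base data frame
d3 <- tibble::as_tibble(small)   # tibble
fips <- pull(small, FIPS)        # a single column as a vector
```

Each call runs the query again, so for larger sources it is usually better to collect once and reuse the local result.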

Value

a 'tbl_GDALVectorConnection', extending 'tbl_lazy' (something that works with dplyr verbs and only shows a preview until you commit the result via collect()); see Details

Examples

## a multi-layer file
f <- system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
lazysf(f)


## GeoPackage (an actual database, so with SELECT we must name the geometry column explicitly)
nc <- system.file("extdata/nc.gpkg", package = "lazysf", mustWork = TRUE)
lazysf(nc)
lazysf(nc, query = "SELECT AREA, FIPS, geom FROM nc WHERE AREA < 0.1")
lazysf(nc, layer = "nc") |> dplyr::select(AREA, FIPS, geom) |> dplyr::filter(AREA < 0.1)

## the famous ESRI Shapefile (not an actual database)
shdb <- system.file("extdata/nc.shp", package = "lazysf", mustWork = TRUE)
shp <- lazysf(shdb)
library(dplyr)
shp |>
 filter(NAME %LIKE% 'A%') |>
 mutate(abc = 1.3) |>
 select(abc, NAME, `_ogr_geometry_`) |>
 arrange(desc(NAME))