| Title: | Zarr via Arrow |
|---|---|
| Description: | Low-level Zarr interface (V2 and V3) built on Arrow filesystem and codec infrastructure. Provides direct access to Zarr stores on local filesystems, S3, and GCS via Arrow, with fallback to GDAL virtual filesystems for HTTP and other protocols via gdalraster. Supports Kerchunk JSON and Parquet reference stores for virtual Zarr access to existing archives. Automatically detects Zarr version and handles S3 region redirects. |
| Authors: | Michael Sumner [aut, cre] (ORCID: <https://orcid.org/0000-0002-2471-7511>), Hugo Gruson [ctb] (ORCID: <https://orcid.org/0000-0002-4094-1476>) |
| Maintainer: | Michael Sumner <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.1.9010 |
| Built: | 2026-05-29 09:54:15 UTC |
| Source: | https://github.com/hypertidy/zaro |
Connects to a Zarr store at the given path or URI. Dispatches to the appropriate backend based on the URI scheme:
Local paths: Arrow LocalFileSystem
s3://: Arrow S3FileSystem (with automatic region detection)
gs://: Arrow GcsFileSystem
http://, https://: gdalraster VSI (/vsicurl/)
reference+json://: Kerchunk JSON reference store
reference+parquet://: Kerchunk Parquet reference store
virtualizarr://: VirtualiZarr Parquet manifest store
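The scheme-based dispatch listed above can be illustrated with a small base R sketch. The helper name `guess_backend()` and its prefix-matching rule are assumptions made for illustration; the source does not show how zaro implements the dispatch internally.

```r
# Hypothetical helper (not part of zaro): map a source string to a backend
# label. The labels mirror the list above; the matching rule is an assumption.
guess_backend <- function(source) {
  stopifnot(is.character(source), length(source) == 1L, nzchar(source))
  schemes <- c(
    "s3://"                = "Arrow S3FileSystem",
    "gs://"                = "Arrow GcsFileSystem",
    "http://"              = "gdalraster VSI (/vsicurl/)",
    "https://"             = "gdalraster VSI (/vsicurl/)",
    "reference+json://"    = "Kerchunk JSON reference store",
    "reference+parquet://" = "Kerchunk Parquet reference store",
    "virtualizarr://"      = "VirtualiZarr Parquet manifest store"
  )
  # Compare the source against every known prefix at once.
  hit <- startsWith(source, names(schemes))
  # Anything without a recognised scheme is treated as a local path.
  if (any(hit)) unname(schemes[hit][1]) else "Arrow LocalFileSystem"
}

guess_backend("/data/my_dataset.zarr")
guess_backend("s3://mur-sst/zarr-v1")
guess_backend("virtualizarr://https://example.com/dataset.parq")
```

Matching on the leading scheme only means a nested URI such as `virtualizarr://https://...` resolves to the outer scheme in this sketch.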
zaro(source, verbose = TRUE, ..., validate = FALSE)
source |
character. Path, URI, or reference store locator. |
verbose |
logical. Emit progress and diagnostic messages (default TRUE).
Suppress with |
... |
additional arguments passed to Arrow filesystem constructors
(e.g. |
validate |
should the store be checked for valid metadata, |
For S3 stores, if Arrow returns a 301 redirect (wrong region), zaro falls
back to gdalraster's /vsis3/, which handles region detection
automatically.
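The fallback behaviour can be pictured as a try-then-fall-back pattern. The sketch below uses two hypothetical stand-in openers (`open_arrow()` and `open_vsis3()`) and an assumed error message; it does not reflect how zaro actually detects the redirect.

```r
# Hypothetical openers standing in for the Arrow and gdalraster backends.
# The error text is invented for illustration.
open_arrow <- function(uri) stop("301 redirect: wrong region")
open_vsis3 <- function(uri) {
  list(backend = "vsis3", uri = sub("^s3://", "/vsis3/", uri))
}

open_with_fallback <- function(uri, primary, fallback) {
  tryCatch(
    primary(uri),
    error = function(e) {
      # Fall back only on the redirect error; re-raise anything else.
      if (!grepl("301", conditionMessage(e), fixed = TRUE)) stop(e)
      fallback(uri)
    }
  )
}

store <- open_with_fallback("s3://bucket/data.zarr", open_arrow, open_vsis3)
store$uri
```

Re-raising unrelated errors keeps genuine failures (bad credentials, missing bucket) visible instead of masking them behind the fallback.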
Known public buckets (e.g. cmip6, aodn-cloud-optimised) are accessed
anonymously by default. For other public stores, pass
anonymous = TRUE.
Set validate = TRUE to validate the store's metadata at open time. By default this
validation is skipped, and source is checked only for being a non-empty character string.
A store object (internal class) for use with other zaro functions.
## Not run:
# local store
store <- zaro("/data/my_dataset.zarr")

# S3 store (anonymous)
store <- zaro("s3://mur-sst/zarr-v1", anonymous = TRUE)

# GCS: known public bucket, anonymous auto-detected
store <- zaro("gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp585/r1i1p1f1/Omon/zos/gn/v20180701")

# VirtualiZarr Parquet reference
store <- zaro("virtualizarr://https://example.com/dataset.parq")

# then explore
zaro_list(store)
zaro_meta(store, "analysed_sst")
data <- zaro_read(store, "analysed_sst",
                  start = c(0, 0, 0), count = c(1, 100, 100))
## End(Not run)
Fetches and decodes one chunk, identified by its position in the chunk grid. Returns the decoded values with metadata about where the chunk sits in the array.
zaro_chunk(store, path, chunk_idx, meta = NULL, shaped = TRUE, verbose = TRUE)
store |
a store object from |
path |
character. Path to the array. |
chunk_idx |
integer vector. 0-based chunk grid index
(e.g. |
meta |
optional pre-fetched ZaroMeta (or root group meta). |
shaped |
logical. If TRUE (default), reshape the values to the chunk's actual dimensions (handling C-order and edge chunks). If FALSE, return a flat vector. |
verbose |
logical. |
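The C-order handling mentioned for `shaped` matters because R stores arrays in column-major order, whereas a C-order chunk buffer is row-major. A minimal base R sketch of that reshaping follows; the helper `c_order_to_array()` is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: turn a flat C-order (row-major) buffer into an R array
# with the requested shape.
c_order_to_array <- function(values, shape) {
  stopifnot(length(values) == prod(shape))
  n <- length(shape)
  if (n < 2L) return(array(values, dim = shape))
  # Fill with the dimensions reversed (R fills column-major), then reverse
  # the axis order so the result has the intended shape.
  aperm(array(values, dim = rev(shape)), perm = rev(seq_len(n)))
}

flat <- 1:6                       # a 2 x 3 chunk stored row by row
m <- c_order_to_array(flat, c(2L, 3L))
m[1, ]                            # first row of the chunk
```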
A list with components:
integer vector — the chunk grid index
array or vector — decoded chunk data
integer vector — 0-based array coordinates of chunk origin
integer vector — actual shape of this chunk (may be smaller than chunk_shape for edge chunks)
Returns NULL if the chunk doesn't exist (missing/sparse).
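The relationship between a chunk grid index, the chunk origin, and the actual shape of an edge chunk can be worked through in base R. The function `chunk_extent()` and the element names `origin` and `shape` are hypothetical names chosen for this sketch.

```r
# Hypothetical helper: 0-based origin and actual shape of one chunk.
chunk_extent <- function(chunk_idx, chunk_shape, array_shape) {
  origin <- chunk_idx * chunk_shape
  stopifnot(all(origin < array_shape))
  # Edge chunks are clipped to whatever remains of the array.
  shape <- pmin(chunk_shape, array_shape - origin)
  list(origin = origin, shape = shape)
}

# A 100 x 250 array with 100 x 100 chunks: the third chunk along the second
# dimension is an edge chunk covering only 50 columns.
ext <- chunk_extent(c(0L, 2L),
                    chunk_shape = c(100L, 100L),
                    array_shape = c(100L, 250L))
ext$origin
ext$shape
```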
Iterates over chunks and applies a function to each, accumulating results. Chunks are fetched and decoded one at a time (or in parallel) so memory usage is proportional to one chunk, not the full array.
zaro_chunk_apply(
  store,
  path,
  fun,
  ranges = NULL,
  meta = NULL,
  shaped = TRUE,
  parallel = FALSE,
  verbose = TRUE
)
store |
a store object from |
path |
character. Path to the array. |
fun |
function. Called with a single chunk result list (as from
|
ranges |
optional range specification (as in |
meta |
optional pre-fetched ZaroMeta (or root group meta). |
shaped |
logical. If TRUE, reshape each chunk's values. |
parallel |
logical. Use future.apply for parallel execution. |
verbose |
logical. |
A list of results from fun, one per non-NULL chunk.
## Not run:
# per-chunk mean, memory-efficient
means <- zaro_chunk_apply(store, "temperature", function(chunk) {
  mean(chunk$values, na.rm = TRUE)
})

# first time step only, all spatial chunks
means <- zaro_chunk_apply(store, "temperature", function(chunk) {
  c(mean = mean(chunk$values, na.rm = TRUE),
    n_valid = sum(is.finite(chunk$values)))
}, ranges = list(0, NULL, NULL))
## End(Not run)
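To make the one-chunk-at-a-time pattern concrete without a real store, the sketch below iterates a small chunk grid over an in-memory stand-in. The names `chunks`, `fetch_chunk()`, and `chunk_apply()` are hypothetical; only the idea of skipping missing chunks and collecting one result per existing chunk is taken from the description above.

```r
# Hypothetical in-memory "store": chunks keyed by "i.j". A missing key stands
# for a missing (sparse) chunk.
chunks <- list(
  "0.0" = matrix(1:4, 2, 2),
  "0.1" = matrix(c(5, NA, 7, 8), 2, 2)
)

fetch_chunk <- function(idx) {
  values <- chunks[[paste(idx, collapse = ".")]]
  if (is.null(values)) return(NULL)
  list(chunk_idx = idx, values = values)
}

chunk_apply <- function(grid_shape, fun) {
  # Enumerate every 0-based chunk index in the grid.
  grid <- as.matrix(expand.grid(lapply(grid_shape, function(n) seq_len(n) - 1L)))
  out <- list()
  for (r in seq_len(nrow(grid))) {
    chunk <- fetch_chunk(unname(grid[r, ]))
    # Only one decoded chunk is held at a time; missing chunks are skipped.
    if (!is.null(chunk)) out[[length(out) + 1L]] <- fun(chunk)
  }
  out
}

means <- chunk_apply(c(2L, 2L), function(chunk) mean(chunk$values, na.rm = TRUE))
length(means)
```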
Returns the chunk grid dimensions and related metadata needed for chunk-indexed operations.
zaro_chunk_info(store, path, meta = NULL, verbose = TRUE)
store |
a store object from |
path |
character. Path to the array. |
meta |
optional pre-fetched ZaroMeta (or root group meta). |
verbose |
logical. |
A list with components:
integer vector — number of chunks along each dimension
integer vector — shape of a full (non-edge) chunk
integer vector — shape of the full array
integer — total number of chunks
character vector or NULL
ZaroMeta for the array
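The grid dimensions and chunk count follow directly from the array shape and the chunk shape. A minimal sketch of that arithmetic, using a hypothetical helper `chunk_grid()` and hypothetical element names:

```r
# Hypothetical helper: chunk grid shape and total chunk count.
chunk_grid <- function(array_shape, chunk_shape) {
  stopifnot(length(array_shape) == length(chunk_shape), all(chunk_shape > 0))
  # A partial chunk at the edge still counts as one chunk, hence ceiling().
  grid_shape <- as.integer(ceiling(array_shape / chunk_shape))
  list(grid_shape = grid_shape, n_chunks = prod(grid_shape))
}

info <- chunk_grid(array_shape = c(365L, 721L, 1440L),
                   chunk_shape = c(1L, 360L, 360L))
info$grid_shape
info$n_chunks
```

Here the second dimension (721 elements in chunks of 360) needs three chunks, the last of which holds a single row.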
Fetches and decodes a set of chunks identified by ranges along each dimension of the chunk grid. Supports parallel fetching.
zaro_chunks(
  store,
  path,
  ranges = NULL,
  meta = NULL,
  shaped = TRUE,
  parallel = FALSE,
  verbose = TRUE
)
store |
a store object from |
path |
character. Path to the array. |
ranges |
a list of integer vectors, one per dimension, giving the
0-based chunk indices to read. Use |
meta |
optional pre-fetched ZaroMeta (or root group meta). |
shaped |
logical. If TRUE, reshape each chunk's values. |
parallel |
logical. Use future.apply for parallel fetch+decode. |
verbose |
logical. |
A list of chunk results (as from zaro_chunk()). NULL
entries indicate missing chunks.
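A range specification of this kind expands to a set of chunk grid indices. The sketch below assumes that a NULL entry (or a NULL `ranges`) means every chunk along that dimension, as the `ranges = list(0, NULL, NULL)` example for zaro_chunk_apply() suggests; `expand_ranges()` is a hypothetical helper, not a zaro function.

```r
# Hypothetical helper: expand per-dimension ranges into one row per chunk.
expand_ranges <- function(ranges, grid_shape) {
  if (is.null(ranges)) ranges <- vector("list", length(grid_shape))
  stopifnot(length(ranges) == length(grid_shape))
  per_dim <- lapply(seq_along(grid_shape), function(d) {
    r <- ranges[[d]]
    # Assumption: NULL selects every chunk along this dimension.
    if (is.null(r)) seq_len(grid_shape[d]) - 1L else as.integer(r)
  })
  unname(as.matrix(expand.grid(per_dim)))
}

# First chunk along dimension 1, every chunk along dimensions 2 and 3.
idx <- expand_ranges(list(0, NULL, NULL), grid_shape = c(365L, 2L, 4L))
dim(idx)
```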
Lists the contents of a Zarr store.
zaro_list(store, path = "", ...)
store |
a store object from |
path |
character. Path prefix within the store (default root). |
... |
arguments passed to methods (for example, recursive = FALSE is the default for ArrowStore). |
character vector of keys (relative paths).
Reads and parses the metadata for an array or group at the given path.
Automatically detects Zarr V2 (.zarray/.zattrs/.zmetadata)
vs V3 (zarr.json) format. For cloud stores, consolidated metadata
(.zmetadata for V2, consolidated zarr.json for V3) is preferred
to minimise the number of requests.
zaro_meta(store, path = "", consolidated = TRUE, verbose = TRUE)
store |
a store object from |
path |
character. Path to the array or group within the store.
Use |
consolidated |
logical. If |
verbose |
logical. Emit diagnostic messages (default TRUE). |
A ZaroMeta object. For root group requests with consolidated
metadata, the individual array metadata are attached as
attr(, "consolidated").
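The V2 versus V3 distinction comes down to which metadata keys are present. The sketch below classifies a vector of store keys by those file names; `detect_zarr_version()` is a hypothetical helper, and checking for V3 first when both kinds of key exist is an assumption, not documented zaro behaviour.

```r
# Hypothetical helper: guess the Zarr format version from a vector of keys.
detect_zarr_version <- function(keys) {
  files <- basename(keys)
  # V3 keeps all metadata for a node in zarr.json.
  if ("zarr.json" %in% files) return(3L)
  # V2 uses .zgroup / .zarray, optionally consolidated into .zmetadata.
  if (any(c(".zmetadata", ".zgroup", ".zarray") %in% files)) return(2L)
  NA_integer_
}

detect_zarr_version(c(".zgroup", ".zattrs", ".zmetadata"))
detect_zarr_version(c("zarr.json", "temperature/zarr.json"))
```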
Reads a hyperslab from a Zarr array, fetching and decoding the necessary chunks and assembling them into an R array. Supports both Zarr V2 and V3.
zaro_read(
  store,
  path = "",
  start = NULL,
  count = NULL,
  meta = NULL,
  parallel = FALSE,
  verbose = TRUE,
  assemble = TRUE
)
store |
a store object from |
path |
character. Path to the array within the store. |
start |
integer vector. 0-based start indices for each dimension. Defaults to the origin. |
count |
integer vector. Number of elements to read along each
dimension. Use |
meta |
optional pre-fetched ZaroMeta object. If NULL (default),
metadata is read from the store. Can also be the root group metadata
from |
parallel |
logical. If |
verbose |
logical. Emit diagnostic messages (default TRUE). |
assemble |
logical. If |
An R array with dimensions matching count.
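Reading a hyperslab first requires working out which chunks it touches. The following base R sketch shows that index arithmetic for 0-based `start` and `count` values; `hyperslab_chunks()` is a hypothetical helper and is not taken from the package source.

```r
# Hypothetical helper: 0-based chunk indices touched along each dimension.
hyperslab_chunks <- function(start, count, chunk_shape) {
  stopifnot(all(start >= 0), all(count > 0))
  first <- start %/% chunk_shape                # chunk holding the first element
  last  <- (start + count - 1L) %/% chunk_shape # chunk holding the last element
  Map(seq, first, last)
}

rng <- hyperslab_chunks(start = c(0L, 350L, 700L),
                        count = c(1L, 100L, 100L),
                        chunk_shape = c(1L, 360L, 360L))
rng
```

The result is a list with one integer vector per dimension, the same layout as the `ranges` argument described for zaro_chunks().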