cmemsarco provides cloud-native access to Copernicus Marine Service (CMEMS) Analysis-Ready Cloud-Optimized (ARCO) Zarr datasets. The package builds a catalog of GDAL-ready data source names (DSNs), letting you go straight from URL to pixels with no file downloads, no directory listings, and no format or tool wrangling.
library(cmemsarco)
# The bundled catalog
cmems_catalog_data
#> # A tibble: 1,731 × 13
#> product_id dataset_version_id timeChunked_url geoChunked_url native_url
#> <chr> <chr> <chr> <chr> <chr>
#> 1 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 2 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 3 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 4 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 5 NWSHELF_ANALYSI… cmems_mod_nws_bgc… <NA> <NA> https://s…
#> 6 NWSHELF_ANALYSI… cmems_mod_nws_bgc… <NA> <NA> https://s…
#> 7 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 8 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 9 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 10 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> # ℹ 1,721 more rows
#> # ℹ 8 more variables: dataset_id <chr>, version <chr>, timeChunked_gdal <chr>,
#> # geoChunked_gdal <chr>, timeChunked_gdals3 <chr>, geoChunked_gdals3 <chr>,
#> # timeChunked_s3 <chr>, geoChunked_s3 <chr>

The catalog is built by walking the CMEMS STAC API. Each row represents a versioned dataset with URLs to Zarr stores in different formats:
| Column | Description |
|---|---|
| `product_id` | CMEMS product identifier |
| `dataset_id` | Dataset identifier (without version) |
| `version` | 6-digit version (YYYYMM) |
| `timeChunked_url` | HTTPS URL to timeChunked.zarr |
| `geoChunked_url` | HTTPS URL to geoChunked.zarr |
| `*_gdal` | GDAL DSN using /vsicurl/ |
| `*_gdals3` | GDAL DSN using /vsis3/ |
| `*_s3` | S3 URI (s3://bucket/path) |

Use cmems_latest() to keep only the most recent version
of each dataset, and cmems_arco_only() to drop datasets
without Zarr URLs (static/native-only).
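As a rough illustration of what these two filters do, here is a base-R sketch on a toy table. The column names follow the catalog, but the rows, the example.org URLs, and the filtering logic itself are assumptions made for illustration; the package's own implementation may differ.

```r
# Toy catalog: dataset "a" has two versions, dataset "b" has no Zarr URL.
toy <- data.frame(
  dataset_id      = c("a", "a", "b", "c"),
  version         = c("202311", "202411", "202211", "202406"),
  timeChunked_url = c("https://example.org/a-202311",
                      "https://example.org/a-202411",
                      NA,
                      "https://example.org/c-202406"),
  stringsAsFactors = FALSE
)

# Analogue of cmems_arco_only(): drop rows without a Zarr URL.
arco <- toy[!is.na(toy$timeChunked_url), ]

# Analogue of cmems_latest(): keep the highest version per dataset.
# Six-digit YYYYMM strings sort correctly as plain text.
newest <- ave(arco$version, arco$dataset_id, FUN = max)
latest <- arco[arco$version == newest, ]

latest[, c("dataset_id", "version")]
```

With the real catalog, the same two steps are a single pipeline: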
cmems_catalog_data |>
cmems_arco_only() |>
cmems_latest()
#> # A tibble: 164 × 13
#> product_id dataset_version_id timeChunked_url geoChunked_url native_url
#> <chr> <chr> <chr> <chr> <chr>
#> 1 NWSHELF_ANALYSI… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 2 GLOBAL_ANALYSIS… cmems_mod_glo_phy… https://s3.waw… https://s3.wa… https://s…
#> 3 NWSHELF_MULTIYE… cmems_mod_nws_bgc… https://s3.waw… https://s3.wa… https://s…
#> 4 GLOBAL_MULTIYEA… cmems_mod_glo_phy… https://s3.waw… https://s3.wa… https://s…
#> 5 GLOBAL_ANALYSIS… cmems_mod_glo_wav… https://s3.waw… https://s3.wa… https://s…
#> 6 SEALEVEL_GLO_PH… cmems_obs-sl_glo_… https://s3.waw… https://s3.wa… https://s…
#> 7 GLOBAL_MULTIYEA… cmems_mod_glo_bgc… https://s3.waw… https://s3.wa… https://s…
#> 8 NWSHELF_MULTIYE… cmems_mod_nws_phy… https://s3.waw… https://s3.wa… https://s…
#> 9 GLOBAL_MULTIYEA… cmems_mod_glo_wav… https://s3.waw… https://s3.wa… https://s…
#> 10 MEDSEA_ANALYSIS… cmems_mod_med_phy… https://s3.waw… https://s3.wa… https://s…
#> # ℹ 154 more rows
#> # ℹ 8 more variables: dataset_id <chr>, version <chr>, timeChunked_gdal <chr>,
#> # geoChunked_gdal <chr>, timeChunked_gdals3 <chr>, geoChunked_gdals3 <chr>,
#> # timeChunked_s3 <chr>, geoChunked_s3 <chr>

CMEMS provides two Zarr stores for each dataset, optimised for different access patterns:
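- timeChunked (chunks: 1 × 720 × 512 in time × lat × lon): each chunk holds a single time step over a large spatial tile, which suits reading maps at a given time.
- geoChunked (chunks: 138 × 32 × 64 in time × lat × lon): each chunk holds many time steps over a small spatial tile, which suits reading time series at a location.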
Choosing the wrong chunking strategy means many more HTTP requests and slower performance.
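To see why the choice matters, the following back-of-the-envelope sketch counts how many chunks a read touches under each layout, using the chunk shapes listed above. The two read shapes (a 365-step series at a single pixel, and a single-time map of 1440 × 2048 pixels) are made-up examples, and the count assumes reads aligned to chunk boundaries with one HTTP request per chunk.

```r
# Number of chunks a read of shape `read` touches for chunk shape `chunk`
# (both given as time, lat, lon), assuming the read starts on a chunk boundary.
chunks_touched <- function(read, chunk) prod(ceiling(read / chunk))

time_chunked <- c(time = 1,   lat = 720, lon = 512)
geo_chunked  <- c(time = 138, lat = 32,  lon = 64)

# Hypothetical reads: a point time series and a single-time map.
series  <- c(time = 365, lat = 1,    lon = 1)
map_one <- c(time = 1,   lat = 1440, lon = 2048)

chunks_touched(series,  time_chunked)  # 365 chunks
chunks_touched(series,  geo_chunked)   # 3 chunks
chunks_touched(map_one, time_chunked)  # 8 chunks
chunks_touched(map_one, geo_chunked)   # 1440 chunks
```

Under these assumptions the mismatched layout costs two orders of magnitude more requests in both directions.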
Each Zarr store is available in four formats. Use whichever suits your tooling:
*_gdal: zero configuration (recommended)

Uses GDAL’s /vsicurl/ handler, which works without any environment setup.
*_gdals3: S3 protocol

Uses GDAL’s /vsis3/ handler, which requires cmems_setup() first to configure the AWS endpoint:
cmems_setup() # Sets AWS_NO_SIGN_REQUEST=YES, AWS_S3_ENDPOINT=...
dsn <- cmems_catalog_data$timeChunked_gdals3[1L]
dsn
#> [1] "ZARR:\"/vsis3/mdl-arco-time-041/arco/NWSHELF_ANALYSISFORECAST_BGC_004_002/cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m_202411/timeChunked.zarr\""

This may offer better performance in some cases due to S3-specific optimisations in GDAL.
*_s3: S3 URI

Standard s3:// URIs for use with S3-aware tools.

*_url: raw HTTPS

The underlying HTTPS URLs, useful if you need to construct your own access pattern.
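The four forms are related by simple string manipulation. The sketch below derives the other three from a raw HTTPS URL, assuming the path-style layout seen in the catalog examples (host, then bucket, then object key). The helper `arco_forms()` and the example.org URL are hypothetical and not part of cmemsarco; the catalog columns already hold these values, so this is only useful for URLs that are not in the catalog.

```r
# Hypothetical helper: derive GDAL DSNs and an S3 URI from a path-style HTTPS URL.
arco_forms <- function(url) {
  # Strip scheme and host, leaving "bucket/key".
  path <- sub("^https://[^/]+/", "", url)
  list(
    gdal   = sprintf('ZARR:"/vsicurl/%s"', url),  # HTTPS via /vsicurl/
    gdals3 = sprintf('ZARR:"/vsis3/%s"', path),   # S3 via /vsis3/
    s3     = paste0("s3://", path)                # plain S3 URI
  )
}

# Placeholder URL, not a real CMEMS store.
url <- "https://s3.example.org/my-bucket/arco/PRODUCT/dataset_202411/timeChunked.zarr"
forms <- arco_forms(url)
forms$gdal
forms$gdals3
forms$s3
```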
library(cmemsarco)
# Find your dataset
sla <- cmems_catalog_data |>
dplyr::filter(grepl("SEALEVEL.*NRT", product_id)) |>
cmems_latest()
# Grab the DSN (no setup needed)
dsn <- sla$timeChunked_gdal[1]
dsn
#> [1] "ZARR:\"/vsicurl/https://s3.waw3-1.cloudferro.com/mdl-arco-time-053/arco/SEALEVEL_EUR_PHY_L3_NRT_008_059/cmems_obs-sl_eur_phy-ssh_nrt_swon-l3-duacs_PT1S_202311/timeChunked\""

The bundled catalog is a snapshot. Rebuilding it to pick up the latest datasets walks the STAC API and takes a few minutes for all ~330 products.
The CMEMS S3 buckets don’t allow LIST operations, but GDAL’s Zarr
driver doesn’t need them. It reads /.zmetadata to
understand the array structure, then fetches only the chunks required
for your read operation. No directory listings, no full downloads—just
the bytes you need.
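
To illustrate "only the chunks required", the sketch below works out which chunk object holds a given cell. It assumes Zarr v2 conventions with the default "." separator in chunk keys and the timeChunked chunk shape quoted earlier. The store URL, the variable name `myvar`, and the helper `chunk_key()` are hypothetical; a real store may use a different separator or chunk shape, which is exactly what the metadata read tells GDAL.

```r
# Hypothetical helper: Zarr v2 chunk key for a zero-based cell index.
chunk_key <- function(index, chunk) {
  # Integer-divide each index by the chunk size along that dimension,
  # then join the chunk coordinates with ".".
  paste(as.integer(index %/% chunk), collapse = ".")
}

chunk <- c(time = 1, lat = 720, lon = 512)    # timeChunked layout
cell  <- c(time = 10, lat = 1000, lon = 300)  # zero-based cell position

key <- chunk_key(cell, chunk)
key

# The single object a reader would need to fetch for this cell
# (placeholder store URL and variable name).
store <- "https://s3.example.org/my-bucket/arco/PRODUCT/dataset_202411/timeChunked.zarr"
paste(store, "myvar", key, sep = "/")
```

Because the key is computed from the index alone, a reader never has to list the bucket to find the object it needs.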