This package, ncdump, was an early attempt to support a tidyverse-inspired package for R.
The key idea is to integrate interactive exploration of what is in the source with lazy-specification of subset requests - so that a user or developer gets helpers that show:
This would allow any data read to be “lazy”, delayed until the last moment, when the choice of output form is made (long-form data frame, raw array, a bespoke format such as raster or image, streaming to another service on demand, etc.).
https://github.com/hypertidy/tidync
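As a rough illustration of the lazy idea, the sketch below records a subset request first and touches the file only when an output form is chosen. It uses plain ncdf4 calls on a tiny file written to a temporary path. The helpers lazy_slab() and collect_slab() are hypothetical names made up for this sketch, not functions from ncdf4, ncmeta or tidync, and the whole block is skipped if ncdf4 is not installed.

```r
# Sketch only: lazy_slab() and collect_slab() are hypothetical helpers.
if (requireNamespace("ncdf4", quietly = TRUE)) {
  library(ncdf4)

  # Write a tiny 4 x 3 file so the example is self-contained.
  f <- tempfile(fileext = ".nc")
  x <- ncdim_def("x", "index", 1:4)
  y <- ncdim_def("y", "index", 1:3)
  v <- ncvar_def("temp", "K", list(x, y), missval = -9999)
  nc <- nc_create(f, list(v))
  ncvar_put(nc, v, matrix(1:12, nrow = 4))
  nc_close(nc)

  # Record the request; nothing is read from the file here.
  lazy_slab <- function(file, varname, start, count) {
    list(file = file, varname = varname, start = start, count = count)
  }

  # Read only the requested hyperslab, in the output form chosen at call time.
  collect_slab <- function(req, form = c("array", "data.frame")) {
    form <- match.arg(form)
    con <- nc_open(req$file)
    on.exit(nc_close(con))
    a <- ncvar_get(con, req$varname, start = req$start, count = req$count,
                   collapse_degen = FALSE)
    if (form == "array") return(a)
    # Long form: one row per cell, with the original index along each axis.
    idx <- expand.grid(x = seq_len(dim(a)[1]) + req$start[1] - 1,
                       y = seq_len(dim(a)[2]) + req$start[2] - 1)
    cbind(idx, value = as.vector(a))
  }

  req <- lazy_slab(f, "temp", start = c(2, 1), count = c(2, 2))
  slab_array <- collect_slab(req, "array")       # 2 x 2 array
  slab_long  <- collect_slab(req, "data.frame")  # 4 rows: x, y, value
}
```

The request object is just a list, so it can be inspected, modified or passed around before any read happens.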
This work needed a systematic “metadata-extraction” language, and currently ncmeta/tidync are the core of that, wrapping ncdf4 and RNetCDF, along with other exploratory wrappings of rhdf5 and rgdal for other cases.
Some poor choices were made in an early version of “ncdump” on CRAN (basically, the class “NetCDF” was already used by RNetCDF), and so the current direction involves streamlining ncdump with ncmeta, then getting the tidync package onto CRAN. The partial visibility of groups had also obscured what these packages were enabling, and the insights required to transcend the format details in specific cases.
Support in R for NetCDF is piecemeal and fragmented. The following sections describe the various facilities of this format and the patchy support for them in various R packages.
NetCDF had a very large, breaking-change update in the move from version 3 to version 4.
The “original” format of NetCDF was pretty straightforward. A source could have variables, dimensions and attributes. This is well supported by RNetCDF and ncdf4 on CRAN, both of which are provided for multiple architectures (Windows and MacOS). This was also supported by ncdf, but that was superseded by ncdf4 (by the same author) and ncdf is now removed from CRAN (end of 2015).
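To make the classic model concrete, here is a minimal sketch that writes a small file with ncdf4 and then lists its dimensions, variables and attributes. The variable name "sst", the coordinate values and the "title" attribute are made up for illustration, and the block is skipped if ncdf4 is not installed.

```r
if (requireNamespace("ncdf4", quietly = TRUE)) {
  library(ncdf4)

  f <- tempfile(fileext = ".nc")
  lon <- ncdim_def("lon", "degrees_east", c(140, 150, 160))
  lat <- ncdim_def("lat", "degrees_north", c(-50, -40))
  sst <- ncvar_def("sst", "degC", list(lon, lat), missval = -999)

  nc <- nc_create(f, list(sst))  # classic format unless force_v4 = TRUE
  ncvar_put(nc, sst, matrix(c(10, 11, 12, 13, 14, 15), nrow = 3))
  ncatt_put(nc, 0, "title", "toy example")  # varid 0 means a global attribute
  nc_close(nc)

  # Re-open and inspect the three building blocks of the classic model.
  nc <- nc_open(f)
  dim_names <- names(nc$dim)                        # dimensions
  var_names <- names(nc$var)                        # variables
  var_units <- ncatt_get(nc, "sst", "units")$value  # variable attribute
  title     <- ncatt_get(nc, 0, "title")$value      # global attribute
  nc_close(nc)
}
```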
When ncdf was removed from CRAN the raster package also updated and removed its references to that package. It had previously used ncdf4 in preference, deferring to ncdf when required, i.e. on Windows.
The rgdal package can include the NetCDF library as a driver, but no CRAN build has ever done so. Unlike raster, the use of the NetCDF library by GDAL is independent of these other R packages, and users are expected to build it in if it’s required (true for many other drivers).
The relationship between raster and rgdal is a little complex. raster has an independent interpretation of these sources that uses ncdf4 directly; only after checking for its own support and failing will raster fall back to see whether rgdal can read the source. The user cannot request that raster go via rgdal without masking the visibility of the ncdf4 package. The model interpretations provided by raster and rgdal are analogous, but different and independent. They may “fail to support” a given source for the same broad reason, but the details can be very different.
This was a complex update to NetCDF, essentially a re-engineering of the library from HDF5. It enabled a number of new facilities:
The ncdf4 package in its original form supported all of these new features except for compound types, and it also supports the classic “version 3” forms.
Both raster and rgdal support NetCDF in all cases above for NetCDF version 4 apart from compound types. The specification of a source within groups is quite specific though, and little has been done to exercise how these packages relate to them. Neither supports “non-regular”, non-affine-based georeferencing - both rely on the rectilinear-axes-coordinate model used by NetCDF being degenerate-rectilinear - but again the heuristics applied are different for different sources, and so this is a complex area to summarize.
The rhdf5 package supports NetCDF version 4 including compound types. Specifically, it has a straightforward way to read these as data frames when it makes sense to do so. There’s no limit on what NetCDF version 4 can be read, but the interpretation is very much lower-level than either raster or rgdal. This package is on Bioconductor, so it is obscure to the normal CRAN user, but it is supported cross-platform. rhdf5 cannot read the classic NetCDF version 3 format.
(DODS is the old system, sequentially replaced by OpenDAP and now Thredds - these are synonymous terms as far as I know, but “DODS” is the name of the GDAL driver, for raster and vector sources).
The NetCDF driver can be OpenDAP-aware. The missing OpenDAP support for Windows / MacOS is a lower level shared library issue that is a problem with the Windows ncdf4 and RNetCDF packages as well.
GDAL has an independent driver DODS, but NetCDF itself can also be DODS/OpenDAP capable. Similar overlap occurs with NetCDF(4) and HDF5, and you can see conflicts with raw HTTP sources and these DODS/OpenDAP/Thredds sources because the “same syntax” triggers driver-choice on connect. All driver conflicts within a given GDAL build can be resolved by prepending the driver identifier to the data source string, as far as I know.
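A small sketch of the prefixing idea: the helper below only builds a data source string of the form DRIVER:"path":subdataset, which is the pattern GDAL's NetCDF driver uses for subdatasets. gdal_dsn() is a hypothetical helper and "sst.nc" / "sst" are placeholder names; other drivers have their own prefix conventions, so check the documentation of the driver in question before relying on this form.

```r
# gdal_dsn() is a hypothetical helper; it builds a string and reads nothing.
gdal_dsn <- function(driver, path, subdataset = NULL) {
  dsn <- sprintf('%s:"%s"', driver, path)
  if (!is.null(subdataset)) dsn <- paste0(dsn, ":", subdataset)
  dsn
}

# Force the NetCDF driver for a placeholder file and variable.
dsn <- gdal_dsn("NETCDF", "sst.nc", "sst")
cat(dsn, "\n")  # NETCDF:"sst.nc":sst
```

The resulting string is what would be passed as the data source to a GDAL-backed reader, in place of the bare file name.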
Both RNetCDF and ncdf4 support these server systems when the library is configured for its support (so usually only Linux users who can install the requirements). NetCDF can be installed from source and configured with these options, or installed from distros - essentially the unstable-ubuntu-gis stack + libnetcdf-dev is the simplest way.
Groups are a way to add an extra level to collections of variables within a single data source. It’s like a “group” allows a file to contain more than one file, where “file” corresponds to an available set of dimensions.
Both RNetCDF and ncdf4 support groups, but neither will list the contents of any group that contains compound types, so we don’t notice at the R level that the groups with those types are silently ignored - unless they are the only type in the file, in which case we notice because it simply fails to work at all.
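For ordinary (non-compound) variables, groups can be exercised from R with a sketch like the following. It relies on the ncdf4 convention that a variable name containing forward slashes is defined inside the named group, which forces a NetCDF version 4 file. The group names "model_a" and "model_b" are made up for illustration, the block is skipped if ncdf4 is not installed, and it does not cover the compound-type case described above.

```r
if (requireNamespace("ncdf4", quietly = TRUE)) {
  library(ncdf4)

  f <- tempfile(fileext = ".nc")
  tm <- ncdim_def("time", "days", 1:3)

  # A slash in the name places the variable in a group (NetCDF-4 only).
  va <- ncvar_def("model_a/temp", "K", list(tm), missval = -9999)
  vb <- ncvar_def("model_b/temp", "K", list(tm), missval = -9999)

  nc <- nc_create(f, list(va, vb))
  ncvar_put(nc, va, c(280, 281, 282))
  ncvar_put(nc, vb, c(290, 291, 292))
  nc_close(nc)

  # The same short name "temp" lives in two groups; read each by its full name.
  nc <- nc_open(f)
  temp_a <- as.vector(ncvar_get(nc, "model_a/temp"))
  temp_b <- as.vector(ncvar_get(nc, "model_b/temp"))
  nc_close(nc)
}
```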
NOTE: I am referring to the current CRAN version of RNetCDF 1.8-2 - the development version on R-forge already has new support for version 4.0 and groups.
http://r-forge.r-project.org/R/?group_id=2008
Supporting groups in full requires a re-write, a super-package to transcend ncdf4, RNetCDF and rhdf5 - wrappers at the R level could drive these for a virtual super-package, but it’s complicated by the cross-platform problems.
Ultimately groups provide a nice analogy for dealing with sets of files, which is a standard model for long-running observations or large model output with long temporal axes. Dealing with this level of hierarchy will enable a true abstraction over these file system artefacts and provide a proper virtual array with database-like support.
R-spatial GDAL: https://github.com/r-spatial/discuss/issues/14
RConsortium wishlist: https://github.com/RConsortium/wishlist/issues/3
netcdf channel on https://ropensci.slack.com