Store and load datasets#
The native storage format for primap2 datasets is netCDF, which can store all data and metadata in a single file and also supports compression. We again use a toy example dataset to show how to store and reload datasets.
Logging setup for the docs
# setup logging for the docs - we don't need debug logs
import sys
from loguru import logger
logger.remove()
logger.add(sys.stderr, level="INFO")
import primap2
import primap2.tests
ds = primap2.tests.examples.toy_ds()
ds
<xarray.Dataset> Size: 3kB
Dimensions:              (time: 6, area (ISO3): 2, category (IPCC2006): 5, source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 48B 2015-01-01 ... 2020-01-01
  * area (ISO3)          (area (ISO3)) <U3 24B 'COL' 'ARG'
  * category (IPCC2006)  (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
    CH4                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)
Store to disk#
Storing a dataset to disk is done with the xarray.Dataset.pr.to_netcdf()
function.
import tempfile
import pathlib
# setup temporary directory to save things to in this example
with tempfile.TemporaryDirectory() as tdname:
td = pathlib.Path(tdname)
# simple saving without compression
ds.pr.to_netcdf(td / "toy_ds.nc")
# using zlib compression for all gases
compression = {"zlib": True, "complevel": 9}
encoding = {var: compression for var in ds.data_vars}
ds.pr.to_netcdf(td / "toy_ds_compressed.nc", encoding=encoding)
Caution

netCDF files are not reproducible. netCDF is a very flexible format which, for example, supports compression using a range of libraries, so the exact same Dataset can be represented by different netCDF files on disk. Additionally, even if you specify the compression options, netCDF files contain metadata about all software versions used to produce the file. Therefore, if you reproduce a Dataset containing the same data and metadata and store it to a netCDF file, the resulting file will generally not be byte-for-byte identical to the original file.
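The same effect can be illustrated with a small standard-library analogy (using gzip rather than netCDF itself): two compressed files can differ byte-for-byte even though they decompress to identical content, because the container embeds metadata such as a timestamp.

```python
import gzip

data = b"toy emissions data"

# gzip embeds a modification timestamp in the file header, so the same
# content written with different metadata yields different bytes on disk
f1 = gzip.compress(data, mtime=1)
f2 = gzip.compress(data, mtime=2)

assert f1 != f2                                    # the files differ
assert gzip.decompress(f1) == gzip.decompress(f2)  # the content is identical
```

For the same reason, to check reproducibility of netCDF output, compare the loaded Datasets (e.g. with xarray.testing.assert_identical) rather than the files themselves.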
Load from disk#
We also provide the function primap2.open_dataset()
to load datasets back into memory.
In this example, we load a minimal dataset.
ds = primap2.open_dataset("../minimal_ds.nc")
ds
<xarray.Dataset> Size: 3kB
Dimensions:          (time: 21, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)      (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source           (source) <U8 32B 'RAND2020'
  * time             (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
    CH4              (time, area (ISO3), source) float64 672B [CH4·Gg/a] 0.75...
    CO2              (time, area (ISO3), source) float64 672B [CO2·Gg/a] 0.66...
    SF6              (time, area (ISO3), source) float64 672B [SF6·Gg/a] 0.00...
    SF6 (SARGWP100)  (time, area (ISO3), source) float64 672B [CO2·Gg/a] 43.0...
Attributes:
    area:     area (ISO3)
Note how the units were read from the file and the attributes restored.