Store and load datasets

The native storage format for primap2 datasets is netCDF, which can store all data and metadata in a single file and supports compression. We again use a toy example dataset to show how to store and reload datasets.

# setup logging for the docs - we don't need debug logs
import sys
from loguru import logger

logger.remove()
logger.add(sys.stderr, level="INFO")
import primap2
import primap2.tests

ds = primap2.tests.examples.toy_ds()

ds
<xarray.Dataset> Size: 3kB
Dimensions:              (time: 6, area (ISO3): 2, category (IPCC2006): 5,
                          source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 48B 2015-01-01 ... 2020-01-01
  * area (ISO3)          (area (ISO3)) <U3 24B 'COL' 'ARG'
  * category (IPCC2006)  (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
    CH4                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)

Store to disk

To store a dataset to disk, use the xarray.Dataset.pr.to_netcdf() method.

import tempfile
import pathlib

# setup temporary directory to save things to in this example
with tempfile.TemporaryDirectory() as tdname:
    td = pathlib.Path(tdname)

    # simple saving without compression
    ds.pr.to_netcdf(td / "toy_ds.nc")

    # using zlib compression for all gases
    compression = {"zlib": True, "complevel": 9}
    encoding = {var: compression for var in ds.data_vars}
    ds.pr.to_netcdf(td / "toy_ds_compressed.nc", encoding=encoding)

Caution

netCDF files are not byte-for-byte reproducible.

netCDF is a very flexible format that supports, for example, compression using a range of libraries, so the exact same Dataset can be represented by different netCDF files on disk. Even if you fix the compression options, netCDF files also contain metadata about all software versions used to produce them. Therefore, if you reproduce a Dataset with the same data and metadata and store it to a netCDF file, the resulting file will generally not be identical to the original file.

Load from disk

To load a stored dataset back into memory, use the function primap2.open_dataset(). In this example, we load a minimal dataset.

ds = primap2.open_dataset("../minimal_ds.nc")

ds
<xarray.Dataset> Size: 3kB
Dimensions:          (time: 21, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)      (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source           (source) <U8 32B 'RAND2020'
  * time             (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
    CH4              (time, area (ISO3), source) float64 672B [CH4·Gg/a] 0.75...
    CO2              (time, area (ISO3), source) float64 672B [CO2·Gg/a] 0.66...
    SF6              (time, area (ISO3), source) float64 672B [SF6·Gg/a] 0.00...
    SF6 (SARGWP100)  (time, area (ISO3), source) float64 672B [CO2·Gg/a] 43.0...
Attributes:
    area:     area (ISO3)

Note how units were read and attributes restored.