Store and load datasets

The native storage format for primap2 datasets is netCDF, which supports storing all data and metadata in one file, as well as compression. We again use a toy example dataset to show how to store and reload datasets.

# setup logging for the docs - we don't need debug logs
import sys
from loguru import logger

logger.remove()
logger.add(sys.stderr, level="INFO")
1
import primap2
import primap2.tests

ds = primap2.tests.examples.toy_ds()

ds
<xarray.Dataset> Size: 3kB
Dimensions:              (time: 6, area (ISO3): 2, category (IPCC2006): 5,
                          source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 48B 2015-01-01 ... 2020-01-01
  * area (ISO3)          (area (ISO3)) <U3 24B 'COL' 'ARG'
  * category (IPCC2006)  (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
    CH4                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)

Store to disk

To store a dataset to disk, use the xarray.Dataset.pr.to_netcdf() function.

import tempfile
import pathlib

# setup temporary directory to save things to in this example
with tempfile.TemporaryDirectory() as tdname:
    td = pathlib.Path(tdname)

    # simple saving without compression
    ds.pr.to_netcdf(td / "toy_ds.nc")

    # using zlib compression for all gases
    compression = {"zlib": True, "complevel": 9}
    encoding = {var: compression for var in ds.data_vars}
    ds.pr.to_netcdf(td / "toy_ds_compressed.nc", encoding=encoding)
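The encoding argument used above is a plain mapping from variable name to compression settings. The following sketch builds such a mapping without primap2, so the variable names are hard-coded from the toy dataset shown earlier (an assumption made only to keep the snippet self-contained). It also copies the settings per variable, which is an optional variation on the example above and makes it possible to adjust one variable later without affecting the others.

```python
# Variable names copied from the toy dataset above (hard-coded for illustration).
data_vars = ["CO2", "CH4", "CH4 (SARGWP100)"]

# Compression settings as used in the example above.
compression = {"zlib": True, "complevel": 9}

# One independent copy of the settings per variable.
encoding = {var: dict(compression) for var in data_vars}

# Lower the compression level for a single variable only.
encoding["CO2"]["complevel"] = 4

print(encoding["CO2"])  # {'zlib': True, 'complevel': 4}
print(encoding["CH4"])  # {'zlib': True, 'complevel': 9}
```

If every variable should always use identical settings, sharing one dictionary as in the example above works just as well.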

Load from disk

We also provide the function primap2.open_dataset() to load datasets back into memory. In this example, we load a minimal dataset.

ds = primap2.open_dataset("../minimal_ds.nc")

ds
<xarray.Dataset> Size: 3kB
Dimensions:          (time: 21, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)      (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source           (source) <U8 32B 'RAND2020'
  * time             (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
    CH4              (time, area (ISO3), source) float64 672B [CH4·Gg/a] 0.75...
    CO2              (time, area (ISO3), source) float64 672B [CO2·Gg/a] 0.66...
    SF6              (time, area (ISO3), source) float64 672B [SF6·Gg/a] 0.00...
    SF6 (SARGWP100)  (time, area (ISO3), source) float64 672B [CO2·Gg/a] 43.0...
Attributes:
    area:     area (ISO3)

Note that the units (shown in square brackets) were read back from the file and the attributes were restored.