Store and load datasets#
The native storage format for primap2 datasets is netCDF, which can store all data and metadata in a single file and also supports compression. We again use a toy example dataset to show how to store and reload datasets.
import primap2
import primap2.tests
ds = primap2.tests.examples.toy_ds()
ds
<xarray.Dataset> Size: 3kB
Dimensions: (time: 6, area (ISO3): 2, category (IPCC2006): 5,
source: 2)
Coordinates:
* time (time) datetime64[us] 48B 2015-01-01 ... 2020-01-01
* area (ISO3) (area (ISO3)) <U3 24B 'COL' 'ARG'
* category (IPCC2006) (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
* source (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
CO2 (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
CH4 (time, area (ISO3), category (IPCC2006), source) float64 960B [CH4·Gg/a] ...
CH4 (SARGWP100) (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
Attributes:
area: area (ISO3)
cat: category (IPCC2006)
Store to disk#
To store a dataset to disk, use the xarray.Dataset.pr.to_netcdf() method.
import tempfile
import pathlib
# set up a temporary directory to save things to in this example
with tempfile.TemporaryDirectory() as tdname:
    td = pathlib.Path(tdname)
    # simple saving without compression
    ds.pr.to_netcdf(td / "toy_ds.nc")
    # using zlib compression for all data variables
    compression = {"zlib": True, "complevel": 9}
    encoding = {var: compression for var in ds.data_vars}
    ds.pr.to_netcdf(td / "toy_ds_compressed.nc", encoding=encoding)
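The per-variable encoding mapping used above can also be built by a small helper, for example when the compression level should be adjustable. The following is a minimal sketch in plain Python; `build_zlib_encoding` is a hypothetical name and not part of primap2, and it only assumes the `zlib` and `complevel` keys already shown.

```python
def build_zlib_encoding(var_names, complevel=9):
    """Return a per-variable encoding mapping that requests zlib compression.

    Hypothetical helper for illustration; not part of primap2.
    """
    if not 0 <= complevel <= 9:
        raise ValueError("complevel must be between 0 and 9")
    # give every variable its own dict so that changing one entry later
    # does not silently change the others
    return {str(name): {"zlib": True, "complevel": complevel} for name in var_names}


encoding = build_zlib_encoding(["CO2", "CH4"], complevel=4)
print(encoding)
```

With a dataset at hand, the names would presumably come from `ds.data_vars`, and the result would be passed on through the `encoding` argument of `ds.pr.to_netcdf()`.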
Caution
netCDF files are not reproducible.
netCDF is a very flexible format which, for example, supports compression using a
range of libraries, so the exact same Dataset can be represented by different
netCDF files on disk. Unfortunately, even if you specify the compression options,
netCDF files additionally contain metadata about all software versions used to
produce the file. Therefore, if you reproduce a Dataset containing the same data
and metadata and store it to a netCDF file, the result will generally not be
byte-for-byte identical to the earlier file.
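A practical consequence is that file checksums are a poor test of whether two stored datasets are equal. As an analogy only (gzip rather than netcdf, using nothing but the Python standard library), the sketch below compresses the same made-up payload twice with different header timestamps. The resulting bytes differ although the content is the same, which suggests comparing the loaded contents of two files rather than their hashes.

```python
import gzip
import hashlib

payload = b"CO2,2015,1.0\nCO2,2016,1.1\n"  # made-up example content

# gzip stores a modification time in its header; here it stands in for the
# software-version metadata that netcdf files carry
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=1)

# the two byte strings differ, so their checksums differ as well
same_checksum = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
# the decompressed content, however, is identical
same_content = gzip.decompress(first) == gzip.decompress(second)

print(same_checksum, same_content)
```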
Load from disk#
We also provide the function primap2.open_dataset() to load datasets back into memory.
In this example, we load a minimal dataset.
ds = primap2.open_dataset("../minimal_ds.nc")
ds
<xarray.Dataset> Size: 3kB
Dimensions: (time: 21, area (ISO3): 4, source: 1)
Coordinates:
* area (ISO3) (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
* source (source) <U8 32B 'RAND2020'
* time (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
CH4 (time, area (ISO3), source) float64 672B [CH4·Gg/a] 0.75...
CO2 (time, area (ISO3), source) float64 672B [CO2·Gg/a] 0.66...
SF6 (time, area (ISO3), source) float64 672B [SF6·Gg/a] 0.00...
SF6 (SARGWP100) (time, area (ISO3), source) float64 672B [CO2·Gg/a] 43.0...
Attributes:
area: area (ISO3)
Note how units were read and attributes restored.