Select and View Data

Datasets

In PRIMAP2, data is handled in xarray datasets with defined dimensions, coordinates and metadata. If you are not familiar with selecting data using xarray, we recommend reading the corresponding section in xarray’s documentation first.

To get going, we will show the most important features of the data format using a toy example.

import primap2
import primap2.tests

ds = primap2.tests.examples.toy_ds()

ds

<xarray.Dataset> Size: 3kB
Dimensions:              (time: 6, area (ISO3): 2, category (IPCC2006): 5,
                          source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 48B 2015-01-01 ... 2020-01-01
  * area (ISO3)          (area (ISO3)) <U3 24B 'COL' 'ARG'
  * category (IPCC2006)  (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
    CH4                  (time, area (ISO3), category (IPCC2006), source) float64 960B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, area (ISO3), category (IPCC2006), source) float64 960B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)

In a Jupyter notebook, you can expand the coordinates and data variables in the dataset's HTML representation to inspect the contents of the toy dataset. As mentioned, PRIMAP2 datasets are xarray datasets, but with clearly defined naming conventions so that the data is self-describing.

Each dataset has a time dimension, an area dimension and a source dimension. In our toy example, we additionally have a category dimension. For the area and category dimensions, the terminology is given in parentheses in the dimension name, e.g. ISO3 for the area. The terminologies are defined in the separate climate-categories package, so that the meaning of the area and category codes is clearly defined.

The data itself is stored in data variables. Each greenhouse gas is in a separate data variable, and if a data variable contains global warming potential (GWP) equivalent emissions instead of mass emissions, the GWP metric used is given in parentheses, e.g. CH4 (SARGWP100).
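To make these naming conventions concrete, here is a minimal sketch that builds a small dataset with the same layout using plain xarray. It does not use PRIMAP2 itself: units (which PRIMAP2 attaches via pint-xarray) are left out, the values are arbitrary zeros, and the name ds_sketch is only for illustration. The attrs mirror those of the toy dataset shown above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Dimension names follow the PRIMAP2 convention: the terminology of the
# area and category dimensions is given in parentheses.
coords = {
    "time": pd.to_datetime(["2015-01-01", "2016-01-01"]),
    "area (ISO3)": ["COL", "ARG"],
    "category (IPCC2006)": ["0", "1"],
    "source": ["RAND2020"],
}
dims = tuple(coords)
shape = tuple(len(values) for values in coords.values())

ds_sketch = xr.Dataset(
    {
        # One data variable per gas, holding mass emissions
        "CH4": (dims, np.zeros(shape), {"entity": "CH4"}),
        # GWP-equivalent emissions: the metric appears in the name and in attrs
        "CH4 (SARGWP100)": (
            dims,
            np.zeros(shape),
            {"entity": "CH4", "gwp_context": "SARGWP100"},
        ),
    },
    coords=coords,
    # Dataset attrs map short names to the full dimension names
    attrs={"area": "area (ISO3)", "cat": "category (IPCC2006)"},
)
```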

Selecting

Data can be selected using the xarray indexing methods, but PRIMAP2 also provides its own versions of some of xarray’s selection methods, which are easier to use in the PRIMAP2 context.

The loc Indexer

PRIMAP2 provides a version of the loc indexer which works with the bare dimension names, e.g. area instead of area (ISO3):

ds.pr.loc[{"time": slice("2016", "2018"), "area": "COL"}]

<xarray.Dataset> Size: 880B
Dimensions:              (time: 3, category (IPCC2006): 5, source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 24B 2016-01-01 ... 2018-01-01
    area (ISO3)          <U3 12B 'COL'
  * category (IPCC2006)  (category (IPCC2006)) <U3 60B '0' '1' '2' '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, category (IPCC2006), source) float64 240B [CO2·Gg/a] ...
    CH4                  (time, category (IPCC2006), source) float64 240B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, category (IPCC2006), source) float64 240B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)
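Conceptually, bare names work because the full dimension names follow the fixed pattern name (terminology). The sketch below shows one way such a translation could be done before calling plain xarray's loc. The helper translate_aliases is hypothetical and not PRIMAP2's implementation; it derives aliases from the dimension names and additionally honours short names stored in the dataset attrs, such as cat.

```python
import re

import numpy as np
import pandas as pd
import xarray as xr


def translate_aliases(ds: xr.Dataset, selection: dict) -> dict:
    """Hypothetical helper: map bare names like "area" to "area (ISO3)"."""
    aliases = {}
    for dim in ds.dims:
        # "area (ISO3)" -> "area"; dimensions without a terminology map to themselves
        match = re.fullmatch(r"(.+) \((.+)\)", str(dim))
        aliases[match.group(1) if match else dim] = dim
    # Short names stored in the dataset attrs, e.g. {"cat": "category (IPCC2006)"}
    aliases.update(
        {k: v for k, v in ds.attrs.items() if isinstance(v, str) and v in ds.dims}
    )
    return {aliases.get(key, key): value for key, value in selection.items()}


ds_small = xr.Dataset(
    {"CO2": (("time", "area (ISO3)"), np.arange(6.0).reshape(3, 2))},
    coords={
        "time": pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01"]),
        "area (ISO3)": ["COL", "ARG"],
    },
    attrs={"area": "area (ISO3)"},
)

# Translate the bare names, then use the ordinary xarray loc indexer
selection = translate_aliases(ds_small, {"time": slice("2016", "2017"), "area": "COL"})
subset = ds_small.loc[selection]
```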

Negative Selections

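Using the primap2 loc indexer, you can also use negative selections to select everything except the specified value or values along a dimension. Note that the toy dataset only covers 2015 to 2020, so the time slice in the following example selects no years and the result has an empty time dimension; the category dimension still shows the effect of Not, with only 1.A and 1.B remaining: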

from primap2 import Not

ds.pr.loc[{"time": slice("2002", "2005"), "cat": Not(["0", "1", "2"])}]

<xarray.Dataset> Size: 112B
Dimensions:              (time: 0, area (ISO3): 2, category (IPCC2006): 2,
                          source: 2)
Coordinates:
  * time                 (time) datetime64[ns] 0B 
  * area (ISO3)          (area (ISO3)) <U3 24B 'COL' 'ARG'
  * category (IPCC2006)  (category (IPCC2006)) <U3 24B '1.A' '1.B'
  * source               (source) <U8 64B 'RAND2020' 'RAND2021'
Data variables:
    CO2                  (time, area (ISO3), category (IPCC2006), source) float64 0B [CO2·Gg/a] ...
    CH4                  (time, area (ISO3), category (IPCC2006), source) float64 0B [CH4·Gg/a] ...
    CH4 (SARGWP100)      (time, area (ISO3), category (IPCC2006), source) float64 0B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
    cat:      category (IPCC2006)
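For comparison, the same kind of negative selection can be expressed in plain xarray with Dataset.drop_sel, which removes the given labels along a dimension. This sketch uses a small stand-alone dataset (ds_small is only for illustration) and the full dimension name, since plain xarray does not know PRIMAP2's aliases.

```python
import numpy as np
import xarray as xr

categories = ["0", "1", "2", "1.A", "1.B"]
ds_small = xr.Dataset(
    {"CO2": ("category (IPCC2006)", np.arange(5.0))},
    coords={"category (IPCC2006)": categories},
)

# Keep everything except categories "0", "1" and "2"
remaining = ds_small.drop_sel({"category (IPCC2006)": ["0", "1", "2"]})
```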

Metadata

We store metadata about the whole dataset in the attrs of the dataset, and metadata about specific data variables in their respective attrs.

ds.attrs

{'area': 'area (ISO3)', 'cat': 'category (IPCC2006)'}

ds["CH4 (SARGWP100)"].attrs

{'entity': 'CH4', 'gwp_context': 'SARGWP100'}

Our toy example only contains a few technical metadata values, which are mainly useful for tasks such as reading the global warming potential metric without parsing the variable name. You can also add further information, for example a short description of your dataset in the title attribute:

ds.attrs["title"] = "A toy example dataset which contains random data."

We have standardized names for a few attributes (e.g. title), which can then also be accessed via the pr namespace:

ds.pr.title

'A toy example dataset which contains random data.'

You can find the definition of all standardized attributes at Dataset Attributes.
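The benefit of keeping the metric in attrs is that code can read it directly instead of parsing names such as CH4 (SARGWP100). A minimal plain-xarray sketch; the helper gwp_contexts is hypothetical and simply collects the gwp_context attribute of every data variable that has one.

```python
import numpy as np
import xarray as xr

ds_small = xr.Dataset(
    {
        "CH4": ("time", np.zeros(2), {"entity": "CH4"}),
        "CH4 (SARGWP100)": (
            "time",
            np.zeros(2),
            {"entity": "CH4", "gwp_context": "SARGWP100"},
        ),
    },
    attrs={"title": "A toy example dataset which contains random data."},
)


def gwp_contexts(ds: xr.Dataset) -> dict:
    """Hypothetical helper: map data variable names to their GWP context."""
    return {
        name: var.attrs["gwp_context"]
        for name, var in ds.data_vars.items()
        if "gwp_context" in var.attrs
    }
```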

Unit handling

PRIMAP2 uses the openscm_units package, which is based on the Pint library, together with the pint-xarray library for handling units.

Unit information

To access the unit information, you can use the pint accessor on DataArrays provided by pint-xarray:

ds["CH4"].pint.units

CH4 gigagram/year

Simple conversions

Simple unit conversions are possible using standard Pint functions:

ch4_kt_per_day = ds["CH4"].pint.to("kt CH4 / day")
ch4_kt_per_day.pint.units

CH4 kt/day

CO2 equivalent units and mass units

To convert emissions in mass units into CO2-equivalent emissions based on global warming potentials, you have to specify a global warming potential context (also known as a global warming potential metric):

ch4_ar4 = ds["CH4"].pr.convert_to_gwp(gwp_context="AR4GWP100", units="Gg CO2 / year")
# The information about the used GWP context is retained:
ch4_ar4.attrs

{'entity': 'CH4', 'gwp_context': 'AR4GWP100'}

Because the GWP context used for conversion is stored, it is easy to convert back to mass units:

ch4 = ch4_ar4.pr.convert_to_mass()
ch4.attrs

{'entity': 'CH4'}

The stored GWP context can also be used to convert another array using the same context:

ch4_sar = ds["CH4"].pr.convert_to_gwp_like(ds["CH4 (SARGWP100)"])
ch4_sar.attrs

{'entity': 'CH4', 'gwp_context': 'SARGWP100'}
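Arithmetically, a GWP conversion multiplies the mass emissions by the GWP of the gas in the chosen context, and converting back divides by the same factor. The plain-Python sketch below hard-codes the 100-year GWPs of CH4 from the IPCC Second (21) and Fourth (25) Assessment Reports; in PRIMAP2 the factors come from openscm_units, and the function names here are hypothetical stand-ins that only mimic how the context is recorded in attrs. The SAR factor is consistent with the toy dataset, where CH4 (SARGWP100) is 21 times CH4.

```python
# Hypothetical lookup table: 100-year GWPs of CH4 (IPCC SAR and AR4)
GWP100_CH4 = {"SARGWP100": 21.0, "AR4GWP100": 25.0}


def convert_to_gwp(mass: float, attrs: dict, gwp_context: str) -> tuple[float, dict]:
    """Convert CH4 mass emissions to CO2 equivalents and record the context."""
    factor = GWP100_CH4[gwp_context]
    return mass * factor, {**attrs, "gwp_context": gwp_context}


def convert_to_mass(co2eq: float, attrs: dict) -> tuple[float, dict]:
    """Invert convert_to_gwp using the context stored in attrs."""
    factor = GWP100_CH4[attrs["gwp_context"]]
    mass_attrs = {k: v for k, v in attrs.items() if k != "gwp_context"}
    return co2eq / factor, mass_attrs


def convert_to_gwp_like(mass: float, attrs: dict, like_attrs: dict) -> tuple[float, dict]:
    """Convert using the GWP context stored in another array's attrs."""
    return convert_to_gwp(mass, attrs, like_attrs["gwp_context"])


ch4_ar4, ar4_attrs = convert_to_gwp(2.0, {"entity": "CH4"}, "AR4GWP100")
ch4_back, mass_attrs = convert_to_mass(ch4_ar4, ar4_attrs)
```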

Dropping units

Sometimes, it is necessary or convenient to drop the units, for example to use arrays as input for external functions which are unit-naive. This can be done safely by first converting to the target unit, then dequantifying the dataset or array:

da_nounits = ds["CH4"].pint.to("Mt CH4 / year").pr.dequantify()
da_nounits.attrs

{'entity': 'CH4', 'units': 'CH4 * megametric_ton / year'}

Note that the units are then stored in the DataArray’s attrs, and can be restored using the xarray.DataArray.pr.quantify() function.

Descriptive statistics

To get an overview of missing data in a Dataset or DataArray, you can use the xarray.DataArray.pr.coverage() function. It gives you a summary of the number of non-NaN data points.

To illustrate this, we use a DataArray da with missing data along the area (ISO3) and time dimensions.
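The code that creates da is not part of this page. As a stand-in, here is a sketch that builds a DataArray whose missing values reproduce the coverage counts shown below; the category dimension and all values are assumptions, not the original data.

```python
import numpy as np
import pandas as pd
import xarray as xr

nan = np.nan
time = pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01", "2003-01-01"])

# Shape: (category, area, time); NaN marks missing data points
data = np.array(
    [
        # category "1": rows are COL, ARG, MEX; columns are 2000 to 2003
        [[1, 2, nan, 4], [1, nan, 3, nan], [1, 2, 3, nan]],
        # category "2"
        [[nan, 2, 3, 4], [nan, nan, nan, nan], [1, nan, 3, nan]],
    ]
)

da = xr.DataArray(
    data,
    coords=[
        ("category (IPCC2006)", ["1", "2"]),
        ("area (ISO3)", ["COL", "ARG", "MEX"]),
        ("time", time),
    ],
    name="test data",
)
```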

With this array, we can now obtain coverage statistics along given dimensions:

da.pr.coverage("area")

area (ISO3)
COL    6
ARG    2
MEX    5
Name: coverage, dtype: int64

da.pr.coverage("time", "area")

area (ISO3)  COL  ARG  MEX
time
2000-01-01     1    1    2
2001-01-01     2    0    1
2002-01-01     1    1    2
2003-01-01     2    0    0
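Conceptually, coverage counts the non-NaN entries and sums them over all dimensions that were not requested. A plain-xarray sketch of that idea, using a hypothetical helper coverage_sketch which, unlike PRIMAP2, needs full dimension names such as area (ISO3) and only handles one or two dimensions:

```python
import numpy as np
import pandas as pd
import xarray as xr


def coverage_sketch(da: xr.DataArray, *dims: str):
    """Hypothetical helper: count non-NaN values along the given full dimension names."""
    present = da.notnull()
    # Sum the boolean mask over every dimension that is not kept
    other = [d for d in da.dims if d not in dims]
    counts = present.sum(dim=other) if other else present.astype(int)
    if len(dims) == 1:
        return counts.to_series().rename("coverage")
    if len(dims) == 2:
        # First requested dimension becomes the rows, second the columns
        return counts.transpose(*dims).to_pandas()
    raise ValueError("this sketch only supports one or two dimensions")


da_small = xr.DataArray(
    np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, 6.0]]),
    coords=[
        ("area (ISO3)", ["COL", "ARG"]),
        ("time", pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"])),
    ],
)

by_area = coverage_sketch(da_small, "area (ISO3)")
by_time_and_area = coverage_sketch(da_small, "time", "area (ISO3)")
```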

For Datasets, you can also specify “entity” as a dimension, which counts the data points separately for each data variable:

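import numpy as np
ds = primap2.tests.examples._cached_opulent_ds.copy(deep=True)
# Note: selecting with label lists returns a copy, so this may leave ds unchanged
ds["CO2"].pr.loc[{"product": "milk", "area": ["COL", "MEX"]}].pint.magnitude[:] = np.nan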

ds.pr.coverage("product", "entity", "area")

area (ISO3)                         COL   ARG   MEX   BOL
product (FAOSTAT) entity
milk              CO2               2016  2016  2016  2016
                  SF6               2016  2016  2016  2016
                  CH4               2016  2016  2016  2016
                  SF6 (SARGWP100)   2016  2016  2016  2016
meat              CO2               2016  2016  2016  2016
                  SF6               2016  2016  2016  2016
                  CH4               2016  2016  2016  2016
                  SF6 (SARGWP100)   2016  2016  2016  2016
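Treating “entity” as a dimension amounts to stacking the data variables along a new dimension before counting. A plain-xarray sketch of that idea with a small stand-alone dataset (ds_small and the dimension name entity on the stacked array are illustrative); it assumes all data variables share the same dimensions.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds_small = xr.Dataset(
    {
        "CO2": ("area (ISO3)", np.array([1.0, np.nan, 3.0])),
        "CH4": ("area (ISO3)", np.array([np.nan, np.nan, 6.0])),
    },
    coords={"area (ISO3)": ["COL", "ARG", "MEX"]},
)

# Stack the data variables along a new "entity" dimension
names = list(ds_small.data_vars)
stacked = xr.concat(
    [ds_small[name] for name in names], dim=pd.Index(names, name="entity")
)

# Count non-NaN values per entity and area (rows: entity, columns: area)
coverage = stacked.notnull().astype(int).transpose("entity", "area (ISO3)").to_pandas()
```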
