Specialized Usage

This section presents usage examples for functionality that is useful for specific tasks when working with GHG emissions data.

[1]:
import numpy as np
import xarray as xr
import primap2
from primap2 import ureg

# set up logging for the docs - don't show debug messages
import sys
from loguru import logger

logger.remove()
logger.add(sys.stderr, level="INFO")
[1]:
1

Downscaling

To downscale a super-category (for example, regional data) to sub-categories (for example, country-level data in the same region), the downscale_timeseries function is available. It determines shares from the years where data for all sub-categories is available, and uses them to downscale the years where it is not:

[2]:
# select an example dataset
da = primap2.open_dataset("minimal_ds.nc")["CO2"].loc[
    {"time": slice("2000", "2003")}
]
da
[2]:
<xarray.DataArray 'CO2' (time: 4, area (ISO3): 4, source: 1)> Size: 128B
<Quantity([[[0.66352018]
  [0.91683183]
  [0.10638745]
  [0.61416341]]

 [[0.33822533]
  [0.68233863]
  [0.73953851]
  [0.79057726]]

 [[0.11409101]
  [0.25072807]
  [0.85044897]
  [0.40873907]]

 [[0.80147707]
  [0.85209597]
  [0.71521289]
  [0.08031265]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    entity:   CO2
[3]:
# compute regional data as sum of country-level data
temp = da.sum(dim="area (ISO3)")
temp = temp.expand_dims({"area (ISO3)": ["LATAM"]})
# delete data from the country level for the years 2002-2003 (inclusive)
da.loc[{"time": slice("2002", "2003")}].pint.magnitude[:] = np.nan
# add regional data to the array
da = xr.concat([da, temp], dim="area (ISO3)")
da
[3]:
<xarray.DataArray 'CO2' (time: 4, area (ISO3): 5, source: 1)> Size: 160B
<Quantity([[[0.66352018]
  [0.91683183]
  [0.10638745]
  [0.61416341]
  [2.30090286]]

 [[0.33822533]
  [0.68233863]
  [0.73953851]
  [0.79057726]
  [2.55067973]]

 [[       nan]
  [       nan]
  [       nan]
  [       nan]
  [1.62400712]]

 [[       nan]
  [       nan]
  [       nan]
  [       nan]
  [2.44909858]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) object 40B 'COL' 'ARG' 'MEX' 'BOL' 'LATAM'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    entity:   CO2
[4]:
# Do the downscaling to fill in country-level data from regional data
da.pr.downscale_timeseries(
    basket="LATAM",
    basket_contents=["BOL", "MEX", "COL", "ARG"],
    dim="area (ISO3)",
)
[4]:
<xarray.DataArray 'CO2' (time: 4, area (ISO3): 5, source: 1)> Size: 160B
<Quantity([[[0.66352018]
  [0.91683183]
  [0.10638745]
  [0.61416341]
  [2.30090286]]

 [[0.33822533]
  [0.68233863]
  [0.73953851]
  [0.79057726]
  [2.55067973]]

 [[0.21534665]
  [0.43444215]
  [0.47086107]
  [0.50335724]
  [1.62400712]]

 [[0.32475546]
  [0.65516441]
  [0.71008629]
  [0.75909242]
  [2.44909858]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) object 40B 'COL' 'ARG' 'MEX' 'BOL' 'LATAM'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    entity:   CO2

For the downscaling, shares for the sub-categories are determined at the points in time where data for all sub-categories is available. These shares are then interpolated where data is missing, and the super-category is downscaled using the interpolated shares.
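The same share-based procedure can be sketched in plain NumPy (an illustrative toy, not the primap2 implementation; the array names are made up):

```python
import numpy as np

region = np.array([10.0, 20.0, 30.0])  # regional totals for 3 years
countries = np.array([
    [4.0, np.nan, np.nan],  # country A, only known in the first year
    [6.0, np.nan, np.nan],  # country B, only known in the first year
])

# 1. determine shares where data for all sub-categories is available
shares = countries / countries.sum(axis=0)  # first year: [0.4, 0.6]

# 2. interpolate (here: constant extrapolation of the single known
#    share) to the years where data is missing
known = ~np.isnan(shares[0])
for i in range(shares.shape[0]):
    shares[i] = np.interp(np.arange(3), np.flatnonzero(known), shares[i][known])

# 3. downscale the regional totals using the interpolated shares
downscaled = shares * region  # [[4, 8, 12], [6, 12, 18]]
```

primap2's downscale_timeseries does this on labeled, unit-aware arrays and with proper interpolation between multiple known years.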

Handling of gas baskets

Summation

To sum the contents of gas baskets like KYOTOGHG, the ds.pr.gas_basket_contents_sum function is available:

[5]:
# select example dataset
ds = primap2.open_dataset("minimal_ds.nc").loc[
    {"time": slice("2000", "2003")}
][["CH4", "CO2", "SF6"]]
ds
[5]:
<xarray.Dataset> Size: 496B
Dimensions:      (time: 4, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CH4          (time, area (ISO3), source) float64 128B [CH4·Gg/a] 0.7543 ....
    CO2          (time, area (ISO3), source) float64 128B [CO2·Gg/a] 0.6635 ....
    SF6          (time, area (ISO3), source) float64 128B [Gg·SF6/a] 0.0018 ....
Attributes:
    area:     area (ISO3)
[6]:
# add (empty) gas basket with corresponding metadata
ds["KYOTOGHG (AR4GWP100)"] = xr.full_like(ds["CO2"], np.nan).pr.quantify(
    units="Gg CO2 / year"
)
ds["KYOTOGHG (AR4GWP100)"].attrs = {"entity": "KYOTOGHG", "gwp_context": "AR4GWP100"}

ds
[6]:
<xarray.Dataset> Size: 624B
Dimensions:               (time: 4, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)           (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source                (source) <U8 32B 'RAND2020'
  * time                  (time) datetime64[ns] 32B 2000-01-01 ... 2003-01-01
Data variables:
    CH4                   (time, area (ISO3), source) float64 128B [CH4·Gg/a] ...
    CO2                   (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
    SF6                   (time, area (ISO3), source) float64 128B [Gg·SF6/a] ...
    KYOTOGHG (AR4GWP100)  (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
[7]:
# compute gas basket from its contents, which have to be given explicitly
ds.pr.gas_basket_contents_sum(
    basket="KYOTOGHG (AR4GWP100)",
    basket_contents=["CO2", "SF6", "CH4"],
)
[7]:
<xarray.DataArray 'KYOTOGHG (AR4GWP100)' (time: 4, area (ISO3): 4, source: 1)> Size: 128B
<Quantity([[[   60.55826022]
  [11537.15564864]
  [15441.09112246]
  [ 2832.13708223]]

 [[10913.89923578]
  [12017.91242534]
  [18293.41941003]
  [18461.14522004]]

 [[12164.44190136]
  [14106.2809531 ]
  [ 5472.05066779]
  [ 7054.12526007]]

 [[12163.56107579]
  [11353.23339466]
  [ 9916.31088755]
  [15594.44908077]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    gwp_context:  AR4GWP100
    entity:       KYOTOGHG

Note that like all PRIMAP2 functions, gas_basket_contents_sum returns the result without overwriting anything in the original dataset, so if you want to overwrite existing data, you have to do so explicitly:

[8]:
ds["KYOTOGHG (AR4GWP100)"] = ds.pr.gas_basket_contents_sum(
    basket="KYOTOGHG (AR4GWP100)",
    basket_contents=["CO2", "SF6", "CH4"],
)

Filling in missing information

To fill in missing data in a gas basket, use fill_na_gas_basket_from_contents:

[9]:
# delete all data about the years 2002-2003 (inclusive) from the
# KYOTOGHG data
ds["KYOTOGHG (AR4GWP100)"].loc[{"time": slice("2002", "2003")}].pint.magnitude[
    :
] = np.nan
ds["KYOTOGHG (AR4GWP100)"]
[9]:
<xarray.DataArray 'KYOTOGHG (AR4GWP100)' (time: 4, area (ISO3): 4, source: 1)> Size: 128B
<Quantity([[[   60.55826022]
  [11537.15564864]
  [15441.09112246]
  [ 2832.13708223]]

 [[10913.89923578]
  [12017.91242534]
  [18293.41941003]
  [18461.14522004]]

 [[           nan]
  [           nan]
  [           nan]
  [           nan]]

 [[           nan]
  [           nan]
  [           nan]
  [           nan]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    gwp_context:  AR4GWP100
    entity:       KYOTOGHG
[10]:
ds.pr.fill_na_gas_basket_from_contents(
    basket="KYOTOGHG (AR4GWP100)", basket_contents=["CO2", "SF6", "CH4"]
)
[10]:
<xarray.DataArray 'KYOTOGHG (AR4GWP100)' (time: 4, area (ISO3): 4, source: 1)> Size: 128B
<Quantity([[[   60.55826022]
  [11537.15564864]
  [15441.09112246]
  [ 2832.13708223]]

 [[10913.89923578]
  [12017.91242534]
  [18293.41941003]
  [18461.14522004]]

 [[12164.44190136]
  [14106.2809531 ]
  [ 5472.05066779]
  [ 7054.12526007]]

 [[12163.56107579]
  [11353.23339466]
  [ 9916.31088755]
  [15594.44908077]]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source       (source) <U8 32B 'RAND2020'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
Attributes:
    gwp_context:  AR4GWP100
    entity:       KYOTOGHG

The reverse case is that you are missing some data in the timeseries of individual gases and want to fill those in using downscaled data from a gas basket. Here, use downscale_gas_timeseries:

[11]:
# delete all data for the years 2002-2003 (inclusive) from the individual gas data
sel = {"time": slice("2002", "2003")}
ds["CO2"].loc[sel].pint.magnitude[:] = np.nan
ds["SF6"].loc[sel].pint.magnitude[:] = np.nan
ds["CH4"].loc[sel].pint.magnitude[:] = np.nan
ds
[11]:
<xarray.Dataset> Size: 624B
Dimensions:               (time: 4, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)           (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source                (source) <U8 32B 'RAND2020'
  * time                  (time) datetime64[ns] 32B 2000-01-01 ... 2003-01-01
Data variables:
    CH4                   (time, area (ISO3), source) float64 128B [CH4·Gg/a] ...
    CO2                   (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
    SF6                   (time, area (ISO3), source) float64 128B [Gg·SF6/a] ...
    KYOTOGHG (AR4GWP100)  (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)
[12]:
# This determines gas shares at the points in time where individual gas
# data is available, interpolates the shares where data is missing, and
# then downscales the gas basket data using the interpolated shares
ds.pr.downscale_gas_timeseries(
    basket="KYOTOGHG (AR4GWP100)", basket_contents=["CO2", "SF6", "CH4"]
)
[12]:
<xarray.Dataset> Size: 624B
Dimensions:               (time: 4, area (ISO3): 4, source: 1)
Coordinates:
  * area (ISO3)           (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * source                (source) <U8 32B 'RAND2020'
  * time                  (time) datetime64[ns] 32B 2000-01-01 ... 2003-01-01
Data variables:
    CH4                   (time, area (ISO3), source) float64 128B [CH4·Gg/a] ...
    CO2                   (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
    SF6                   (time, area (ISO3), source) float64 128B [Gg·SF6/a] ...
    KYOTOGHG (AR4GWP100)  (time, area (ISO3), source) float64 128B [CO2·Gg/a] ...
Attributes:
    area:     area (ISO3)

Creating composite datasets

The primap2.csg module can be used to create a composite dataset from multiple source datasets using specified rules.

The general strategy for combining datasets is to always treat a single timeseries, i.e. an array with time as its only dimension. For each timeseries, the available source timeseries are ordered according to defined priorities, and the result timeseries is initialized from the highest-priority timeseries. Lower-priority source timeseries are then used in turn, one at a time, to fill any missing information in the result timeseries. For each source timeseries, a filling strategy (such as direct substitution or least-squares matching of data) is selected as configured. The algorithm terminates when no missing information is left in the result timeseries, or when all source timeseries have been used, even if missing information remains.
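The filling loop described above can be sketched in plain Python (a toy with assumed names using only the substitution strategy; the actual primap2.csg implementation is more general and records processing metadata):

```python
import numpy as np

def compose_single(timeseries_by_priority):
    """Combine 1-D arrays, highest priority first, into one timeseries."""
    result = timeseries_by_priority[0].copy()
    for source in timeseries_by_priority[1:]:
        if not np.isnan(result).any():
            break                    # result is complete, stop early
        gaps = np.isnan(result)
        result[gaps] = source[gaps]  # simple substitution strategy
    return result

a = np.array([1.0, np.nan, np.nan])  # highest priority
b = np.array([9.0, 2.0, np.nan])
c = np.array([8.0, 7.0, 3.0])        # lowest priority
composed = compose_single([a, b, c])  # [1., 2., 3.]
```

Note how existing values in the higher-priority series are never overwritten; lower-priority series only contribute where information is missing.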

The core function to use is the primap2.csg.compose function. It needs the following input:

  • The input dataset, containing all sources. The shape and dimensions of the input dataset also determine the shape and dimensions of the composed dataset.

  • A definition of priority dimensions and priorities. The priority dimensions are the dimensions in the input dataset which will be used to select source datasets. The result dataset will no longer have the priority dimensions, because along these dimensions the source timeseries are combined into a single composite timeseries. The priorities are a list of selections, each of which has to specify exactly one value for each priority dimension, so that the priorities are unambiguous. You can additionally specify values for dimensions other than the priority dimensions, e.g. if you want to change the priorities for some countries or categories. You can also specify exclusions from either the result or input datasets to skip specific sources or categories.

  • A definition of strategies. Using selectors along any input dataset dimensions, it is possible to define which filling strategies to use. A filling strategy has to be specified for each timeseries, so it is a good idea to define a default filling strategy using an empty selector (see the example below).

[13]:
import primap2.csg

input_ds = primap2.open_dataset("opulent_ds.nc")[["CH4", "CO2", "SF6"]]
input_ds["CH4"].loc[{
    "category (IPCC 2006)": "1",
    "time": slice("2000", "2001"),
    "scenario (FAOSTAT)": "lowpop"
}][:] = np.nan * ureg("Gg CH4 / year")
input_ds
[13]:
<xarray.Dataset> Size: 388kB
Dimensions:               (time: 21, area (ISO3): 4, category (IPCC 2006): 8,
                           animal (FAOSTAT): 3, product (FAOSTAT): 2,
                           scenario (FAOSTAT): 2, provenance: 1, model: 1,
                           source: 2)
Coordinates:
  * animal (FAOSTAT)      (animal (FAOSTAT)) <U5 60B 'cow' 'swine' 'goat'
  * area (ISO3)           (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * category (IPCC 2006)  (category (IPCC 2006)) <U3 96B '0' '1' ... '1.A' '1.B'
    category_names        (category (IPCC 2006)) <U14 448B 'total' ... 'light...
  * model                 (model) <U8 32B 'FANCYFAO'
  * product (FAOSTAT)     (product (FAOSTAT)) <U4 32B 'milk' 'meat'
  * provenance            (provenance) <U9 36B 'projected'
  * scenario (FAOSTAT)    (scenario (FAOSTAT)) <U7 56B 'highpop' 'lowpop'
  * source                (source) <U8 64B 'RAND2020' 'RAND2021'
  * time                  (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
    CH4                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), scenario (FAOSTAT), provenance, model, source) float64 129kB [CH4·Gg/a] ...
    CO2                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), scenario (FAOSTAT), provenance, model, source) float64 129kB [CO2·Gg/a] ...
    SF6                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), scenario (FAOSTAT), provenance, model, source) float64 129kB [Gg·SF6/a] ...
Attributes:
    area:                area (ISO3)
    cat:                 category (IPCC 2006)
    comment:             GHG inventory data ...
    contact:             lol_no_one_will_answer@example.com
    entity_terminology:  primap2
    history:             2021-01-14 14:50 data invented\n2021-01-14 14:51 add...
    institution:         PIK
    references:          doi:10.1012
    rights:              Use however you want.
    scen:                scenario (FAOSTAT)
    sec_cats:            ['animal (FAOSTAT)', 'product (FAOSTAT)']
    title:               Completely invented GHG inventory data
[14]:
priority_definition = primap2.csg.PriorityDefinition(
    priority_dimensions=["source", "scenario (FAOSTAT)"],
    priorities=[
        # only applies to category 0: prefer highpop
        {"category (IPCC 2006)": "0", "source": "RAND2020", "scenario (FAOSTAT)": "highpop"},
        {"source": "RAND2020", "scenario (FAOSTAT)": "lowpop"},
        {"source": "RAND2020", "scenario (FAOSTAT)": "highpop"},
        {"source": "RAND2021", "scenario (FAOSTAT)": "lowpop"},
        # the RAND2021, highpop combination is not used at all; you don't have to use all source timeseries
    ],
    # category 5 is not defined for CH4 in this example, so we skip processing it
    # altogether
    exclude_result=[{"entity": "CH4", "category (IPCC 2006)": "5"}],
    # in this example, we know that COL reported wrong data in the RAND2020 source
    # for SF6 category 1, so we exclude it from processing; it will be skipped and
    # the other data sources will be used instead, as configured in the priorities.
    exclude_input=[
        {"entity": "SF6", "category (IPCC 2006)": "1", "area (ISO3)": "COL", "source": "RAND2020"}
    ]
)
[15]:
# Currently, only two strategies are implemented. The GlobalLSStrategy
# uses a global least-squares fit to shift and scale the lower-priority
# timeseries to match the higher-priority timeseries. We use this as the
# main filling strategy and thus put it in first place.
# It cannot work in all cases (e.g. when there is no overlap between the
# timeseries), so we add the SubstitutionStrategy as a fallback.

# As we use the same strategies for all timeseries, we use the empty
# selector {}, which matches everything, to configure the GlobalLSStrategy
# with the SubstitutionStrategy as a fallback for all timeseries.
strategy_definition = primap2.csg.StrategyDefinition(
    strategies=[
        ({}, primap2.csg.GlobalLSStrategy()),
        ({}, primap2.csg.SubstitutionStrategy())
    ]
)
[16]:
result_ds = primap2.csg.compose(
    input_data=input_ds,
    priority_definition=priority_definition,
    strategy_definition=strategy_definition,
    progress_bar=None,  # The animated progress bar is useless in a notebook
)
[17]:
result_ds
[17]:
<xarray.Dataset> Size: 102kB
Dimensions:               (time: 21, area (ISO3): 4, category (IPCC 2006): 8,
                           animal (FAOSTAT): 3, product (FAOSTAT): 2,
                           provenance: 1, model: 1)
Coordinates:
  * animal (FAOSTAT)      (animal (FAOSTAT)) <U5 60B 'cow' 'swine' 'goat'
  * area (ISO3)           (area (ISO3)) <U3 48B 'COL' 'ARG' 'MEX' 'BOL'
  * category (IPCC 2006)  (category (IPCC 2006)) <U3 96B '0' '1' ... '1.A' '1.B'
    category_names        (category (IPCC 2006)) <U14 448B 'total' ... 'light...
  * model                 (model) <U8 32B 'FANCYFAO'
  * product (FAOSTAT)     (product (FAOSTAT)) <U4 32B 'milk' 'meat'
  * provenance            (provenance) <U9 36B 'projected'
  * time                  (time) datetime64[ns] 168B 2000-01-01 ... 2020-01-01
Data variables:
    CH4                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) float64 32kB [CH4·Gg/a] ...
    Processing of CH4     (area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) object 2kB ...
    CO2                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) float64 32kB [CO2·Gg/a] ...
    Processing of CO2     (area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) object 2kB ...
    SF6                   (time, area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) float64 32kB [Gg·SF6/a] ...
    Processing of SF6     (area (ISO3), category (IPCC 2006), animal (FAOSTAT), product (FAOSTAT), provenance, model) object 2kB ...
Attributes:
    area:                area (ISO3)
    cat:                 category (IPCC 2006)
    comment:             GHG inventory data ...
    contact:             lol_no_one_will_answer@example.com
    entity_terminology:  primap2
    history:             2021-01-14 14:50 data invented\n2021-01-14 14:51 add...
    institution:         PIK
    references:          doi:10.1012
    rights:              Use however you want.
    sec_cats:            ['animal (FAOSTAT)', 'product (FAOSTAT)']
    title:               Completely invented GHG inventory data

In the result, you can see that the priority dimensions have been removed, and new data variables “Processing of $entity” have been added, which contain detailed information for each timeseries about how it was derived.

[18]:
sel = {"animal": "cow",
       "category": ["0", "1"],
       "product": "milk",
       "time": slice("2000", "2002"), "area": "MEX"}
result_ds["CH4"].pr.loc[sel]
[18]:
<xarray.DataArray 'CH4' (time: 3, category (IPCC 2006): 2, provenance: 1,
                         model: 1)> Size: 48B
<Quantity([[[[0.36864371]]

  [[0.62681848]]]


 [[[0.41488627]]

  [[0.39333567]]]


 [[[0.06242199]]

  [[0.85542488]]]], 'CH4 * gigagram / year')>
Coordinates:
    animal (FAOSTAT)      <U5 20B 'cow'
    area (ISO3)           <U3 12B 'MEX'
  * category (IPCC 2006)  (category (IPCC 2006)) <U3 24B '0' '1'
    category_names        (category (IPCC 2006)) <U14 112B 'total' 'industry'
  * model                 (model) <U8 32B 'FANCYFAO'
    product (FAOSTAT)     <U4 16B 'milk'
  * provenance            (provenance) <U9 36B 'projected'
  * time                  (time) datetime64[ns] 24B 2000-01-01 ... 2002-01-01
Attributes:
    entity:   CH4
[19]:
del sel["time"]
result_ds["Processing of CH4"].pr.loc[sel]
[19]:
<xarray.DataArray 'Processing of CH4' (category (IPCC 2006): 2, provenance: 1,
                                       model: 1)> Size: 16B
array([[[TimeseriesProcessingDescription(steps=[ProcessingStepDescription(time='all', function='compose_timeseries', description="strategy globalLS unable to process {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}, skipping to next strategy", source="{'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}"), ProcessingStepDescription(time='all', function='substitution', description="substituted with corresponding values from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}", source="{'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}")])]],

       [[TimeseriesProcessingDescription(steps=[ProcessingStepDescription(time='all', function='compose_timeseries', description="strategy globalLS unable to process {'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}, skipping to next strategy", source="{'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}"), ProcessingStepDescription(time=array(['2002-01-01T00:00:00.000000000', '2003-01-01T00:00:00.000000000',
                '2004-01-01T00:00:00.000000000', '2005-01-01T00:00:00.000000000',
                '2006-01-01T00:00:00.000000000', '2007-01-01T00:00:00.000000000',
                '2008-01-01T00:00:00.000000000', '2009-01-01T00:00:00.000000000',
                '2010-01-01T00:00:00.000000000', '2011-01-01T00:00:00.000000000',
                '2012-01-01T00:00:00.000000000', '2013-01-01T00:00:00.000000000',
                '2014-01-01T00:00:00.000000000', '2015-01-01T00:00:00.000000000',
                '2016-01-01T00:00:00.000000000', '2017-01-01T00:00:00.000000000',
                '2018-01-01T00:00:00.000000000', '2019-01-01T00:00:00.000000000',
                '2020-01-01T00:00:00.000000000'], dtype='datetime64[ns]'), function='substitution', description="substituted with corresponding values from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}", source="{'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}"), ProcessingStepDescription(time=array(['2000-01-01T00:00:00.000000000', '2001-01-01T00:00:00.000000000'],
               dtype='datetime64[ns]'), function='globalLS', description="filled with least squares matched data from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}. a*x+b with a=-0.470, b=0.744", source="{'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}")])                                                                                                                                                  ]]],
      dtype=object)
Coordinates:
    animal (FAOSTAT)      <U5 20B 'cow'
    area (ISO3)           <U3 12B 'MEX'
  * category (IPCC 2006)  (category (IPCC 2006)) <U3 24B '0' '1'
    category_names        (category (IPCC 2006)) <U14 112B 'total' 'industry'
  * model                 (model) <U8 32B 'FANCYFAO'
    product (FAOSTAT)     <U4 16B 'milk'
  * provenance            (provenance) <U9 36B 'projected'
Attributes:
    entity:              Processing of CH4
    described_variable:  CH4
[20]:
for tpd in result_ds["Processing of CH4"].pr.loc[sel]:
    print(f"category={tpd['category (IPCC 2006)'].item()}")
    print(str(tpd.item()))
    print()
category=0
Using function=compose_timeseries with source={'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'} for times=all: strategy globalLS unable to process {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}, skipping to next strategy
Using function=substitution with source={'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'} for times=all: substituted with corresponding values from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}

category=1
Using function=compose_timeseries with source={'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'} for times=all: strategy globalLS unable to process {'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}, skipping to next strategy
Using function=substitution with source={'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'} for times=['2002-01-01T00:00:00.000000000' '2003-01-01T00:00:00.000000000'
 '2004-01-01T00:00:00.000000000' '2005-01-01T00:00:00.000000000'
 '2006-01-01T00:00:00.000000000' '2007-01-01T00:00:00.000000000'
 '2008-01-01T00:00:00.000000000' '2009-01-01T00:00:00.000000000'
 '2010-01-01T00:00:00.000000000' '2011-01-01T00:00:00.000000000'
 '2012-01-01T00:00:00.000000000' '2013-01-01T00:00:00.000000000'
 '2014-01-01T00:00:00.000000000' '2015-01-01T00:00:00.000000000'
 '2016-01-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
 '2018-01-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
 '2020-01-01T00:00:00.000000000']: substituted with corresponding values from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'lowpop'}
Using function=globalLS with source={'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'} for times=['2000-01-01T00:00:00.000000000' '2001-01-01T00:00:00.000000000']: filled with least squares matched data from {'source': 'RAND2020', 'scenario (FAOSTAT)': 'highpop'}. a*x+b with a=-0.470, b=0.744

We can see that, as configured, “highpop” was preferred for category 0 and “lowpop” for category 1. For category 0, the initial timeseries did not contain NaNs, so no filling was needed. For category 1, information was missing in the initial timeseries, so lower-priority timeseries were used to fill the holes.