Downscaling

To downscale a super-category (for example, regional data) to sub-categories (for example, country-level data in the same region), use the xarray.DataArray.pr.downscale_timeseries() function. It determines the sub-category shares from the years where data is available for all sub-categories and uses these shares to downscale the super-category for the years where full information is not available.

Let’s first create an example dataset with regional data and some country data missing.

# setup logging for the docs - we don't need debug logs
import sys
from loguru import logger

logger.remove()
logger.add(sys.stderr, level="INFO")
import primap2
import numpy as np
import xarray as xr

# select an example dataset
da = primap2.open_dataset("../minimal_ds.nc")["CO2"].loc[{"time": slice("2000", "2003"), "source": "RAND2020"}]
da.pr.to_df()
area (ISO3)       COL       ARG       MEX       BOL
time
2000-01-01   0.663520  0.916832  0.106387  0.614163
2001-01-01   0.338225  0.682339  0.739539  0.790577
2002-01-01   0.114091  0.250728  0.850449  0.408739
2003-01-01   0.801477  0.852096  0.715213  0.080313
# compute regional data as sum of country-level data
temp = da.sum(dim="area (ISO3)")
temp = temp.expand_dims({"area (ISO3)": ["LATAM"]})
# delete data from the country level for the years 2002-2003 (inclusive)
da.loc[{"time": slice("2002", "2003")}].pint.magnitude[:] = np.nan
# add regional data to the array
da = xr.concat([da, temp], dim="area (ISO3)")
da.pr.to_df()
area (ISO3)       COL       ARG       MEX       BOL     LATAM
time
2000-01-01   0.663520  0.916832  0.106387  0.614163  2.300903
2001-01-01   0.338225  0.682339  0.739539  0.790577  2.550680
2002-01-01        NaN       NaN       NaN       NaN  1.624007
2003-01-01        NaN       NaN       NaN       NaN  2.449099

As you can see, country-level data is available for 2000 and 2001, but for the later years only regional (“LATAM”) data is available. We now want to fill in the missing country-level data using the shares from the earlier years and the regional totals.

# Do the downscaling to fill in country-level data from regional data
da.pr.downscale_timeseries(
    basket="LATAM",
    basket_contents=["BOL", "MEX", "COL", "ARG"],
    dim="area (ISO3)",
)
<xarray.DataArray 'CO2' (time: 4, area (ISO3): 5)> Size: 160B
<Quantity([[0.66352018 0.91683183 0.10638745 0.61416341 2.30090286]
 [0.33822533 0.68233863 0.73953851 0.79057726 2.55067973]
 [0.21534665 0.43444215 0.47086107 0.50335724 1.62400712]
 [0.32475546 0.65516441 0.71008629 0.75909242 2.44909858]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) object 40B 'COL' 'ARG' 'MEX' 'BOL' 'LATAM'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
    source       <U8 32B 'RAND2020'
Attributes:
    entity:   CO2

For the downscaling, the shares of the individual countries in the regional total are determined at the points in time where data for all countries is available. These shares are then inter- and extrapolated to the points in time where country data is missing, and finally the regional data is downscaled by multiplying it with the shares.
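
To illustrate the logic, here is a rough sketch that reproduces the result for this example by hand. It is not the downscale_timeseries() implementation: it simply forward-fills the shares from the last year with complete country data, which suffices here because all the missing years lie after the available ones.

# manual sketch of share-based downscaling (not the library implementation)
countries = ["COL", "ARG", "MEX", "BOL"]
regional_total = da.loc[{"area (ISO3)": "LATAM"}]
# dimensionless share of each country in the regional total
shares = (da.loc[{"area (ISO3)": countries}] / regional_total).pint.dequantify()
# extrapolate the shares to 2002 and 2003 by carrying the last value forward
shares = shares.ffill(dim="time")
# distribute the regional total according to the shares
downscaled = shares * regional_total
downscaled

For this simple example, the manually computed values agree with the downscale_timeseries() result shown above.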