Downscaling
To downscale a super-category (for example, regional data) to sub-categories
(for example, country-level data within the same region), the
xarray.DataArray.pr.downscale_timeseries()
function is available. It determines the sub-category shares from the years where
data for all sub-categories is available, and uses these shares to downscale the
super-category data in the years where sub-category data is missing.
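Conceptually, the arithmetic is simple. Here is a minimal sketch with made-up numbers (not the library implementation):

# sketch of the downscaling arithmetic with made-up numbers
country_2001, region_2001 = 30.0, 100.0  # last year with complete sub-category data
share = country_2001 / region_2001       # the country's share: 0.3
region_2003 = 120.0                      # a year where only regional data is available
country_2003 = share * region_2003       # downscaled country value: 36.0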
Let’s first create an example dataset with regional data and some country data missing.
Logging setup for the docs
# setup logging for the docs - we don't need debug logs
import sys
from loguru import logger
logger.remove()
logger.add(sys.stderr, level="INFO")
import primap2
import numpy as np
import xarray as xr
# select an example dataset
da = primap2.open_dataset("../minimal_ds.nc")["CO2"].loc[
    {"time": slice("2000", "2003"), "source": "RAND2020"}
]
da.pr.to_df()
time \ area (ISO3) | COL | ARG | MEX | BOL
---|---|---|---|---
2000-01-01 | 0.663520 | 0.916832 | 0.106387 | 0.614163
2001-01-01 | 0.338225 | 0.682339 | 0.739539 | 0.790577
2002-01-01 | 0.114091 | 0.250728 | 0.850449 | 0.408739
2003-01-01 | 0.801477 | 0.852096 | 0.715213 | 0.080313
# compute regional data as sum of country-level data
temp = da.sum(dim="area (ISO3)")
temp = temp.expand_dims({"area (ISO3)": ["LATAM"]})
# delete data from the country level for the years 2002-2003 (inclusive)
da.loc[{"time": slice("2002", "2003")}].pint.magnitude[:] = np.nan
# add regional data to the array
da = xr.concat([da, temp], dim="area (ISO3)")
da.pr.to_df()
time \ area (ISO3) | COL | ARG | MEX | BOL | LATAM
---|---|---|---|---|---
2000-01-01 | 0.663520 | 0.916832 | 0.106387 | 0.614163 | 2.300903
2001-01-01 | 0.338225 | 0.682339 | 0.739539 | 0.790577 | 2.550680
2002-01-01 | NaN | NaN | NaN | NaN | 1.624007
2003-01-01 | NaN | NaN | NaN | NaN | 2.449099
As you can see, for 2000 and 2001 country-level data is available, while for the later years only regional (“LATAM”) data is available. We now want to fill in the missing country-level data by downscaling the regional data using the country shares from the earlier years.
# Do the downscaling to fill in country-level data from regional data
da.pr.downscale_timeseries(
basket="LATAM",
basket_contents=["BOL", "MEX", "COL", "ARG"],
dim="area (ISO3)",
)
<xarray.DataArray 'CO2' (time: 4, area (ISO3): 5)> Size: 160B
<Quantity([[0.66352018 0.91683183 0.10638745 0.61416341 2.30090286]
           [0.33822533 0.68233863 0.73953851 0.79057726 2.55067973]
           [0.21534665 0.43444215 0.47086107 0.50335724 1.62400712]
           [0.32475546 0.65516441 0.71008629 0.75909242 2.44909858]], 'CO2 * gigagram / year')>
Coordinates:
  * area (ISO3)  (area (ISO3)) object 40B 'COL' 'ARG' 'MEX' 'BOL' 'LATAM'
  * time         (time) datetime64[ns] 32B 2000-01-01 2001-01-01 ... 2003-01-01
    source       <U8 32B 'RAND2020'
Attributes:
    entity:  CO2
For the downscaling, the country shares of the regional total are determined at the points in time where data for all countries is available. These shares are then inter- and extrapolated to the years where country data is missing, and the regional data is downscaled by multiplying it with these shares.
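The same procedure can be reproduced by hand with plain xarray operations. The following is a minimal sketch continuing from the da array constructed above; the variable names basket, contents, shares, and reconstructed are purely illustrative, and instead of the general inter- and extrapolation of shares it simply reuses the 2001 shares, which is what the extrapolation amounts to in this example.

# regional ("basket") series and country ("basket contents") series;
# drop=True removes the scalar area coordinate so it broadcasts cleanly
basket = da.sel({"area (ISO3)": "LATAM"}, drop=True)
contents = da.sel({"area (ISO3)": ["BOL", "MEX", "COL", "ARG"]})

# country shares of the regional total in the last year with complete
# country data; drop=True removes the scalar time coordinate
shares = (contents / basket).sel(time=np.datetime64("2001-01-01"), drop=True)

# reconstruct country values from the regional data; for 2002 and 2003
# this matches what downscale_timeseries filled in above
reconstructed = shares * basket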