{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Usage\n",
    "\n",
    "Because PRIMAP2 builds on xarray, all xarray functionality is available\n",
    "right away.\n",
    "Additional functionality is provided in the ``primap2`` package and\n",
    "in the ``pr`` namespace on xarray objects.\n",
    "In this section, we will present examples of PRIMAP2 usage.\n",
    "\n",
    "## Importing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import primap2  # injects the \"pr\" namespace into xarray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# set up logging for the docs - don't show debug messages\n",
    "import sys\n",
    "from loguru import logger\n",
    "\n",
    "logger.remove()\n",
    "logger.add(sys.stderr, level=\"INFO\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Loading Datafiles\n",
    "\n",
    "### Loading from netcdf files\n",
    "\n",
    "The native storage format of PRIMAP2 are netcdf5 files, and datasets\n",
    "can be written to and loaded from netcdf5 files using PRIMAP2 functions.\n",
    "We will load the \"minimal\" and \"opulent\" Datasets from the data format section:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds_min = primap2.open_dataset(\"minimal_ds.nc\")\n",
    "ds = primap2.open_dataset(\"opulent_ds.nc\")\n",
    "\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Accessing metadata\n",
    "\n",
    "Metadata is stored in the `attrs` of Datasets, and you can of course\n",
    "access it directly there.\n",
    "Additionally, you can access the PRIMAP2 metadata directly under\n",
    "the `.pr` namespace, which has the advantage that autocompletion\n",
    "works in ipython and IDEs and typos will be caught immediately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds.pr.title"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds.pr.title = \"Another title\"\n",
    "ds.pr.title"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Selecting data\n",
    "\n",
    "Data can be selected using the\n",
    "[xarray indexing methods](https://xarray.pydata.org/en/stable/indexing.html),\n",
    "but PRIMAP2 also provides own versions of some of xarray's selection methods\n",
    "which work using the dimension names without the category set.\n",
    "\n",
    "### Getitem\n",
    "\n",
    "The following selections both select the same:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds[\"area (ISO3)\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds.pr[\"area\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### The loc Indexer\n",
    "\n",
    "Similarly, a version of the `loc` indexer is provided which works with the\n",
    "bare dimension names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds.pr.loc[{\"time\": slice(\"2002\", \"2005\"), \"animal\": \"cow\"}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "It also works on DataArrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da = ds[\"CO2\"]\n",
    "\n",
    "da.pr.loc[\n",
    "    {\n",
    "        \"time\": slice(\"2002\", \"2005\"),\n",
    "        \"animal\": \"cow\",\n",
    "        \"category\": \"0\",\n",
    "        \"area\": \"COL\",\n",
    "    }\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Setting data\n",
    "\n",
    "PRIMAP2 provides a unified API to introduce new data values, fill missing information,\n",
    "and overwrite existing information:\n",
    "the [da.pr.set](https://primap2.readthedocs.io/en/main/generated/xarray.DataArray.pr.set.html)\n",
    "function and its sibling\n",
    "[ds.pr.set](https://primap2.readthedocs.io/en/main/generated/xarray.Dataset.pr.set.html).\n",
    "\n",
    "The basic signature of the `set` functions is `set(dimension, keys, values)`, and it\n",
    "returns the changed object without changing the original one.\n",
    "Use it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da = ds_min[\"CO2\"].loc[{\"time\": slice(\"2000\", \"2005\")}]\n",
    "da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from primap2 import ureg\n",
    "\n",
    "modified = da.pr.set(\n",
    "    \"area\", \"CUB\", np.linspace(0, 20, 6) * ureg(\"Gg CO2 / year\")\n",
    ")\n",
    "modified"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "By default, existing non-NaN values are not overwritten:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    da.pr.set(\"area\", \"COL\", np.linspace(0, 20, 6) * ureg(\"Gg CO2 / year\"))\n",
    "except ValueError as err:\n",
    "    print(err)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "You can overwrite existing values by specifying `existing=\"overwrite\"`\n",
    "to overwrite all values or `existing=\"fillna\"` to overwrite only NaNs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da.pr.set(\n",
    "    \"area\",\n",
    "    \"COL\",\n",
    "    np.linspace(0, 20, 6) * ureg(\"Gg CO2 / year\"),\n",
    "    existing=\"overwrite\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "By default, the `set()` function extends the specified dimension automatically to\n",
    "accommodate new values if not all key values are in the specified dimension yet.\n",
    "You can change this by specifying `new=\"error\"`, which will raise a KeyError if any of\n",
    "the keys is not found:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    da.pr.set(\n",
    "        \"area\",\n",
    "        [\"COL\", \"CUB\"],\n",
    "        np.linspace(0, 20, 6) * ureg(\"Gg CO2 / year\"),\n",
    "        existing=\"overwrite\",\n",
    "        new=\"error\",\n",
    "    )\n",
    "except KeyError as err:\n",
    "    print(err)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In particular, the `set()` functions can also be used with xarray's arithmetic\n",
    "functions to derive values from existing data and store the result in the Dataset.\n",
    "As an example, we will derive better values for category 0 by adding all\n",
    "its subcategories and store the result.\n",
    "\n",
    "First, let's see the current data for a small subset of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "sel = {\n",
    "    \"area\": \"COL\",\n",
    "    \"category\": [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\"],\n",
    "    \"animal\": \"cow\",\n",
    "    \"product\": \"milk\",\n",
    "    \"scenario\": \"highpop\",\n",
    "    \"source\": \"RAND2020\",\n",
    "}\n",
    "subset = ds.pr.loc[sel].squeeze()\n",
    "\n",
    "# TODO: currently, plotting with units still emits a warning\n",
    "import warnings\n",
    "\n",
    "with warnings.catch_warnings():\n",
    "    warnings.simplefilter(\"ignore\")\n",
    "    subset[\"CO2\"].plot.line(x=\"time\", hue=\"category (IPCC 2006)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "While it is hard to see any details in this plot, it is clearly visible\n",
    "that category 0 is not the sum of the other categories (which should not\n",
    "come as a surprise since we generated the data at random).\n",
    "\n",
    "We will now recompute category 0 for the entire dataset using set():"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "cat0_new = ds.pr.loc[{\"category\": [\"1\", \"2\", \"3\", \"4\", \"5\"]}].pr.sum(\n",
    "    \"category\"\n",
    ")\n",
    "\n",
    "ds = ds.pr.set(\n",
    "    \"category\",\n",
    "    \"0\",\n",
    "    cat0_new,\n",
    "    existing=\"overwrite\",\n",
    ")\n",
    "\n",
    "# plot a small subset of the result\n",
    "subset = ds.pr.loc[sel].squeeze()\n",
    "# TODO: currently, plotting with units still emits a warning\n",
    "import warnings\n",
    "\n",
    "with warnings.catch_warnings():\n",
    "    warnings.simplefilter(\"ignore\")\n",
    "    subset[\"CO2\"].plot.line(x=\"time\", hue=\"category (IPCC 2006)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "As you can see in the plot, category 0 is now computed from its subcategories.\n",
    "The set() method of Datasets works on all data variables in the dataset which\n",
    "have the corresponding dimension. In this example, the \"population\" variable\n",
    "does not have categories, so it was unchanged."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Unit handling\n",
    "\n",
    "PRIMAP2 uses the [openscm_units](https://openscm-units.readthedocs.io)\n",
    "package based on the [Pint](https://pint.readthedocs.io/) library\n",
    "for handling of units.\n",
    "\n",
    "### CO2 equivalent units and mass units\n",
    "\n",
    "Using global warming potential contexts, it is easy to convert mass units\n",
    "into CO2 equivalents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from primap2 import ureg  # The unit registry\n",
    "\n",
    "sf6_gwp = ds[\"SF6\"].pr.convert_to_gwp(\n",
    "    gwp_context=\"AR4GWP100\", units=\"Gg CO2 / year\"\n",
    ")\n",
    "# The information about the used GWP context is retained:\n",
    "sf6_gwp.attrs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Because the GWP context used for conversion is stored, it is equally easy\n",
    "to convert back to mass units:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "sf6 = sf6_gwp.pr.convert_to_mass()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The stored GWP context can also be used to convert another array using the\n",
    "same context:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ch4_gwp = ds[\"CH4\"].pr.convert_to_gwp_like(sf6_gwp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Dropping units\n",
    "\n",
    "Sometimes, it is necessary to drop the units, for example to use\n",
    "arrays as input for external functions which are unit-naive.\n",
    "This can be done safely by first converting to the target unit, then\n",
    "dequantifying the dataset or array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds[\"CH4\"].pint.to(\"Mt CH4 / year\").pr.dequantify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Note that the units are then stored in the DataArray's `attrs`, and can be\n",
    "restored using the\n",
    "[da.pr.quantify](https://primap2.readthedocs.io/en/main/generated/xarray.DataArray.pr.quantify.html)\n",
    "function.\n",
    "\n",
    "## Descriptive statistics\n",
    "\n",
    "To get an overview about the missing information in a Dataset or DataArray, you\n",
    "can use the `pr.coverage` function. It gives you a summary\n",
    "of the number of non-NaN data points:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "import pandas as pd\n",
    "\n",
    "time = pd.date_range(\"2000-01-01\", \"2003-01-01\", freq=\"AS\")\n",
    "area_iso3 = np.array([\"COL\", \"ARG\", \"MEX\"])\n",
    "category_ipcc = np.array([\"1\", \"2\"])\n",
    "coords = [\n",
    "    (\"category (IPCC2006)\", category_ipcc),\n",
    "    (\"area (ISO3)\", area_iso3),\n",
    "    (\"time\", time),\n",
    "]\n",
    "da = xr.DataArray(\n",
    "    data=[\n",
    "        [\n",
    "            [1, 2, 3, 4],\n",
    "            [np.nan, np.nan, np.nan, np.nan],\n",
    "            [1, 2, 3, np.nan],\n",
    "        ],\n",
    "        [\n",
    "            [np.nan, 2, np.nan, 4],\n",
    "            [1, np.nan, 3, np.nan],\n",
    "            [1, np.nan, 3, np.nan],\n",
    "        ],\n",
    "    ],\n",
    "    coords=coords,\n",
    ")\n",
    "\n",
    "da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da.pr.coverage(\"area\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da.pr.coverage(\"time\", \"area\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "For Datasets, you can also specify the \"entity\" as a coordinate to summarize for the\n",
    "data variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import primap2.tests\n",
    "\n",
    "ds = primap2.tests.examples.opulent_ds()\n",
    "ds[\"CO2\"].pr.loc[{\"product\": \"milk\"}].pint.magnitude[:] = np.nan\n",
    "\n",
    "ds.pr.coverage(\"product\", \"entity\", \"area\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Merging\n",
    "\n",
    "xarray provides different functions to combine Datasets and DataArrays.\n",
    "However, these are not built to combine data which contain duplicates\n",
    "with rounding / processing errors. However, when reading data e.g. from country\n",
    "reports this is often needed as some sectors are included in several tables\n",
    "and might use different numbers of decimals. Thus, PRIMAP2 has added a merge\n",
    "function that can accept data discrepancies not exceeding a given tolerance\n",
    "level. The merging of attributes is handled by xarray and the `combine_attrs`\n",
    "parameter is just passed on to the xarray functions. Default is to `drop_conflicts`.\n",
    "\n",
    "Below an example using the built in `opulent_ds`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from primap2.tests.examples import opulent_ds\n",
    "import xarray as xr\n",
    "\n",
    "op_ds = opulent_ds()\n",
    "# only take part of the countries to have something to actually merge\n",
    "da_start = op_ds[\"CO2\"].pr.loc[{\"area\": [\"ARG\", \"COL\", \"MEX\"]}]\n",
    "# modify some data\n",
    "data_to_modify = op_ds[\"CO2\"].pr.loc[{\"area\": [\"ARG\"]}].pr.sum(\"area\")\n",
    "data_to_modify.data = data_to_modify.data * 1.009\n",
    "da_merge = op_ds[\"CO2\"].pr.set(\n",
    "    \"area\", \"ARG\", data_to_modify, existing=\"overwrite\"\n",
    ")\n",
    "\n",
    "# merge with tolerance such that it will pass\n",
    "da_result = da_start.pr.merge(da_merge, tolerance=0.01)\n",
    "da_result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# merge with lower tolerance such that it will fail\n",
    "try:\n",
    "    da_result = da_start.pr.merge(da_merge, tolerance=0.005)\n",
    "except xr.MergeError as err:\n",
    "    print(f\"An error occured during merging: {err}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# only throw a warning and not an error\n",
    "da_result = da_start.pr.merge(\n",
    "    da_merge, tolerance=0.005, error_on_discrepancy=False\n",
    ")\n",
    "da_result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Aggregation and infilling\n",
    "\n",
    "xarray provides robust functions for aggregation (`sum`) and filling of\n",
    "missing information (`fillna`).\n",
    "PRIMAP2 adds functions which fill or skip missing information based on if the\n",
    "information is missing at all points along certain axes, for example for\n",
    "a whole time series.\n",
    "This makes it possible to, for example, evaluate the sum of sub-categories\n",
    "while ignoring only those categories which are missing completely.\n",
    "It is also possible to ignore NA values (i.e. treating them as 0) in sums using \n",
    "the `skipna` parameter. \n",
    "When using `skipna`, the `min_count` parameter governs how many non-NA vales are \n",
    "needed in a sum for the result to be non-NA. The default value is `skipna=1`. \n",
    "This is helpful if you want to e.g. sum all subsectors and for some countries \n",
    "or gases some of the subsectors have NA values because there is no data. To avoid\n",
    "NA timeseries if a single sector is NA you use `skipna`. In other cases, e.g. when \n",
    "checking if data coverage is complete `skipna` is not used, so any NA value in the \n",
    "source data results in NA in the summed data and is not hidden. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "time = pd.date_range(\"2000-01-01\", \"2003-01-01\", freq=\"AS\")\n",
    "area_iso3 = np.array([\"COL\", \"ARG\", \"MEX\"])\n",
    "coords = [(\"area (ISO3)\", area_iso3), (\"time\", time)]\n",
    "da = xr.DataArray(\n",
    "    data=[\n",
    "        [1, 2, 3, 4],\n",
    "        [np.nan, np.nan, np.nan, np.nan],\n",
    "        [1, 2, 3, np.nan],\n",
    "    ],\n",
    "    coords=coords,\n",
    ")\n",
    "\n",
    "da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da.pr.sum(dim=\"area\", skipna_evaluation_dims=\"time\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "da.pr.sum(dim=\"area\", skipna=True)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# compare this to the result of the standard xarray sum:\n",
    "\n",
    "da.sum(dim=\"area (ISO3)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The same functionality is available for filling in missing information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "da.pr.fill_all_na(\"time\", value=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Downscaling\n",
    "\n",
    "To downscale a super-category (for example, regional data) to sub-categories\n",
    "(for example, country-level data in the same region), the `downscale_timeseries`\n",
    "function is available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# select an example dataset\n",
    "da = primap2.open_dataset(\"minimal_ds.nc\")[\"CO2\"].loc[\n",
    "    {\"time\": slice(\"2000\", \"2010\")}\n",
    "]\n",
    "da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# compute regional data as sum of country-level data\n",
    "temp = da.sum(dim=\"area (ISO3)\")\n",
    "temp = temp.expand_dims({\"area (ISO3)\": [\"LATAM\"]})\n",
    "# delete data from the country level for the years 2005-2009 (inclusive)\n",
    "da.loc[{\"time\": slice(\"2005\", \"2009\")}].pint.magnitude[:] = np.nan\n",
    "# add regional data to the array\n",
    "da = xr.concat([da, temp], dim=\"area (ISO3)\")\n",
    "da"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Do the downscaling\n",
    "da.pr.downscale_timeseries(\n",
    "    basket=\"LATAM\",\n",
    "    basket_contents=[\"BOL\", \"MEX\", \"COL\", \"ARG\"],\n",
    "    dim=\"area (ISO3)\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "For the downscaling, shares for the sub-categories at the points in time\n",
    "where data for all sub-categories\n",
    "is available are determined, the shares are interpolated where data is missing,\n",
    "and then the super-category is downscaled using these shares."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Handling of gas baskets\n",
    "\n",
    "### Summation\n",
    "\n",
    "To sum the contents of gas baskets like KYOTOGHG, the function\n",
    "[ds.gas_basket_contents_sum](https://primap2.readthedocs.io/en/main/generated/xarray.Dataset.pr.gas_basket_contents_sum.html)\n",
    "is available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# select example dataset\n",
    "ds = primap2.open_dataset(\"minimal_ds.nc\").loc[\n",
    "    {\"time\": slice(\"2000\", \"2010\")}\n",
    "]\n",
    "ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# add (empty) gas basket with corresponding metadata\n",
    "ds[\"KYOTOGHG\"] = xr.full_like(ds[\"CO2\"], np.nan).pr.quantify(\n",
    "    units=\"Gg CO2 / year\"\n",
    ")\n",
    "ds[\"KYOTOGHG\"].attrs = {\"entity\": \"KYOTOGHG\", \"gwp_context\": \"AR4GWP100\"}\n",
    "\n",
    "ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# compute gas basket from its contents, which have to be given explicitly\n",
    "ds.pr.gas_basket_contents_sum(\n",
    "    basket=\"KYOTOGHG\",\n",
    "    basket_contents=[\"CO2\", \"SF6\", \"CH4\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Note that like all PRIMAP2 functions,\n",
    "[gas_basket_contents_sum](https://primap2.readthedocs.io/en/main/generated/xarray.Dataset.pr.gas_basket_contents_sum.html)\n",
    "returns the result without overwriting anything in the original dataset,\n",
    "so you have to explicitly overwrite existing data if you want that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds[\"KYOTOGHG\"] = ds.pr.gas_basket_contents_sum(\n",
    "    basket=\"KYOTOGHG\",\n",
    "    basket_contents=[\"CO2\", \"SF6\", \"CH4\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Filling in missing information\n",
    "\n",
    "To fill in missing data in a gas basket, use\n",
    "[fill_na_gas_basket_from_contents](https://primap2.readthedocs.io/en/main/generated/xarray.Dataset.pr.fill_na_gas_basket_from_contents.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# delete all data about the years 2005-2009 (inclusive) from the\n",
    "# KYOTOGHG data\n",
    "ds[\"KYOTOGHG\"].loc[{\"time\": slice(\"2005\", \"2009\")}].pint.magnitude[\n",
    "    :\n",
    "] = np.nan\n",
    "ds[\"KYOTOGHG\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "ds.pr.fill_na_gas_basket_from_contents(\n",
    "    basket=\"KYOTOGHG\", basket_contents=[\"CO2\", \"SF6\", \"CH4\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The reverse case is that you are missing some data in the timeseries of\n",
    "individual gases and want to fill those in using downscaled data from\n",
    "a gas basket.\n",
    "Here, use\n",
    "[downscale_gas_timeseries](https://primap2.readthedocs.io/en/main/generated/xarray.Dataset.pr.downscale_gas_timeseries.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# delete all data about the years 2005-2009 from the individual gas data\n",
    "sel = {\"time\": slice(\"2005\", \"2009\")}\n",
    "ds[\"CO2\"].loc[sel].pint.magnitude[:] = np.nan\n",
    "ds[\"SF6\"].loc[sel].pint.magnitude[:] = np.nan\n",
    "ds[\"CH4\"].loc[sel].pint.magnitude[:] = np.nan\n",
    "ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# This determines gas shares at the points in time where individual gas\n",
    "# data is available, interpolates the shares where data is missing, and\n",
    "# then downscales the gas basket data using the interpolated shares\n",
    "ds.pr.downscale_gas_timeseries(\n",
    "    basket=\"KYOTOGHG\", basket_contents=[\"CO2\", \"SF6\", \"CH4\"]\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}