{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data reading example 3 - minimal test dataset (long) #\n",
    "To run this example the file `test_csv_data_long.csv` must be placed in the same folder as this notebook. You can find the notebook and the csv file in the folder `docs/data_reading_examples` in the PRIMAP2 repository."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "import primap2 as pm2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset Specifications ##\n",
    "Here we define which columns of the csv file contain the metadata.\n",
    "The dict `coords_cols` contains the mapping of csv columns to PRIMAP2 dimensions.\n",
    "Default values not found in the CSV are set using `coords_defaults`.\n",
    "The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the `coords_terminologies` dict.\n",
    "`coords_value_mapping` defines conversion of metadata values, e.g. category codes.\n",
    "You can either specify a dict for a metadata column which directly defines the mapping, a function which is used to map metadata values, or a string to select one of the pre-defined functions included in PRIMAP2.\n",
    "`filter_keep` and `filter_remove` filter the input data.\n",
    "Each entry in `filter_keep` specifies a subset of the input data which is kept while the subsets defined by `filter_remove` are removed from the input data.\n",
    "\n",
    "In the example, the CSV contains the coordinates `country`, `category`, `gas`, and `year`.\n",
    "They are translated into their proper PRIMAP2 names by specifying the in the\n",
    "`coords_cols` dictionary. Additionally, columns are specified for the `unit`, and\n",
    "for the actual `data` (which is found in the column `emissions` in the CSV file).\n",
    "The format used in the `year` column is given using the `time_format` argument.\n",
    "Values for the `scenario` and `source` coordinate is not available in the csv file;\n",
    " therefore, we add them using default values defined in `coords_defaults`.\n",
    "Terminologies are given for `area`, `category`, `scenario`, and the secondary categories.\n",
    "Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.\n",
    "\n",
    "Coordinate mapping is necessary for `category`, `entity`, and `unit`.\n",
    "They all use the PRIMAP1 specifications in the csv file.\n",
    "For `category` this means that e.g. `IPC1A2` would be converted to `1.A.2` for `entity` the conversion affects the way GWP information is stored in the entity name: e.g. `KYOTOGHGAR4` is mapped to `KYOTOGHG (AR4GWP100)`.\n",
    "\n",
    "In this example, we also add `meta_data` to add a reference for the data and usage rights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file = \"test_csv_data_long.csv\"\n",
    "coords_cols = {\n",
    "    \"unit\": \"unit\",\n",
    "    \"entity\": \"gas\",\n",
    "    \"area\": \"country\",\n",
    "    \"category\": \"category\",\n",
    "    \"time\": \"year\",\n",
    "    \"data\": \"emissions\",\n",
    "}\n",
    "coords_defaults = {\n",
    "    \"source\": \"TESTcsv2021\",\n",
    "    \"scenario\": \"HISTORY\",\n",
    "}\n",
    "coords_terminologies = {\n",
    "    \"area\": \"ISO3\",\n",
    "    \"category\": \"IPCC2006\",\n",
    "    \"scenario\": \"general\",\n",
    "}\n",
    "coords_value_mapping = {\n",
    "    \"category\": \"PRIMAP1\",\n",
    "    \"entity\": \"PRIMAP1\",\n",
    "    \"unit\": \"PRIMAP1\",\n",
    "}\n",
    "meta_data = {\n",
    "    \"references\": \"Just ask around.\",\n",
    "    \"rights\": \"public domain\",\n",
    "}\n",
    "data_if = pm2.pm2io.read_long_csv_file_if(\n",
    "    file,\n",
    "    coords_cols=coords_cols,\n",
    "    coords_defaults=coords_defaults,\n",
    "    coords_terminologies=coords_terminologies,\n",
    "    coords_value_mapping=coords_value_mapping,\n",
    "    meta_data=meta_data,\n",
    "    time_format=\"%Y\",\n",
    ")\n",
    "data_if.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "data_if.attrs"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transformation to PRIMAP2 xarray format ##\n",
    "The transformation to PRIMAP2 xarray format is done using the function `from_interchange_format` which takes an interchange format DataFrame.\n",
    "The resulting xr Dataset is already quantified, thus the variables are pint arrays which include a unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_pm2 = pm2.pm2io.from_interchange_format(data_if)\n",
    "data_pm2"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}