{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data reading example 3 - minimal test dataset (long) #\n", "To run this example the file `test_csv_data_long.csv` must be placed in the same folder as this notebook. You can find the notebook and the csv file in the folder `docs/data_reading_examples` in the PRIMAP2 repository." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# imports\n", "import primap2 as pm2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset Specifications ##\n", "Here we define which columns of the csv file contain the metadata.\n", "The dict `coords_cols` contains the mapping of csv columns to PRIMAP2 dimensions.\n", "Default values not found in the CSV are set using `coords_defaults`.\n", "The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the `coords_terminologies` dict.\n", "`coords_value_mapping` defines conversion of metadata values, e.g. category codes.\n", "You can either specify a dict for a metadata column which directly defines the mapping, a function which is used to map metadata values, or a string to select one of the pre-defined functions included in PRIMAP2.\n", "`filter_keep` and `filter_remove` filter the input data.\n", "Each entry in `filter_keep` specifies a subset of the input data which is kept while the subsets defined by `filter_remove` are removed from the input data.\n", "\n", "In the example, the CSV contains the coordinates `country`, `category`, `gas`, and `year`.\n", "They are translated into their proper PRIMAP2 names by specifying the in the\n", "`coords_cols` dictionary. Additionally, columns are specified for the `unit`, and\n", "for the actual `data` (which is found in the column `emissions` in the CSV file).\n", "The format used in the `year` column is given using the `time_format` argument.\n", "Values for the `scenario` and `source` coordinate is not available in the csv file;\n", " therefore, we add them using default values defined in `coords_defaults`.\n", "Terminologies are given for `area`, `category`, `scenario`, and the secondary categories.\n", "Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.\n", "\n", "Coordinate mapping is necessary for `category`, `entity`, and `unit`.\n", "They all use the PRIMAP1 specifications in the csv file.\n", "For `category` this means that e.g. `IPC1A2` would be converted to `1.A.2` for `entity` the conversion affects the way GWP information is stored in the entity name: e.g. `KYOTOGHGAR4` is mapped to `KYOTOGHG (AR4GWP100)`.\n", "\n", "In this example, we also add `meta_data` to add a reference for the data and usage rights." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file = \"test_csv_data_long.csv\"\n", "coords_cols = {\n", " \"unit\": \"unit\",\n", " \"entity\": \"gas\",\n", " \"area\": \"country\",\n", " \"category\": \"category\",\n", " \"time\": \"year\",\n", " \"data\": \"emissions\",\n", "}\n", "coords_defaults = {\n", " \"source\": \"TESTcsv2021\",\n", " \"scenario\": \"HISTORY\",\n", "}\n", "coords_terminologies = {\n", " \"area\": \"ISO3\",\n", " \"category\": \"IPCC2006\",\n", " \"scenario\": \"general\",\n", "}\n", "coords_value_mapping = {\n", " \"category\": \"PRIMAP1\",\n", " \"entity\": \"PRIMAP1\",\n", " \"unit\": \"PRIMAP1\",\n", "}\n", "meta_data = {\n", " \"references\": \"Just ask around.\",\n", " \"rights\": \"public domain\",\n", "}\n", "data_if = pm2.pm2io.read_long_csv_file_if(\n", " file,\n", " coords_cols=coords_cols,\n", " coords_defaults=coords_defaults,\n", " coords_terminologies=coords_terminologies,\n", " coords_value_mapping=coords_value_mapping,\n", " meta_data=meta_data,\n", " time_format=\"%Y\",\n", ")\n", "data_if.head()" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "data_if.attrs" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformation to PRIMAP2 xarray format ##\n", "The transformation to PRIMAP2 xarray format is done using the function `from_interchange_format` which takes an interchange format DataFrame.\n", "The resulting xr Dataset is already quantified, thus the variables are pint arrays which include a unit." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_pm2 = pm2.pm2io.from_interchange_format(data_if)\n", "data_pm2" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }