Data reading example 2 - PRIMAP-hist v2.2

To run this example, the file PRIMAPHIST22__19-Jan-2021.csv must be placed in the same folder as this notebook. The PRIMAP-hist data (doi:10.5281/zenodo.4479172) are available from Zenodo: https://zenodo.org/record/4479172

[1]:
# imports
import primap2 as pm2

Dataset Specifications

Here we define which columns of the csv file contain the coordinates. The dict coords_cols maps PRIMAP2 dimensions (keys) to csv column names (values). coords_defaults sets values for dimensions that have no column in the csv file (here the source). coords_terminologies sets the terminologies, e.g. IPCC2006 for categories or ISO3 country codes for area. coords_value_mapping defines the conversion of metadata values, e.g. category codes. filter_keep and filter_remove filter the input data: each entry in filter_keep specifies a subset of the input data which is kept, while the subsets defined by filter_remove are removed from the input data. In the cell below, the filters use the csv column name country rather than the PRIMAP2 dimension name area.

For details, see the documentation of read_wide_csv_file_if in the pm2io module of PRIMAP2.

[2]:
file = "PRIMAPHIST22__19-Jan-2021.csv"
coords_cols = {
    "unit": "unit",
    "entity": "entity",
    "area": "country",
    "scenario": "scenario",
    "category": "category",
}
coords_defaults = {
    "source": "PRIMAP-hist_v2.2",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP-hist",
}

coords_value_mapping = {
    "category": "PRIMAP1",
    "unit": "PRIMAP1",
    "entity": "PRIMAP1",
}

filter_keep = {
    "f1": {
        "entity": "CO2",
        "category": ["IPC2", "IPC1"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
    "f2": {
        "entity": "KYOTOGHG",
        "category": ["IPCMAG", "IPC4"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
}

filter_remove = {"f1": {"scenario": "HISTTP"}}
# filter_keep = {"f1": {"entity": "KYOTOGHG", "category": ["IPC2", "IPC1"]},}
# filter_keep = {}
# filter_remove = {}

meta_data = {"references": "doi:10.5281/zenodo.4479172"}
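
The filter semantics described above can be illustrated with a small pure-Python sketch. It is not the PRIMAP2 implementation: the helper names matches and apply_filters and the toy rows are hypothetical. The sketch assumes that all conditions within one filter entry must hold, that a row is kept if it matches any filter_keep entry, and that an empty filter_keep keeps everything.

```python
def matches(row, conditions):
    """Return True if the row satisfies every column condition of one filter."""
    for column, allowed in conditions.items():
        # A condition may be a single value or a list of accepted values.
        allowed_values = allowed if isinstance(allowed, list) else [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def apply_filters(rows, filter_keep, filter_remove):
    """Keep rows matching any filter_keep entry, then drop filter_remove matches."""
    kept = []
    for row in rows:
        # Assumption: an empty filter_keep means "keep everything".
        if filter_keep and not any(matches(row, f) for f in filter_keep.values()):
            continue
        if any(matches(row, f) for f in filter_remove.values()):
            continue
        kept.append(row)
    return kept


# Toy rows standing in for lines of the csv file (not real PRIMAP-hist data).
rows = [
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTTP"},
    {"entity": "CO2", "category": "IPC4", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "KYOTOGHG", "category": "IPC4", "country": "BRA", "scenario": "HISTCR"},
    {"entity": "CH4", "category": "IPC1", "country": "AUS", "scenario": "HISTCR"},
]

toy_filter_keep = {
    "f1": {"entity": "CO2", "category": ["IPC2", "IPC1"], "country": ["AUS", "BRA"]},
    "f2": {"entity": "KYOTOGHG", "category": ["IPCMAG", "IPC4"], "country": ["AUS", "BRA"]},
}
toy_filter_remove = {"f1": {"scenario": "HISTTP"}}

result = apply_filters(rows, toy_filter_keep, toy_filter_remove)
for row in result:
    print(row["entity"], row["category"], row["country"], row["scenario"])
```

Under these assumptions only the first and fourth toy rows remain: the second is removed by the HISTTP filter, and the third and fifth match neither filter_keep entry.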

Reading the data to interchange format

To enable wider use of the PRIMAP2 data reading functionality, we read into the PRIMAP2 interchange format, which is a wide-format pandas DataFrame with coordinates in columns that follows the PRIMAP2 specifications. Additional metadata not captured in this format are stored in DataFrame.attrs as a dictionary. Because the attrs functionality in pandas is experimental, the metadata are only attached to the DataFrame returned by the reading functions and should be saved separately before doing any processing with the DataFrame.

Here we read the data using the read_wide_csv_file_if() function. We have specified restrictive filters above to limit the data included in this notebook.

[3]:
PMH_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    filter_keep=filter_keep,
    filter_remove=filter_remove,
    meta_data=meta_data,
)
PMH_if.head()
[3]:
   source            scenario (PRIMAP-hist)  area (ISO3)  entity                unit         category (IPCC2006)     1850     1851     1852     1853  ...      2009      2010      2011      2012      2013      2014      2015      2016      2017      2018
0  PRIMAP-hist_v2.2  HISTCR                  AFG          CO2                   Gg CO2 / yr  1                      0.147    0.155    0.163    0.172  ...    6750.0    8440.0   12200.0   10700.0    9990.0   11000.0   11700.0   12700.0   13100.0   18600.0
1  PRIMAP-hist_v2.2  HISTCR                  AFG          CO2                   Gg CO2 / yr  2                      0.169    0.178    0.188    0.198  ...     191.0     207.0     207.0     268.0     341.0     318.0     269.0     293.0     292.0     298.0
2  PRIMAP-hist_v2.2  HISTCR                  AFG          KYOTOGHG (SARGWP100)  Gg CO2 / yr  4                    155.000  154.000  154.000  153.000  ...    3080.0    3160.0    3270.0    3400.0    3510.0    3620.0    3730.0    3800.0    3900.0    4010.0
3  PRIMAP-hist_v2.2  HISTCR                  AFG          KYOTOGHG (SARGWP100)  Gg CO2 / yr  M.AG                 615.000  668.000  719.000  770.000  ...   12400.0   14100.0   14300.0   14200.0   14100.0   14600.0   13600.0   13900.0   13800.0   13300.0
4  PRIMAP-hist_v2.2  HISTCR                  AUS          CO2                   Gg CO2 / yr  1                      0.000    0.000    0.000    0.000  ...  384000.0  380000.0  378000.0  383000.0  376000.0  372000.0  380000.0  389000.0  393000.0  393000.0

5 rows × 175 columns

[4]:
PMH_if.attrs
[4]:
{'attrs': {'references': 'doi:10.5281/zenodo.4479172',
  'area': 'area (ISO3)',
  'scen': 'scenario (PRIMAP-hist)',
  'cat': 'category (IPCC2006)'},
 'time_format': '%Y',
 'dimensions': {'CO2': ['unit',
   'entity',
   'area (ISO3)',
   'category (IPCC2006)',
   'source',
   'scenario (PRIMAP-hist)'],
  'KYOTOGHG (SARGWP100)': ['unit',
   'entity',
   'area (ISO3)',
   'category (IPCC2006)',
   'source',
   'scenario (PRIMAP-hist)']}}
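
Since the metadata in attrs should be saved before further processing, one way to do this is sketched below. The toy DataFrame and its values are made up for illustration and only stand in for an interchange format DataFrame; the sketch uses plain pandas and the standard library, not a PRIMAP2 helper.

```python
import copy

import pandas as pd

# Toy stand-in for an interchange format DataFrame (values are made up).
df = pd.DataFrame(
    {
        "area (ISO3)": ["AUS", "BRA"],
        "entity": ["CO2", "CO2"],
        "2017": [1.0, 2.0],
        "2018": [1.5, 2.5],
    }
)
df.attrs = {"attrs": {"area": "area (ISO3)"}, "time_format": "%Y"}

# Keep an independent copy of the metadata before any processing.
saved_attrs = copy.deepcopy(df.attrs)

# Some processing step. Whether attrs survive it depends on the pandas
# version and the operation, so the sketch does not rely on it.
processed = df[df["area (ISO3)"] == "AUS"].reset_index(drop=True)

# Re-attach the saved metadata explicitly.
processed.attrs = copy.deepcopy(saved_attrs)
print(processed.attrs)
```

A deep copy is used because the attrs dictionary is nested, so a shallow copy would still share the inner dictionaries with the original DataFrame.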

Transformation to PRIMAP2 xarray format

The transformation to the PRIMAP2 xarray format is done using the function from_interchange_format, which takes an interchange format DataFrame. The resulting xarray Dataset is already quantified, i.e. the variables are pint arrays which include a unit.

[5]:
PMH_pm2 = pm2.pm2io.from_interchange_format(PMH_if)
PMH_pm2
2021-03-31 11:46:22.696 | DEBUG    | primap2.pm2io._interchange_format:from_interchange_format:252 - Expected array shapes: [[2, 5, 4, 1, 1], [2, 5, 4, 1, 1]], resulting in size 80.
[5]:
<xarray.Dataset>
Dimensions:                 (area (ISO3): 5, category (IPCC2006): 4, scenario (PRIMAP-hist): 1, source: 1, time: 169)
Coordinates:
  * category (IPCC2006)     (category (IPCC2006)) object '1' '2' '4' 'M.AG'
  * time                    (time) datetime64[ns] 1850-01-01 ... 2018-01-01
  * area (ISO3)             (area (ISO3)) object 'AFG' 'AUS' 'BRA' 'CHN' 'GBR'
  * source                  (source) object 'PRIMAP-hist_v2.2'
  * scenario (PRIMAP-hist)  (scenario (PRIMAP-hist)) object 'HISTCR'
Data variables:
    CO2                     (time, area (ISO3), category (IPCC2006), source, scenario (PRIMAP-hist)) float64 [CO2·Gg/annum] ...
    KYOTOGHG (SARGWP100)    (time, area (ISO3), category (IPCC2006), source, scenario (PRIMAP-hist)) float64 [CO2·Gg/annum] ...
Attributes:
    references:  doi:10.5281/zenodo.4479172
    area:        area (ISO3)
    scen:        scenario (PRIMAP-hist)
    cat:         category (IPCC2006)
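
The relation between the year columns of the interchange format and the time coordinate of the Dataset can be checked with a short standard-library sketch. It only illustrates how labels such as "1850" parse under the time_format '%Y' shown in the attrs; it is an assumption that PRIMAP2 performs the conversion in a comparable way internally.

```python
from datetime import datetime

time_format = "%Y"  # value of 'time_format' in the interchange format attrs

# Year column labels "1850" ... "2018" as they appear in the wide DataFrame.
year_columns = [str(year) for year in range(1850, 2019)]

# Parse each label into a timestamp; "%Y" alone yields 1 January of that year.
time_points = [datetime.strptime(label, time_format) for label in year_columns]

print(len(time_points))  # 169
print(time_points[0].date(), time_points[-1].date())  # 1850-01-01 2018-01-01
```

The 169 time steps match the time dimension of the Dataset, and together with the 6 coordinate columns they account for the 175 columns of the interchange format DataFrame.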