Data reading example 2 - PRIMAP-hist v2.2

In this example, we read an old version of PRIMAP-hist that is not available in the PRIMAP2 native format because it was produced before that format existed.

# imports
import primap2 as pm2

Obtain the input data

The PRIMAP-hist data (doi:10.5281/zenodo.4479172) is available from Zenodo; we download it directly.

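import requests

url = "https://zenodo.org/records/4479172/files/PRIMAP-hist_v2.2_19-Jan-2021.csv?download=1"
response = requests.get(url, timeout=60)
response.raise_for_status()  # fail early instead of writing an HTTP error page to disk
file = "PRIMAPHIST22__19-Jan-2021.csv"
with open(file, "wb") as fd:
    fd.write(response.content)  # write the raw bytes so the CSV is not re-encoded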

Dataset Specifications

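Here we define which columns of the CSV file contain the coordinates. The dict coords_cols maps each PRIMAP2 dimension (key) to the CSV column that holds it (value); for example, the area dimension is read from the column country. coords_defaults sets values for dimensions that have no column in the file (here source). The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict. coords_value_mapping defines conversions of metadata values, e.g. category codes; here the "PRIMAP1" conversion is used for categories, units and entities. filter_keep and filter_remove filter the input data: each entry in filter_keep specifies a subset of the input data to keep, while the subsets defined by filter_remove are removed. In this example the filters refer to the raw CSV column names and codes (country, IPC1), i.e. the values before renaming and value mapping.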

For details, we refer to the documentation of primap2.pm2io.read_wide_csv_file_if().
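The keep/remove semantics described above can be illustrated with a small pandas sketch. This is a toy re-implementation for illustration only, not primap2's code; the function name apply_filters and the toy data are hypothetical. It assumes that the conditions inside one filter entry are combined with AND, that a row is kept if it matches at least one filter_keep entry, and that an empty filter_keep keeps everything.

```python
import pandas as pd


def apply_filters(df, filter_keep, filter_remove):
    """Hypothetical toy keep/remove filter (illustration only, not primap2's implementation)."""

    def matches(spec):
        # A row matches an entry if it satisfies every column condition (AND).
        mask = pd.Series(True, index=df.index)
        for col, values in spec.items():
            if not isinstance(values, list):
                values = [values]  # allow a single value or a list of values
            mask &= df[col].isin(values)
        return mask

    if filter_keep:
        # Keep rows matching at least one filter_keep entry (OR over entries).
        keep = pd.Series(False, index=df.index)
        for spec in filter_keep.values():
            keep |= matches(spec)
    else:
        keep = pd.Series(True, index=df.index)  # assumption: empty filter_keep keeps all rows
    for spec in filter_remove.values():
        keep &= ~matches(spec)  # drop rows matching any filter_remove entry
    return df[keep]


# Hypothetical toy rows using the raw CSV column names and codes
toy = pd.DataFrame({
    "entity": ["CO2", "CO2", "KYOTOGHG", "KYOTOGHG", "CH4"],
    "category": ["IPC1", "IPC3", "IPC4", "IPCMAG", "IPC1"],
    "scenario": ["HISTCR", "HISTCR", "HISTTP", "HISTCR", "HISTCR"],
})
result = apply_filters(
    toy,
    filter_keep={
        "f1": {"entity": "CO2", "category": ["IPC2", "IPC1"]},
        "f2": {"entity": "KYOTOGHG", "category": ["IPCMAG", "IPC4"]},
    },
    filter_remove={"f1": {"scenario": "HISTTP"}},
)
print(result.index.tolist())  # [0, 3]
```

Row 2 matches the second keep entry but is then dropped by filter_remove, which is why only rows 0 and 3 survive.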

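# Map PRIMAP2 dimensions (keys) to the CSV columns that hold them (values)
coords_cols = {
    "unit": "unit",
    "entity": "entity",
    "area": "country",
    "scenario": "scenario",
    "category": "category",
}
# Values for dimensions that have no column in the CSV file
coords_defaults = {
    "source": "PRIMAP-hist_v2.2",
}
# Terminologies (code lists) used for the coordinate values
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP-hist",
}

# Convert PRIMAP1-style metadata values (e.g. category "IPC1") to PRIMAP2 conventions
coords_value_mapping = {
    "category": "PRIMAP1",
    "unit": "PRIMAP1",
    "entity": "PRIMAP1",
}

# Keep only rows matching one of these filters (raw CSV column names and codes)
filter_keep = {
    "f1": {
        "entity": "CO2",
        "category": ["IPC2", "IPC1"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
    "f2": {
        "entity": "KYOTOGHG",
        "category": ["IPCMAG", "IPC4"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
}

# Remove all rows of the HISTTP scenario
filter_remove = {"f1": {"scenario": "HISTTP"}}
# Alternative filter settings (not used here):
# filter_keep = {"f1": {"entity": "KYOTOGHG", "category": ["IPC2", "IPC1"]},}
# filter_keep = {}
# filter_remove = {}

# Additional metadata stored with the data
meta_data = {"references": "doi:10.5281/zenodo.4479172"}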
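To make the effect of coords_value_mapping on categories concrete, the sketch below applies a plain dict to a toy column with pandas. The dict observed_category_mapping is hypothetical: it contains only the four category codes used in this example, read off by comparing filter_keep with the categories in the output further down (IPC1 becomes 1, IPCMAG becomes M.AG, and so on). It is not primap2's "PRIMAP1" conversion, which is not limited to these codes.

```python
import pandas as pd

# Hypothetical lookup covering only the four codes used in this example
observed_category_mapping = {"IPC1": "1", "IPC2": "2", "IPC4": "4", "IPCMAG": "M.AG"}

raw = pd.DataFrame({"category": ["IPC1", "IPC2", "IPC4", "IPCMAG"]})
# Translate the raw PRIMAP1-style codes into the IPCC2006 codes seen in the output
raw["category (IPCC2006)"] = raw["category"].map(observed_category_mapping)
print(raw["category (IPCC2006)"].tolist())  # ['1', '2', '4', 'M.AG']
```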

Reading the data to interchange format

To make the PRIMAP2 data reading functionality usable more widely, we first read the data into the PRIMAP2 interchange format: a wide-format pandas DataFrame with one column per coordinate and one column per year, following the PRIMAP2 specifications. Additional metadata not captured in the columns is stored as a dictionary in DataFrame.attrs. Because attrs is still experimental in pandas, the metadata is only guaranteed to be present in the DataFrame returned by the reading functions; store it separately before doing any further processing with the DataFrame.
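A minimal sketch of storing the metadata separately, using a hypothetical toy DataFrame whose attrs follow a reduced version of the layout shown for PMH_if.attrs. The processing step is arbitrary; the point is to deep-copy attrs up front and re-attach the copy afterwards instead of relying on pandas to propagate it.

```python
import copy

import pandas as pd

# Hypothetical toy interchange-format DataFrame with a reduced attrs dictionary
df = pd.DataFrame({"area (ISO3)": ["AFG"], "entity": ["CO2"], "1850": [0.147]})
df.attrs = {
    "attrs": {"references": "doi:10.5281/zenodo.4479172"},
    "time_format": "%Y",
}

# Save an independent copy of the metadata before any processing
saved_attrs = copy.deepcopy(df.attrs)

# Arbitrary pandas processing; attrs propagation is not relied upon
processed = df[df["entity"] == "CO2"].reset_index(drop=True)

# Re-attach the saved metadata explicitly
processed.attrs = copy.deepcopy(saved_attrs)
```

Deep-copying avoids sharing the nested dict between DataFrames, so later edits to one DataFrame's metadata do not silently change the other.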

Here we read the data using the primap2.pm2io.read_wide_csv_file_if() function. We have specified restrictive filters above to limit the data included in this notebook.

PMH_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    filter_keep=filter_keep,
    filter_remove=filter_remove,
    meta_data=meta_data,
)
PMH_if.head()
/home/docs/checkouts/readthedocs.org/user_builds/primap2/checkouts/stable/primap2/pm2io/_data_reading.py:1156: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  data[coord] = coords_defaults[coord]
             source  scenario (PRIMAP-hist)  area (ISO3)                entity         unit  category (IPCC2006)     1850     1851     1852     1853  ...      2009      2010      2011      2012      2013      2014      2015      2016      2017      2018
0  PRIMAP-hist_v2.2                  HISTCR          AFG                   CO2  Gg CO2 / yr                    1    0.147    0.155    0.163    0.172  ...    6750.0    8440.0   12200.0   10700.0    9990.0   11000.0   11700.0   12700.0   13100.0   18600.0
1  PRIMAP-hist_v2.2                  HISTCR          AFG                   CO2  Gg CO2 / yr                    2    0.169    0.178    0.188    0.198  ...     191.0     207.0     207.0     268.0     341.0     318.0     269.0     293.0     292.0     298.0
2  PRIMAP-hist_v2.2                  HISTCR          AFG  KYOTOGHG (SARGWP100)  Gg CO2 / yr                    4  155.000  154.000  154.000  153.000  ...    3080.0    3160.0    3270.0    3400.0    3510.0    3620.0    3730.0    3800.0    3900.0    4010.0
3  PRIMAP-hist_v2.2                  HISTCR          AFG  KYOTOGHG (SARGWP100)  Gg CO2 / yr                 M.AG  615.000  668.000  719.000  770.000  ...   12400.0   14100.0   14300.0   14200.0   14100.0   14600.0   13600.0   13900.0   13800.0   13300.0
4  PRIMAP-hist_v2.2                  HISTCR          AUS                   CO2  Gg CO2 / yr                    1    0.000    0.000    0.000    0.000  ...  384000.0  380000.0  378000.0  383000.0  376000.0  372000.0  380000.0  389000.0  393000.0  393000.0

5 rows × 175 columns

PMH_if.attrs
{'attrs': {'references': 'doi:10.5281/zenodo.4479172',
  'area': 'area (ISO3)',
  'scen': 'scenario (PRIMAP-hist)',
  'cat': 'category (IPCC2006)'},
 'time_format': '%Y',
 'dimensions': {'*': ['source',
   'scenario (PRIMAP-hist)',
   'area (ISO3)',
   'entity',
   'unit',
   'category (IPCC2006)']}}

Transformation to PRIMAP2 xarray format

The transformation to PRIMAP2 xarray format is done with the function primap2.pm2io.from_interchange_format(), which takes a DataFrame in interchange format. The resulting xarray Dataset is already quantified: its data variables are pint arrays that carry their unit (here CO2·Gg/yr).
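Conceptually, going from the wide interchange format to a dimensioned array means moving the year columns into a time coordinate and indexing every value by its coordinates. The pandas sketch below shows only that reshaping step on a hypothetical toy frame (values copied from the AFG CO2 rows in the table above); it is not how from_interchange_format is implemented, and it does not attach pint units or turn entities into data variables.

```python
import pandas as pd

# Toy wide-format rows: AFG CO2 values for two categories and two years
wide = pd.DataFrame({
    "area (ISO3)": ["AFG", "AFG"],
    "entity": ["CO2", "CO2"],
    "unit": ["Gg CO2 / yr", "Gg CO2 / yr"],
    "category (IPCC2006)": ["1", "2"],
    "1850": [0.147, 0.169],
    "1851": [0.155, 0.178],
})
coord_cols = ["area (ISO3)", "entity", "unit", "category (IPCC2006)"]

# Wide -> long: one row per (coordinates, year) combination
long = wide.melt(id_vars=coord_cols, var_name="time", value_name="value")
# Parse the year labels like the "%Y" time_format stored in attrs
long["time"] = pd.to_datetime(long["time"], format="%Y")

# Index each value by its dimensions; with xarray installed,
# series.to_xarray() would turn this into a DataArray
series = long.set_index(["time", "category (IPCC2006)", "area (ISO3)"])["value"]
print(series.loc[(pd.Timestamp("1851-01-01"), "2", "AFG")])  # 0.178
```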

PMH_pm2 = pm2.pm2io.from_interchange_format(PMH_if)
PMH_pm2
2024-10-07 16:18:45.691 | DEBUG    | primap2.pm2io._interchange_format:from_interchange_format:319 - Expected array shapes: [[1, 1, 5, 2, 4], [1, 1, 5, 2, 4]], resulting in size 80.
<xarray.Dataset> Size: 56kB
Dimensions:                 (time: 169, category (IPCC2006): 4, source: 1,
                             area (ISO3): 5, scenario (PRIMAP-hist): 1)
Coordinates:
  * category (IPCC2006)     (category (IPCC2006)) object 32B '1' '2' '4' 'M.AG'
  * source                  (source) object 8B 'PRIMAP-hist_v2.2'
  * area (ISO3)             (area (ISO3)) object 40B 'AFG' 'AUS' ... 'CHN' 'GBR'
  * scenario (PRIMAP-hist)  (scenario (PRIMAP-hist)) object 8B 'HISTCR'
  * time                    (time) datetime64[ns] 1kB 1850-01-01 ... 2018-01-01
Data variables:
    CO2                     (time, category (IPCC2006), source, area (ISO3), scenario (PRIMAP-hist)) float64 27kB [CO2·Gg/yr] ...
    KYOTOGHG (SARGWP100)    (time, category (IPCC2006), source, area (ISO3), scenario (PRIMAP-hist)) float64 27kB [CO2·Gg/yr] ...
Attributes:
    references:  doi:10.5281/zenodo.4479172
    area:        area (ISO3)
    scen:        scenario (PRIMAP-hist)
    cat:         category (IPCC2006)