Data reading example 1 - minimal test dataset

To run this example, the file test_csv_data_sec_cat.csv must be placed in the same folder as this notebook. You can find the notebook and the CSV file in the folder docs/data_reading_examples in the PRIMAP2 repository.

[1]:
import primap2 as pm2

Dataset Specifications

Here we define which columns of the CSV file contain the metadata. The dict coords_cols maps CSV columns to PRIMAP2 dimensions. Default values are set using coords_defaults. The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict. coords_value_mapping defines the conversion of metadata values, e.g. category codes. For each metadata column you can specify a dict which directly defines the mapping, a function which is used to map metadata values, or a string which selects one of the pre-defined functions included in PRIMAP2. filter_keep and filter_remove filter the input data: each entry in filter_keep specifies a subset of the input data which is kept, while the subsets defined by filter_remove are removed from the input data.
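The three forms that a coords_value_mapping entry can take might look as follows. This is only a sketch of the configuration: tidy_category is a hypothetical helper invented for illustration, and the single dict entry is taken from the entity example used in this notebook. How values that are missing from a dict are treated is not covered here; see the PRIMAP2 documentation for that.

```python
def tidy_category(code: str) -> str:
    """Hypothetical mapping function: strip whitespace and upper-case a code."""
    return code.strip().upper()


coords_value_mapping = {
    # string: selects one of the pre-defined functions included in PRIMAP2
    "unit": "PRIMAP1",
    # dict: defines the mapping directly, value by value
    "entity": {"KYOTOGHGAR4": "KYOTOGHG (AR4GWP100)"},
    # function: called to map each metadata value (hypothetical helper)
    "category": tidy_category,
}

# Show which form is used for which dimension.
for dimension, spec in coords_value_mapping.items():
    print(f"{dimension}: {type(spec).__name__}")
```

The dict would be passed to read_wide_csv_file_if through the coords_value_mapping argument, in the same way as in the cell further down.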

For details, we refer to the documentation of read_wide_csv_file_if located in the pm2io module of PRIMAP2.

In the example, the CSV contains the coordinates entity, area, category, and the secondary category class. As secondary categories have free names, they are prefixed with sec_cats__ to make clear that they are secondary categories. Values for the secondary category type and the scenario coordinate are not available in the CSV file; therefore, we add them using default values defined in coords_defaults. Terminologies are given for area, category, scenario, and the secondary categories. Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.

Coordinate mapping is necessary for category, entity, and unit, which all use the PRIMAP1 specifications in the CSV file. For category, this means that e.g. IPC1A2 is converted to 1.A.2. For entity, the conversion affects the way GWP information is stored in the entity name: e.g. KYOTOGHGAR4 is mapped to KYOTOGHG (AR4GWP100).
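To make the two conversions above concrete, here is a deliberately simplified sketch that reproduces only the two examples given in the text. It is not the conversion code used by PRIMAP2: both function names are hypothetical, the category function assumes one character per hierarchy level, and the entity function only handles an explicit ARn suffix.

```python
import re


def primap1_category_sketch(code: str) -> str:
    """Simplified: drop the 'IPC' prefix and separate the levels with dots.

    Assumes every level is a single character, which holds for 'IPC1A2'
    but not for every PRIMAP1 code.
    """
    if not code.startswith("IPC"):
        raise ValueError(f"not a PRIMAP1 category code: {code!r}")
    levels = code[len("IPC"):]
    return ".".join(levels)


# Entity name followed by a GWP specification such as AR4 or AR5.
_GWP_SUFFIX = re.compile(r"^(?P<gas>.+?)(?P<gwp>AR\d)$")


def primap1_entity_sketch(entity: str) -> str:
    """Simplified: move an 'ARn' suffix into the bracketed GWP notation."""
    match = _GWP_SUFFIX.match(entity)
    if match is None:
        # No explicit GWP suffix: returned unchanged in this sketch.
        return entity
    return f"{match['gas']} ({match['gwp']}GWP100)"


print(primap1_category_sketch("IPC1A2"))     # 1.A.2
print(primap1_entity_sketch("KYOTOGHGAR4"))  # KYOTOGHG (AR4GWP100)
print(primap1_entity_sketch("CO2"))          # CO2
```

In practice the string "PRIMAP1" in coords_value_mapping selects the pre-defined conversion, which covers more cases than this sketch.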

In this example, we also pass meta_data to add a reference for the data and its usage rights.

For examples of using filters, see the second example, which reads the PRIMAP-hist data.
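As a rough illustration of the filter semantics described above, the following plain-Python sketch keeps every row that matches at least one filter_keep entry and then drops every row that matches a filter_remove entry. It does not call PRIMAP2: the helper names, the sample rows, the nested dict layout with named entries, and the order in which the two filters are applied are all assumptions made for this sketch.

```python
def matches(row: dict, condition: dict) -> bool:
    """True if the row satisfies every column condition of one filter entry."""
    for column, allowed in condition.items():
        # A condition may be a single value or a collection of values.
        if isinstance(allowed, (list, tuple, set)):
            allowed_values = allowed
        else:
            allowed_values = [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def apply_filters(rows, filter_keep=None, filter_remove=None):
    """Keep rows matching any keep entry, then drop rows matching any remove entry."""
    if filter_keep:
        rows = [r for r in rows if any(matches(r, c) for c in filter_keep.values())]
    if filter_remove:
        rows = [r for r in rows if not any(matches(r, c) for c in filter_remove.values())]
    return rows


# Hypothetical input rows, using the CSV column names from this example.
rows = [
    {"country": "AUS", "gas": "CO2", "category": "IPC1"},
    {"country": "AUS", "gas": "KYOTOGHG", "category": "IPC0"},
    {"country": "FRA", "gas": "CH4", "category": "IPC2"},
    {"country": "FRA", "gas": "CO2", "category": "IPC2"},
]
filter_keep = {
    "f1": {"category": "IPC1", "gas": "CO2"},
    "f2": {"country": ["FRA"]},
}
filter_remove = {"f1": {"gas": "CH4"}}

kept = apply_filters(rows, filter_keep, filter_remove)
for row in kept:
    print(row)
```

With these inputs, the Australian CO2 row and the French CO2 row remain.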

[2]:
file = "test_csv_data_sec_cat.csv"
coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "sec_cats__Class": "classification",
}
coords_defaults = {
    "source": "TESTcsv2021",
    "sec_cats__Type": "fugitive",
    "scenario": "HISTORY",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "sec_cats__Type": "type",
    "sec_cats__Class": "class",
    "scenario": "general",
}
coords_value_mapping = {
    "category": "PRIMAP1",
    "entity": "PRIMAP1",
    "unit": "PRIMAP1",
}
meta_data = {
    "references": "Just ask around.",
    "rights": "public domain",
}
data_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    meta_data=meta_data,
)
data_if.head()
[2]:
   source       scenario (general)  area (ISO3)  entity                unit         category (IPCC2006)  Class (class)  Type (type)     1991     2000     2010
0  TESTcsv2021  HISTORY             AUS          CO2                   Gg CO2 / yr  1                    TOTAL          fugitive     4000.00  5000.00  6000.00
1  TESTcsv2021  HISTORY             AUS          KYOTOGHG (SARGWP100)  Mt CO2 / yr  0                    TOTAL          fugitive        8.00     9.00    10.00
2  TESTcsv2021  HISTORY             FRA          CH4                   Gg CH4 / yr  2                    TOTAL          fugitive        7.00     8.00     9.00
3  TESTcsv2021  HISTORY             FRA          CO2                   Gg CO2 / yr  2                    TOTAL          fugitive       12.00    13.00    14.00
4  TESTcsv2021  HISTORY             FRA          KYOTOGHG (SARGWP100)  Mt CO2 / yr  0                    TOTAL          fugitive        0.03     0.02     0.04
[3]:
data_if.attrs
[3]:
{'attrs': {'references': 'Just ask around.',
  'rights': 'public domain',
  'area': 'area (ISO3)',
  'cat': 'category (IPCC2006)',
  'scen': 'scenario (general)',
  'sec_cats': ['Class (class)', 'Type (type)']},
 'time_format': '%Y',
 'dimensions': {'*': ['source',
   'scenario (general)',
   'area (ISO3)',
   'entity',
   'unit',
   'category (IPCC2006)',
   'Class (class)',
   'Type (type)']}}
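The column names in the interchange format follow a visible pattern: dimensions with a terminology get it appended in brackets, and the sec_cats__ prefix is dropped. The sketch below reproduces that pattern as it can be read off the output above. The function interchange_column_name is hypothetical and is not the code PRIMAP2 uses internally.

```python
SEC_CATS_PREFIX = "sec_cats__"


def interchange_column_name(dimension: str, terminologies: dict) -> str:
    """Build a column name such as 'area (ISO3)' from a dimension key."""
    if dimension.startswith(SEC_CATS_PREFIX):
        name = dimension[len(SEC_CATS_PREFIX):]
    else:
        name = dimension
    if dimension in terminologies:
        return f"{name} ({terminologies[dimension]})"
    # Dimensions without a terminology keep their plain name.
    return name


coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "sec_cats__Type": "type",
    "sec_cats__Class": "class",
    "scenario": "general",
}

for dim in ["source", "scenario", "area", "entity", "unit",
            "category", "sec_cats__Class", "sec_cats__Type"]:
    print(interchange_column_name(dim, coords_terminologies))
```

The printed names match the entries listed under 'dimensions' in the attrs output.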

Transformation to PRIMAP2 xarray format

The transformation to the PRIMAP2 xarray format is done using the function from_interchange_format, which takes an interchange format DataFrame. The resulting xarray Dataset is already quantified, so the variables are pint arrays which include a unit.

[4]:
data_pm2 = pm2.pm2io.from_interchange_format(data_if)
data_pm2
2023-12-12 10:24:16.589 | DEBUG    | primap2.pm2io._interchange_format:from_interchange_format:320 - Expected array shapes: [[1, 1, 4, 3, 4, 1, 1], [1, 1, 4, 3, 4, 1, 1], [1, 1, 4, 3, 4, 1, 1]], resulting in size 144.
/home/docs/checkouts/readthedocs.org/user_builds/primap2/envs/stable/lib/python3.11/site-packages/xarray/core/utils.py:494: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  warnings.warn(
2023-12-12 10:24:16.651 | INFO     | primap2._data_format:ensure_valid_attributes:292 - Reference information is not a DOI: 'Just ask around.'
[4]:
<xarray.Dataset>
Dimensions:               (time: 3, scenario (general): 1,
                           category (IPCC2006): 4, Class (class): 1,
                           area (ISO3): 4, Type (type): 1, source: 1)
Coordinates:
  * scenario (general)    (scenario (general)) object 'HISTORY'
  * category (IPCC2006)   (category (IPCC2006)) object '0' '1' '2' '3'
  * Class (class)         (Class (class)) object 'TOTAL'
  * area (ISO3)           (area (ISO3)) object 'AUS' 'FRA' 'USA' 'ZAM'
  * Type (type)           (Type (type)) object 'fugitive'
  * source                (source) object 'TESTcsv2021'
  * time                  (time) datetime64[ns] 1991-01-01 2000-01-01 2010-01-01
Data variables:
    CH4                   (time, scenario (general), category (IPCC2006), Class (class), area (ISO3), Type (type), source) float64 [CH4·Gg/annum] ...
    CO2                   (time, scenario (general), category (IPCC2006), Class (class), area (ISO3), Type (type), source) float64 [CO2·Gg/annum] ...
    KYOTOGHG (SARGWP100)  (time, scenario (general), category (IPCC2006), Class (class), area (ISO3), Type (type), source) float64 [CO2·Mt/annum] ...
Attributes:
    references:  Just ask around.
    rights:      public domain
    area:        area (ISO3)
    cat:         category (IPCC2006)
    scen:        scenario (general)
    sec_cats:    ['Class (class)', 'Type (type)']
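The debug message in the output above reports three expected array shapes and a total size of 144. That number can be checked against the dimension sizes shown in the Dataset representation. The following is plain arithmetic and does not use PRIMAP2; the variable names are chosen for this sketch.

```python
from math import prod

# Dimension sizes as listed under "Dimensions" in the Dataset output.
sizes = {
    "time": 3,
    "scenario (general)": 1,
    "category (IPCC2006)": 4,
    "Class (class)": 1,
    "area (ISO3)": 4,
    "Type (type)": 1,
    "source": 1,
}
n_variables = 3  # CH4, CO2, KYOTOGHG (SARGWP100)

values_per_variable = prod(sizes.values())
total_size = values_per_variable * n_variables
print(values_per_variable, total_size)  # 48 144
```

Each of the three data variables is a dense array over all seven dimensions, so combinations that are absent from the CSV file still take up a slot in the array.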