Data reading example 3 - minimal test dataset (long)


To run this example, the file test_csv_data_long.csv must be placed in the same folder as this notebook. You can find the notebook and the CSV file in the folder docs/source/data_reading of the PRIMAP2 repository.

# imports
import primap2 as pm2

Dataset Specifications

Here we define which columns of the CSV file contain the metadata. The dict coords_cols maps each PRIMAP2 dimension (key) to the CSV column (value) that contains it. Values for coordinates that have no column in the CSV are set using coords_defaults. The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict. coords_value_mapping defines how metadata values, e.g. category codes, are converted. For each metadata column you can specify either a dict that directly defines the mapping, a function that is used to map the metadata values, or a string that selects one of the pre-defined mapping functions included in PRIMAP2. filter_keep and filter_remove filter the input data: each entry in filter_keep specifies a subset of the input data that is kept, while the subsets defined by filter_remove are removed from the input data.
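To make the filter semantics concrete, here is a minimal stand-alone sketch in plain Python. It assumes that filter_keep and filter_remove are dicts of named filters, that each filter maps a CSV column name to one allowed value or a list of allowed values, and that a row belongs to a filter's subset when all listed columns match. The helpers row_matches and apply_filters are hypothetical and are not part of the PRIMAP2 API.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
Filter = Dict[str, Any]


def row_matches(row: Row, spec: Filter) -> bool:
    """Return True if every column named in spec matches the row.

    A spec value may be a single value or a list of allowed values.
    """
    for column, allowed in spec.items():
        values = allowed if isinstance(allowed, list) else [allowed]
        if row[column] not in values:
            return False
    return True


def apply_filters(
    rows: List[Row],
    filter_keep: Optional[Dict[str, Filter]] = None,
    filter_remove: Optional[Dict[str, Filter]] = None,
) -> List[Row]:
    """Keep rows matching any keep filter, then drop rows matching any remove filter."""
    result = rows
    if filter_keep:
        result = [r for r in result if any(row_matches(r, f) for f in filter_keep.values())]
    if filter_remove:
        result = [
            r for r in result if not any(row_matches(r, f) for f in filter_remove.values())
        ]
    return result


# Hypothetical rows keyed by raw CSV column names (illustration only).
rows = [
    {"country": "AUS", "gas": "CO2"},
    {"country": "ZAM", "gas": "CH4"},
    {"country": "AUS", "gas": "CH4"},
]
filtered = apply_filters(
    rows,
    filter_keep={"only_aus": {"country": "AUS"}},
    filter_remove={"no_methane": {"gas": ["CH4"]}},
)
print(filtered)  # [{'country': 'AUS', 'gas': 'CO2'}]
```

Rows that match any keep filter survive the first step; a surviving row is then dropped if it matches any remove filter. Without filters, all rows are kept.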

In the example, the CSV contains the coordinates country, category, gas, and year. They are translated into their proper PRIMAP2 names by specifying them in the coords_cols dictionary. Additionally, columns are specified for the unit and for the actual data (which is found in the column emissions in the CSV file). The format used in the year column is given using the time_format argument. Values for the scenario and source coordinates are not available in the CSV file; therefore, we add them using default values defined in coords_defaults. Terminologies are given for area, category, and scenario. Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.
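The renaming, defaulting, and time parsing described above can be sketched for a single row. The CSV content and the helper to_primap2_row are hypothetical and chosen for illustration (they are not the contents of test_csv_data_long.csv); the PRIMAP1 value mapping is deliberately left out, and PRIMAP2's own reader processes whole tables rather than single rows.

```python
import csv
import io
from datetime import datetime
from typing import Any, Dict

# Hypothetical CSV content for illustration; NOT the real test_csv_data_long.csv.
CSV_TEXT = """country,category,gas,year,emissions,unit
AUS,IPC1,CO2,1991,4.1,Gg
"""

coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "time": "year",
    "data": "emissions",
}
coords_defaults = {"source": "TESTcsv2021", "scenario": "HISTORY"}
coords_terminologies = {"area": "ISO3", "category": "IPCC2006", "scenario": "general"}


def to_primap2_row(csv_row: Dict[str, str], time_format: str) -> Dict[str, Any]:
    """Rename one CSV row to PRIMAP2 dimension names (illustrative sketch)."""
    # 1. Pick each dimension's value from its CSV column.
    row: Dict[str, Any] = {dim: csv_row[col] for dim, col in coords_cols.items()}
    # 2. Dimensions without a CSV column get a constant default value.
    row.update(coords_defaults)
    # 3. Parse the year with a strptime-style format ("%Y" gives 1 January).
    row["time"] = datetime.strptime(row["time"], time_format)
    row["data"] = float(row["data"])
    # 4. Dimensions with a terminology are renamed to "dim (terminology)".
    return {
        (f"{key} ({coords_terminologies[key]})" if key in coords_terminologies else key): value
        for key, value in row.items()
    }


rows = [to_primap2_row(r, "%Y") for r in csv.DictReader(io.StringIO(CSV_TEXT))]
print(rows[0]["area (ISO3)"], rows[0]["time"].year, rows[0]["source"])  # AUS 1991 TESTcsv2021
```

The renaming in step 4 mirrors the dimension names that appear in the interchange format output, such as area (ISO3) and scenario (general).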

Coordinate mapping is necessary for category, entity, and unit, because all three use the PRIMAP1 conventions in the CSV file. For category, this means that e.g. IPC1A2 is converted to 1.A.2. For entity, the conversion affects the way GWP information is stored in the entity name: e.g. KYOTOGHGAR4 is mapped to KYOTOGHG (AR4GWP100).
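The two conversions named above can be approximated with simplified, hypothetical helpers. They only cover codes shaped like the examples in the text (the GWP suffixes other than AR4 are assumptions), and they are not PRIMAP2's pre-defined PRIMAP1 mapping functions. The map_values helper mirrors the idea that a value mapping can be either a dict or a function; keeping values that are missing from a dict unchanged is a choice of this sketch.

```python
import re
from typing import Callable, Dict, List, Union

# Assumed GWP suffixes; only the AR4 case is taken from the example in the text.
GWP_CONTEXTS = {"SAR": "SARGWP100", "AR4": "AR4GWP100", "AR5": "AR5GWP100"}

ValueMapping = Union[Dict[str, str], Callable[[str], str]]


def primap1_category(code: str) -> str:
    """Convert e.g. 'IPC1A2' to '1.A.2' (only alternating digit/letter groups)."""
    if not code.startswith("IPC"):
        raise ValueError(f"not a PRIMAP1 IPCC category code: {code!r}")
    # Split the part after 'IPC' into runs of digits or letters and join with dots.
    return ".".join(re.findall(r"\d+|[A-Za-z]+", code[3:]))


def primap1_entity(name: str) -> str:
    """Convert e.g. 'KYOTOGHGAR4' to 'KYOTOGHG (AR4GWP100)'; other names pass through."""
    for suffix, context in GWP_CONTEXTS.items():
        if name.endswith(suffix) and len(name) > len(suffix):
            return f"{name[: -len(suffix)]} ({context})"
    return name


def map_values(values: List[str], mapping: ValueMapping) -> List[str]:
    """Apply a function to each value, or look values up in a dict (misses kept)."""
    if callable(mapping):
        return [mapping(v) for v in values]
    return [mapping.get(v, v) for v in values]


print(map_values(["IPC1A2", "IPC1"], primap1_category))  # ['1.A.2', '1']
print(map_values(["KYOTOGHGAR4", "CO2"], primap1_entity))  # ['KYOTOGHG (AR4GWP100)', 'CO2']
print(map_values(["IPC1A2", "IPC2"], {"IPC1A2": "1.A.2"}))  # ['1.A.2', 'IPC2']
```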

In this example, we also use meta_data to add a reference for the data and a statement of the usage rights.

file = "test_csv_data_long.csv"
coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "time": "year",
    "data": "emissions",
}
coords_defaults = {
    "source": "TESTcsv2021",
    "scenario": "HISTORY",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "general",
}
coords_value_mapping = {
    "category": "PRIMAP1",
    "entity": "PRIMAP1",
    "unit": "PRIMAP1",
}
meta_data = {
    "references": "Just ask around.",
    "rights": "public domain",
}
data_if = pm2.pm2io.read_long_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    meta_data=meta_data,
    time_format="%Y",
)
data_if.head()
        source  scenario (general)  area (ISO3)  entity         unit  category (IPCC2006)  1991  2000  2010
0  TESTcsv2021             HISTORY          AUS     CO2  Gg CO2 / yr                    1   4.1   5.0   6.0
1  TESTcsv2021             HISTORY          ZAM     CH4  Mt CH4 / yr                    2   7.0   8.0   9.0
data_if.attrs
{'attrs': {'references': 'Just ask around.',
  'rights': 'public domain',
  'area': 'area (ISO3)',
  'cat': 'category (IPCC2006)',
  'scen': 'scenario (general)'},
 'time_format': '%Y',
 'dimensions': {'*': ['source',
   'scenario (general)',
   'area (ISO3)',
   'entity',
   'unit',
   'category (IPCC2006)']}}

Transformation to PRIMAP2 xarray format

The transformation to PRIMAP2 xarray format is done using the function primap2.pm2io.from_interchange_format(), which takes an interchange format DataFrame. The resulting xarray Dataset is already quantified, i.e. its variables are pint arrays that carry a unit.

data_pm2 = pm2.pm2io.from_interchange_format(data_if)
data_pm2
2024-10-07 16:18:49.338 | DEBUG    | primap2.pm2io._interchange_format:from_interchange_format:319 - Expected array shapes: [[1, 1, 2, 2, 2], [1, 1, 2, 2, 2]], resulting in size 16.
2024-10-07 16:18:49.362 | INFO     | primap2._data_format:ensure_valid_attributes:373 - Reference information is not a DOI: 'Just ask around.'
<xarray.Dataset> Size: 264B
Dimensions:              (time: 3, area (ISO3): 2, category (IPCC2006): 2,
                          scenario (general): 1, source: 1)
Coordinates:
  * area (ISO3)          (area (ISO3)) object 16B 'AUS' 'ZAM'
  * category (IPCC2006)  (category (IPCC2006)) object 16B '1' '2'
  * scenario (general)   (scenario (general)) object 8B 'HISTORY'
  * source               (source) object 8B 'TESTcsv2021'
  * time                 (time) datetime64[ns] 24B 1991-01-01 ... 2010-01-01
Data variables:
    CH4                  (time, area (ISO3), category (IPCC2006), scenario (general), source) float64 96B [CH4·Mt/yr] ...
    CO2                  (time, area (ISO3), category (IPCC2006), scenario (general), source) float64 96B [CO2·Gg/yr] ...
Attributes:
    references:  Just ask around.
    rights:      public domain
    area:        area (ISO3)
    cat:         category (IPCC2006)
    scen:        scenario (general)
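Conceptually, the resulting Dataset holds one data variable per entity, spanning the shared coordinates, with combinations that are absent from the table filled with NaN. The following stand-alone sketch, which is not PRIMAP2's implementation, reproduces that layout for the two rows shown by data_if.head() using plain dicts; the helper to_dense is hypothetical, and coordinate values are sorted only to get a deterministic order.

```python
import math
from itertools import product
from typing import Any, Dict, List

YEARS = ["1991", "2000", "2010"]
DIMS = ["area (ISO3)", "category (IPCC2006)", "scenario (general)", "source"]

# The two interchange-format rows shown by data_if.head() above.
ROWS: List[Dict[str, Any]] = [
    {"source": "TESTcsv2021", "scenario (general)": "HISTORY", "area (ISO3)": "AUS",
     "entity": "CO2", "unit": "Gg CO2 / yr", "category (IPCC2006)": "1",
     "1991": 4.1, "2000": 5.0, "2010": 6.0},
    {"source": "TESTcsv2021", "scenario (general)": "HISTORY", "area (ISO3)": "ZAM",
     "entity": "CH4", "unit": "Mt CH4 / yr", "category (IPCC2006)": "2",
     "1991": 7.0, "2000": 8.0, "2010": 9.0},
]


def to_dense(rows: List[Dict[str, Any]], years: List[str], dims: List[str]):
    """Build one dense, NaN-filled table per entity over the shared coordinates."""
    # Shared coordinates: union of the values found in all rows.
    coords = {d: sorted({r[d] for r in rows}) for d in dims}
    variables = {}
    for entity in sorted({r["entity"] for r in rows}):
        entity_rows = [r for r in rows if r["entity"] == entity]
        units = {r["unit"] for r in entity_rows}
        if len(units) != 1:
            raise ValueError(f"{entity} uses more than one unit: {units}")
        lookup = {tuple(r[d] for d in dims): r for r in entity_rows}
        data = {}
        # Visit every (time, area, category, scenario, source) combination.
        for year in years:
            for key in product(*(coords[d] for d in dims)):
                row = lookup.get(key)
                data[(year, *key)] = row[year] if row is not None else math.nan
        variables[entity] = {"unit": units.pop(), "data": data}
    return coords, variables


coords, variables = to_dense(ROWS, YEARS, DIMS)
ch4 = variables["CH4"]["data"]
print(len(ch4), sum(not math.isnan(v) for v in ch4.values()))  # 12 3
```

With 3 years, 2 areas, 2 categories, 1 scenario, and 1 source, each variable has 12 entries, which matches the 96 bytes (12 × 8-byte floats) reported for CH4 and CO2 in the xarray output above; only 3 of the CH4 entries hold data.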