Data reading example 1 - minimal test dataset
To run this example, the file test_csv_data_sec_cat.csv must be placed in the same folder as this notebook. You can find the notebook and the csv file in the folder docs/data_reading_examples in the PRIMAP2 repository.
import primap2 as pm2
Dataset Specifications
Here we define which columns of the csv file contain the metadata. The dict coords_cols contains the mapping of csv columns to PRIMAP2 dimensions. Default values are set using coords_defaults. The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for areas) are set in the coords_terminologies dict.
coords_value_mapping defines the conversion of metadata values, e.g. category codes. You can either specify a dict for a metadata column which directly defines the mapping, a function which is used to map metadata values, or a string to select one of the pre-defined functions included in PRIMAP2.
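To make these options concrete, here is a minimal, hedged sketch of a coords_value_mapping combining all three variants; the explicit codes in the dict and the lambda are hypothetical and not part of this example:
# hypothetical sketch: three ways to define a value mapping for a metadata column
coords_value_mapping_sketch = {
    "category": {"IPC1": "1", "IPC1A": "1.A"},  # dict which directly defines the mapping
    "entity": lambda value: value.strip(),  # function applied to each metadata value
    "unit": "PRIMAP1",  # string selecting a pre-defined function included in PRIMAP2
}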
filter_keep and filter_remove filter the input data. Each entry in filter_keep specifies a subset of the input data which is kept, while the subsets defined by filter_remove are removed from the input data. For details, we refer to the documentation of primap2.pm2io.read_wide_csv_file_if().
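As a rough sketch of the filter syntax (the column names country and gas are taken from this csv file, the filter content is hypothetical; see the documentation of primap2.pm2io.read_wide_csv_file_if() for the exact semantics):
# hypothetical sketch: each named entry selects a subset of rows by column values
filter_keep_sketch = {"f1": {"country": ["AUS", "FRA"]}}  # keep only these countries
filter_remove_sketch = {"f2": {"gas": ["KYOTOGHG"]}}  # remove the aggregate Kyoto basket rows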
In the example, the CSV contains the coordinates entity, area, category, and the secondary category class. As secondary categories have free names, they are prefixed with sec_cats__ to make clear that they are secondary categories. Values for the secondary category type and for the scenario coordinate are not available in the csv file; therefore, we add them using default values defined in coords_defaults.
Terminologies are given for area, category, scenario, and the secondary categories. Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.
Coordinate value mapping is necessary for category, entity, and unit, which all use the PRIMAP1 specifications in the csv file. For category, this means that e.g. IPC1A2 is converted to 1.A.2; for entity, the conversion affects the way GWP information is stored in the entity name, e.g. KYOTOGHGAR4 is mapped to KYOTOGHG (AR4GWP100).
In this example, we also add meta_data to store a reference for the data and its usage rights. For examples on using filters, we refer to the second example, which reads the PRIMAP-hist data.
file = "test_csv_data_sec_cat.csv"
coords_cols = {
"unit": "unit",
"entity": "gas",
"area": "country",
"category": "category",
"sec_cats__Class": "classification",
}
coords_defaults = {
"source": "TESTcsv2021",
"sec_cats__Type": "fugitive",
"scenario": "HISTORY",
}
coords_terminologies = {
"area": "ISO3",
"category": "IPCC2006",
"sec_cats__Type": "type",
"sec_cats__Class": "class",
"scenario": "general",
}
coords_value_mapping = {
"category": "PRIMAP1",
"entity": "PRIMAP1",
"unit": "PRIMAP1",
}
meta_data = {
"references": "Just ask around.",
"rights": "public domain",
}
data_if = pm2.pm2io.read_wide_csv_file_if(
file,
coords_cols=coords_cols,
coords_defaults=coords_defaults,
coords_terminologies=coords_terminologies,
coords_value_mapping=coords_value_mapping,
meta_data=meta_data,
)
data_if.head()
|   | source | scenario (general) | area (ISO3) | entity | unit | category (IPCC2006) | Class (class) | Type (type) | 1991 | 2000 | 2010 |
|---|--------|--------------------|-------------|--------|------|---------------------|---------------|-------------|------|------|------|
| 0 | TESTcsv2021 | HISTORY | AUS | CO2 | Gg CO2 / yr | 1 | TOTAL | fugitive | 4000.00 | 5000.00 | 6000.00 |
| 1 | TESTcsv2021 | HISTORY | AUS | KYOTOGHG (SARGWP100) | Mt CO2 / yr | 0 | TOTAL | fugitive | 8.00 | 9.00 | 10.00 |
| 2 | TESTcsv2021 | HISTORY | FRA | CH4 | Gg CH4 / yr | 2 | TOTAL | fugitive | 7.00 | 8.00 | 9.00 |
| 3 | TESTcsv2021 | HISTORY | FRA | CO2 | Gg CO2 / yr | 2 | TOTAL | fugitive | 12.00 | 13.00 | 14.00 |
| 4 | TESTcsv2021 | HISTORY | FRA | KYOTOGHG (SARGWP100) | Mt CO2 / yr | 0 | TOTAL | fugitive | 0.03 | 0.02 | 0.04 |
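The category column now holds the converted IPCC2006 codes. Since the interchange format data is a pandas DataFrame, a quick way to check the conversion is to list the unique values of that column (the column name follows from the terminology defined above):
sorted(data_if["category (IPCC2006)"].unique())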
data_if.attrs
{'attrs': {'references': 'Just ask around.',
'rights': 'public domain',
'area': 'area (ISO3)',
'cat': 'category (IPCC2006)',
'scen': 'scenario (general)',
'sec_cats': ['Class (class)', 'Type (type)']},
'time_format': '%Y',
'dimensions': {'*': ['source',
'scenario (general)',
'area (ISO3)',
'entity',
'unit',
'category (IPCC2006)',
'Class (class)',
'Type (type)']}}
Transformation to PRIMAP2 xarray format
The transformation to the PRIMAP2 xarray format is done using the function primap2.pm2io.from_interchange_format(), which takes an interchange format DataFrame. The resulting xarray Dataset is already quantified, i.e. the variables are pint arrays which include a unit.
data_pm2 = pm2.pm2io.from_interchange_format(data_if)
data_pm2
2024-10-07 16:18:55.689 | DEBUG | primap2.pm2io._interchange_format:from_interchange_format:319 - Expected array shapes: [[1, 1, 4, 3, 4, 1, 1], [1, 1, 4, 3, 4, 1, 1], [1, 1, 4, 3, 4, 1, 1]], resulting in size 144.
2024-10-07 16:18:55.721 | INFO | primap2._data_format:ensure_valid_attributes:373 - Reference information is not a DOI: 'Just ask around.'
<xarray.Dataset> Size: 1kB
Dimensions:               (time: 3, category (IPCC2006): 4, source: 1, Class (class): 1, area (ISO3): 4, Type (type): 1, scenario (general): 1)
Coordinates:
  * category (IPCC2006)   (category (IPCC2006)) object 32B '0' '1' '2' '3'
  * source                (source) object 8B 'TESTcsv2021'
  * Class (class)         (Class (class)) object 8B 'TOTAL'
  * area (ISO3)           (area (ISO3)) object 32B 'AUS' 'FRA' 'USA' 'ZAM'
  * Type (type)           (Type (type)) object 8B 'fugitive'
  * scenario (general)    (scenario (general)) object 8B 'HISTORY'
  * time                  (time) datetime64[ns] 24B 1991-01-01 ... 2010-01-01
Data variables:
    CH4                   (time, category (IPCC2006), source, Class (class), area (ISO3), Type (type), scenario (general)) float64 384B [CH4·Gg/yr] ...
    CO2                   (time, category (IPCC2006), source, Class (class), area (ISO3), Type (type), scenario (general)) float64 384B [CO2·Gg/yr] ...
    KYOTOGHG (SARGWP100)  (time, category (IPCC2006), source, Class (class), area (ISO3), Type (type), scenario (general)) float64 384B [CO2·Mt/yr] ...
Attributes:
    references:  Just ask around.
    rights:      public domain
    area:        area (ISO3)
    cat:         category (IPCC2006)
    scen:        scenario (general)
    sec_cats:    ['Class (class)', 'Type (type)']
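To illustrate that the variables are quantified, the unit of a single variable can be inspected via the pint accessor provided by pint-xarray (a minimal sketch; the variable name is taken from the dataset above):
data_pm2["CO2"].pint.units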