Data reading example 3 - minimal test dataset (long)
To run this example, the file test_csv_data_long.csv must be placed in the same folder as this notebook.
You can find the notebook and the csv file in the folder docs/source/data_reading in the PRIMAP2 repository.
# imports
import primap2 as pm2
Dataset Specifications
Here we define which columns of the csv file contain the metadata.
The dict coords_cols contains the mapping of csv columns to PRIMAP2 dimensions.
Default values not found in the CSV are set using coords_defaults.
The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict.
coords_value_mapping defines conversion of metadata values, e.g. category codes.
You can either specify a dict for a metadata column which directly defines the mapping, a function which is used to map metadata values, or a string to select one of the pre-defined functions included in PRIMAP2.
filter_keep and filter_remove filter the input data.
Each entry in filter_keep specifies a subset of the input data which is kept, while the subsets defined by filter_remove are removed from the input data.
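The example in this notebook does not use filters. As a rough illustration of the semantics described above, the following sketch applies keep and remove filters to a list of row dicts in plain Python. It is not PRIMAP2's implementation: the helper names matches and filter_rows, the sample rows, and the assumption that all column conditions within one filter must hold are hypothetical.

```python
# Illustrative only: a plain-Python model of keep/remove filtering.
# Helper names and sample rows are hypothetical, not part of PRIMAP2.

def matches(row, condition):
    """Return True if the row satisfies every column condition of one filter."""
    for column, allowed in condition.items():
        # Accept either a single value or a collection of allowed values.
        if isinstance(allowed, (list, tuple, set)):
            allowed_values = allowed
        else:
            allowed_values = [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def filter_rows(rows, filter_keep=None, filter_remove=None):
    """Keep rows matching any keep filter, then drop rows matching any remove filter."""
    if filter_keep:
        rows = [r for r in rows if any(matches(r, f) for f in filter_keep.values())]
    if filter_remove:
        rows = [r for r in rows if not any(matches(r, f) for f in filter_remove.values())]
    return rows


rows = [
    {"country": "AUS", "gas": "CO2", "year": "1991"},
    {"country": "AUS", "gas": "CH4", "year": "1991"},
    {"country": "ZAM", "gas": "CH4", "year": "2000"},
]

# Each entry is one named filter; column names are those of the csv file.
filter_keep = {"only_australia": {"country": "AUS"}}
filter_remove = {"no_methane": {"gas": ["CH4"]}}

kept = filter_rows(rows, filter_keep=filter_keep, filter_remove=filter_remove)
print(kept)
```

In this sketch the keep filters are applied first, so a row has to pass at least one keep filter and match no remove filter to survive.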
In the example, the CSV contains the coordinates country, category, gas, and year.
They are translated into their proper PRIMAP2 names by specifying them in the coords_cols dictionary.
Additionally, columns are specified for the unit and for the actual data (which is found in the column emissions in the CSV file).
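Note the direction of the mapping: the keys of coords_cols are PRIMAP2 names and the values are csv column names. The sketch below makes this concrete by renaming the columns of a small csv snippet with the standard library. The snippet is invented for illustration and is not the content of test_csv_data_long.csv, and the renaming step only mimics the idea, not PRIMAP2's internals.

```python
import csv
import io

# Hypothetical csv snippet using the column names from this example;
# it is NOT the actual content of test_csv_data_long.csv.
csv_text = """country,category,gas,unit,year,emissions
AUS,IPC1,CO2,Gg,1991,4.1
"""

# Same dictionary as in the example: PRIMAP2 name -> csv column name.
coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "time": "year",
    "data": "emissions",
}

# Invert the mapping so csv column names can be looked up directly.
csv_to_primap2 = {csv_col: pm2_name for pm2_name, csv_col in coords_cols.items()}

renamed_rows = [
    {csv_to_primap2[col]: value for col, value in row.items()}
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(renamed_rows[0])
```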
The format used in the year column is given using the time_format argument.
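The value "%Y" used in this example is a strftime-style directive for a four-digit year. As a quick check outside PRIMAP2, Python's standard library parses such a value as shown below; whether PRIMAP2 uses the same parser internally is an assumption.

```python
from datetime import datetime

# "%Y" matches a four-digit year such as the values in the year column.
# Missing month and day default to January 1st.
parsed = datetime.strptime("1991", "%Y")
print(parsed.isoformat())  # 1991-01-01T00:00:00
```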
Values for the scenario and source coordinates are not available in the csv file;
therefore, we add them using default values defined in coords_defaults.
Terminologies are given for area, category, scenario, and the secondary categories.
Providing these terminologies is mandatory to create a valid PRIMAP2 dataset.
Coordinate mapping is necessary for category, entity, and unit.
They all use the PRIMAP1 specifications in the csv file.
For category, this means that e.g. IPC1A2 would be converted to 1.A.2.
For entity, the conversion affects the way GWP information is stored in the entity name: e.g. KYOTOGHGAR4 is mapped to KYOTOGHG (AR4GWP100).
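To make the two conversions concrete, here is a minimal sketch that reproduces just the examples mentioned in the text. The functions convert_category and convert_entity are hypothetical helpers written for illustration; they do not cover the full PRIMAP1 conventions and are not the functions PRIMAP2 selects with the "PRIMAP1" string.

```python
import re

# Illustrative helpers only: they handle the simple cases named in the text
# and are not PRIMAP2's implementation of the "PRIMAP1" mapping.

def convert_category(code):
    """Turn a simple PRIMAP1-style code such as 'IPC1A2' into '1.A.2'."""
    if not code.startswith("IPC"):
        return code
    # Split the remainder into runs of digits and single letters.
    parts = re.findall(r"\d+|[A-Za-z]", code[len("IPC"):])
    return ".".join(parts)


def convert_entity(name):
    """Move a trailing GWP tag such as 'AR4' into a bracketed suffix."""
    match = re.fullmatch(r"(?P<gas>.+?)(?P<gwp>AR\d)", name)
    if match is None:
        return name  # no GWP tag, e.g. 'CO2'
    return f"{match['gas']} ({match['gwp']}GWP100)"


print(convert_category("IPC1A2"))     # 1.A.2
print(convert_entity("KYOTOGHGAR4"))  # KYOTOGHG (AR4GWP100)
print(convert_entity("CO2"))          # CO2
```

A function of this shape could in principle be passed in coords_value_mapping instead of a string, since a mapping may also be given as a function or a dict.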
In this example, we also use meta_data to add a reference for the data and usage rights.
file = "test_csv_data_long.csv"
# PRIMAP2 dimension name -> csv column name
coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "time": "year",
    "data": "emissions",
}
# coordinates that are not present in the csv file
coords_defaults = {
    "source": "TESTcsv2021",
    "scenario": "HISTORY",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "general",
}
# use the pre-defined PRIMAP1 conversion functions
coords_value_mapping = {
    "category": "PRIMAP1",
    "entity": "PRIMAP1",
    "unit": "PRIMAP1",
}
meta_data = {
    "references": "Just ask around.",
    "rights": "public domain",
}
data_if = pm2.pm2io.read_long_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    meta_data=meta_data,
    time_format="%Y",
)
data_if.head()
| | source | scenario (general) | area (ISO3) | entity | unit | category (IPCC2006) | 1991 | 2000 | 2010 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TESTcsv2021 | HISTORY | AUS | CO2 | Gg CO2 / yr | 1 | 4.1 | 5.0 | 6.0 |
| 1 | TESTcsv2021 | HISTORY | ZAM | CH4 | Mt CH4 / yr | 2 | 7.0 | 8.0 | 9.0 |
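The table shows that the interchange format is wide: each time step becomes its own column, while the long csv file has one row per time step. The following sketch illustrates that reshaping in plain Python, using the values from the table. The tuple layout of long_rows and the grouping approach are assumptions for illustration, not PRIMAP2's code.

```python
from collections import defaultdict

# Long records matching the table values: one row per time step.
# Layout (area, entity, category, year, value) is chosen for illustration.
long_rows = [
    ("AUS", "CO2", "1", "1991", 4.1),
    ("AUS", "CO2", "1", "2000", 5.0),
    ("AUS", "CO2", "1", "2010", 6.0),
    ("ZAM", "CH4", "2", "1991", 7.0),
    ("ZAM", "CH4", "2", "2000", 8.0),
    ("ZAM", "CH4", "2", "2010", 9.0),
]

# Group values by their metadata key; each year becomes its own column.
wide = defaultdict(dict)
for area, entity, category, year, value in long_rows:
    wide[(area, entity, category)][year] = value

for key, years in wide.items():
    print(key, years)
```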
data_if.attrs
{'attrs': {'references': 'Just ask around.',
  'rights': 'public domain',
  'area': 'area (ISO3)',
  'cat': 'category (IPCC2006)',
  'scen': 'scenario (general)'},
 'time_format': '%Y',
 'dimensions': {'*': ['source',
   'scenario (general)',
   'area (ISO3)',
   'entity',
   'unit',
   'category (IPCC2006)']}}
Transformation to PRIMAP2 xarray format
The transformation to PRIMAP2 xarray format is done using the function primap2.pm2io.from_interchange_format(), which takes an interchange format DataFrame.
The resulting xarray Dataset is already quantified, so its variables are pint arrays which include a unit.
data_pm2 = pm2.pm2io.from_interchange_format(data_if)
data_pm2
2024-10-07 16:18:49.338 | DEBUG | primap2.pm2io._interchange_format:from_interchange_format:319 - Expected array shapes: [[1, 1, 2, 2, 2], [1, 1, 2, 2, 2]], resulting in size 16.
2024-10-07 16:18:49.362 | INFO | primap2._data_format:ensure_valid_attributes:373 - Reference information is not a DOI: 'Just ask around.'
<xarray.Dataset> Size: 264B
Dimensions:              (time: 3, area (ISO3): 2, category (IPCC2006): 2,
                          scenario (general): 1, source: 1)
Coordinates:
  * area (ISO3)          (area (ISO3)) object 16B 'AUS' 'ZAM'
  * category (IPCC2006)  (category (IPCC2006)) object 16B '1' '2'
  * scenario (general)   (scenario (general)) object 8B 'HISTORY'
  * source               (source) object 8B 'TESTcsv2021'
  * time                 (time) datetime64[ns] 24B 1991-01-01 ... 2010-01-01
Data variables:
    CH4                  (time, area (ISO3), category (IPCC2006), scenario (general), source) float64 96B [CH4·Mt/yr] ...
    CO2                  (time, area (ISO3), category (IPCC2006), scenario (general), source) float64 96B [CO2·Gg/yr] ...
Attributes:
    references:  Just ask around.
    rights:      public domain
    area:        area (ISO3)
    cat:         category (IPCC2006)
    scen:        scenario (general)