Data reading example 2 - PRIMAP-hist v2.2#
In this example, we read an old version of PRIMAP-hist that is not available in the native format because it was produced before that format existed.
# imports
import primap2 as pm2
Obtain the input data#
The PRIMAP-hist data (doi:10.5281/zenodo.4479172) is available from Zenodo, so we download it directly.
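import requests

# Download the CSV file from Zenodo and stop early if the request failed.
response = requests.get("https://zenodo.org/records/4479172/files/PRIMAP-hist_v2.2_19-Jan-2021.csv?download=1")
response.raise_for_status()

# Save the file locally so it can be read below.
file = "PRIMAPHIST22__19-Jan-2021.csv"
with open(file, "w") as fd:
    fd.write(response.text)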
Dataset Specifications#
Here we define which columns of the csv file contain the coordinates.
The dict coords_cols contains the mapping of csv columns to PRIMAP2 dimensions.
Default values are set using coords_defaults.
The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict.
coords_value_mapping defines conversion of metadata values, e.g. category codes.
filter_keep and filter_remove filter the input data.
Each entry in filter_keep specifies a subset of the input data which is kept while the subsets defined by filter_remove are removed from the input data.
For details, we refer to the documentation of primap2.pm2io.read_wide_csv_file_if().
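The filter semantics described above can be illustrated with a small sketch in plain Python. It is not the primap2 implementation: the helper names `matches` and `apply_filters` and the demo rows are hypothetical. It assumes that all conditions inside one filter entry must hold at once, that a row is kept if it matches any entry of the keep filter, and that an empty keep filter keeps everything.

```python
def matches(row, conditions):
    """Return True if the row satisfies every column condition of one entry."""
    for column, allowed in conditions.items():
        # A condition may be a single value or a list of allowed values.
        allowed_values = allowed if isinstance(allowed, list) else [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def apply_filters(rows, keep, remove):
    """Keep rows matching any `keep` entry, then drop rows matching any `remove` entry."""
    result = []
    for row in rows:
        is_kept = not keep or any(matches(row, c) for c in keep.values())
        is_removed = any(matches(row, c) for c in remove.values())
        if is_kept and not is_removed:
            result.append(row)
    return result


# Hypothetical rows that mimic the metadata columns of the csv file.
demo_rows = [
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTTP"},
    {"entity": "CO2", "category": "IPC4", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "KYOTOGHG", "category": "IPC4", "country": "DEU", "scenario": "HISTCR"},
    {"entity": "KYOTOGHG", "category": "IPCMAG", "country": "CHN", "scenario": "HISTCR"},
]
demo_keep = {
    "f1": {"entity": "CO2", "category": ["IPC2", "IPC1"], "country": ["AUS", "CHN"]},
    "f2": {"entity": "KYOTOGHG", "category": ["IPCMAG", "IPC4"], "country": ["AUS", "CHN"]},
}
demo_remove = {"f1": {"scenario": "HISTTP"}}

filtered = apply_filters(demo_rows, demo_keep, demo_remove)
for row in filtered:
    print(row)
```

Under these assumptions only the first and the last demo row survive: the second is removed because of its scenario, and the third and fourth match neither keep entry.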
coords_cols = {
    "unit": "unit",
    "entity": "entity",
    "area": "country",
    "scenario": "scenario",
    "category": "category",
}
coords_defaults = {
    "source": "PRIMAP-hist_v2.2",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP-hist",
}
coords_value_mapping = {
    "category": "PRIMAP1",
    "unit": "PRIMAP1",
    "entity": "PRIMAP1",
}
filter_keep = {
    "f1": {
        "entity": "CO2",
        "category": ["IPC2", "IPC1"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
    "f2": {
        "entity": "KYOTOGHG",
        "category": ["IPCMAG", "IPC4"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
}
filter_remove = {"f1": {"scenario": "HISTTP"}}
# filter_keep = {"f1": {"entity": "KYOTOGHG", "category": ["IPC2", "IPC1"]},}
# filter_keep = {}
# filter_remove = {}
meta_data = {"references": "doi:10.5281/zenodo.4479172"}
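The filters use PRIMAP1-style category codes such as IPC1 and IPCMAG, while the data read with the "PRIMAP1" value mapping shows categories such as 1 and M.AG. The following sketch mimics that conversion for the four top-level codes used in this example only. The function `primap1_category_to_ipcc2006` is a hypothetical helper, not part of primap2, and the rule it applies is an assumption inferred from these four codes; the real conversion also has to handle sub-categories.

```python
def primap1_category_to_ipcc2006(code: str) -> str:
    """Convert a top-level PRIMAP1 code such as 'IPC1' or 'IPCMAG' (sketch only)."""
    prefix = "IPC"
    if not code.startswith(prefix):
        raise ValueError(f"Unexpected category code: {code!r}")
    rest = code[len(prefix):]
    # Assumption: codes starting with 'M' get a dot after the 'M', e.g. 'MAG' -> 'M.AG'.
    if rest.startswith("M"):
        return "M." + rest[1:]
    return rest


converted = {
    code: primap1_category_to_ipcc2006(code)
    for code in ["IPC1", "IPC2", "IPC4", "IPCMAG"]
}
print(converted)
```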
Reading the data to interchange format#
To enable wider use of the PRIMAP2 data reading functionality, we read into the PRIMAP2 interchange format, which is a wide-format pandas DataFrame with coordinates in columns that follows the PRIMAP2 specifications.
Additional metadata not captured in this format is stored in DataFrame.attrs as a dictionary.
Because the attrs functionality in pandas is experimental, the metadata is only attached to the DataFrame returned by the reading functions and should be stored separately before doing any processing with the DataFrame.
Here we read the data using the primap2.pm2io.read_wide_csv_file_if() function.
We have specified restrictive filters above to limit the data included in this notebook.
PMH_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    filter_keep=filter_keep,
    filter_remove=filter_remove,
    meta_data=meta_data,
)
PMH_if.head()
| | source | scenario (PRIMAP-hist) | area (ISO3) | entity | unit | category (IPCC2006) | 1850 | 1851 | 1852 | 1853 | ... | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PRIMAP-hist_v2.2 | HISTCR | AFG | CO2 | Gg CO2 / yr | 1 | 0.147 | 0.155 | 0.163 | 0.172 | ... | 6750.0 | 8440.0 | 12200.0 | 10700.0 | 9990.0 | 11000.0 | 11700.0 | 12700.0 | 13100.0 | 18600.0 |
| 1 | PRIMAP-hist_v2.2 | HISTCR | AFG | CO2 | Gg CO2 / yr | 2 | 0.169 | 0.178 | 0.188 | 0.198 | ... | 191.0 | 207.0 | 207.0 | 268.0 | 341.0 | 318.0 | 269.0 | 293.0 | 292.0 | 298.0 |
| 2 | PRIMAP-hist_v2.2 | HISTCR | AFG | KYOTOGHG (SARGWP100) | Gg CO2 / yr | 4 | 155.000 | 154.000 | 154.000 | 153.000 | ... | 3080.0 | 3160.0 | 3270.0 | 3400.0 | 3510.0 | 3620.0 | 3730.0 | 3800.0 | 3900.0 | 4010.0 |
| 3 | PRIMAP-hist_v2.2 | HISTCR | AFG | KYOTOGHG (SARGWP100) | Gg CO2 / yr | M.AG | 615.000 | 668.000 | 719.000 | 770.000 | ... | 12400.0 | 14100.0 | 14300.0 | 14200.0 | 14100.0 | 14600.0 | 13600.0 | 13900.0 | 13800.0 | 13300.0 |
| 4 | PRIMAP-hist_v2.2 | HISTCR | AUS | CO2 | Gg CO2 / yr | 1 | 0.000 | 0.000 | 0.000 | 0.000 | ... | 384000.0 | 380000.0 | 378000.0 | 383000.0 | 376000.0 | 372000.0 | 380000.0 | 389000.0 | 393000.0 | 393000.0 |
5 rows × 175 columns
PMH_if.attrs
{'attrs': {'references': 'doi:10.5281/zenodo.4479172',
'area': 'area (ISO3)',
'scen': 'scenario (PRIMAP-hist)',
'cat': 'category (IPCC2006)'},
'time_format': '%Y',
'dimensions': {'*': ['source',
'scenario (PRIMAP-hist)',
'area (ISO3)',
'entity',
'unit',
'category (IPCC2006)']}}
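One way to store the metadata separately is to take an independent copy of the dictionary before any processing. The sketch below uses a stand-in dictionary copied from the output above so that it runs without the downloaded data; in the notebook the copy would be taken from PMH_if.attrs instead. The JSON round trip is only one possible way to persist the copy and is an assumption, not something the reading functions require.

```python
import copy
import json

# Stand-in for PMH_if.attrs, copied from the output shown above.
attrs = {
    "attrs": {
        "references": "doi:10.5281/zenodo.4479172",
        "area": "area (ISO3)",
        "scen": "scenario (PRIMAP-hist)",
        "cat": "category (IPCC2006)",
    },
    "time_format": "%Y",
    "dimensions": {
        "*": [
            "source",
            "scenario (PRIMAP-hist)",
            "area (ISO3)",
            "entity",
            "unit",
            "category (IPCC2006)",
        ]
    },
}

# Keep an independent copy so that later processing cannot alter or drop it.
saved_attrs = copy.deepcopy(attrs)

# Simulate a processing step that loses the metadata.
attrs.clear()

# The saved copy is unaffected and can be serialised, e.g. to JSON text.
restored = json.loads(json.dumps(saved_attrs))
print(restored["attrs"]["references"])
```

After processing, the saved dictionary can be assigned back to the attrs of the resulting DataFrame before it is passed on.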
Transformation to PRIMAP2 xarray format#
The transformation to PRIMAP2 xarray format is done using the function primap2.pm2io.from_interchange_format(), which takes an interchange format DataFrame.
The resulting xarray Dataset is already quantified, so the variables are pint arrays which include a unit.
PMH_pm2 = pm2.pm2io.from_interchange_format(PMH_if)
PMH_pm2
<xarray.Dataset> Size: 56kB
Dimensions:                 (time: 169, category (IPCC2006): 4, source: 1, area (ISO3): 5, scenario (PRIMAP-hist): 1)
Coordinates:
  * category (IPCC2006)     (category (IPCC2006)) object 32B '1' '2' '4' 'M.AG'
  * source                  (source) object 8B 'PRIMAP-hist_v2.2'
  * area (ISO3)             (area (ISO3)) object 40B 'AFG' 'AUS' ... 'CHN' 'GBR'
  * scenario (PRIMAP-hist)  (scenario (PRIMAP-hist)) object 8B 'HISTCR'
  * time                    (time) datetime64[ns] 1kB 1850-01-01 ... 2018-01-01
Data variables:
    CO2                     (time, category (IPCC2006), source, area (ISO3), scenario (PRIMAP-hist)) float64 27kB [CO2·Gg/yr] ...
    KYOTOGHG (SARGWP100)    (time, category (IPCC2006), source, area (ISO3), scenario (PRIMAP-hist)) float64 27kB [CO2·Gg/yr] ...
Attributes:
    references:  doi:10.5281/zenodo.4479172
    area:        area (ISO3)
    scen:        scenario (PRIMAP-hist)
    cat:         category (IPCC2006)