Data reading example 2 - PRIMAP-hist v2.2
To run this example, the file PRIMAPHIST22__19-Jan-2021.csv must be placed in the same folder as this notebook. The PRIMAP-hist data (doi:10.5281/zenodo.4479172) is available from Zenodo: https://zenodo.org/record/4479172
[1]:
# imports
import primap2 as pm2
Dataset Specifications
Here we define which columns of the csv file contain the coordinates. The dict coords_cols maps each PRIMAP2 dimension to the csv column that holds it (for example, area is read from the csv column country). coords_defaults sets fixed values for dimensions that have no csv column (here, source). The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the coords_terminologies dict. coords_value_mapping defines the conversion of metadata values, e.g. category codes. filter_keep and filter_remove filter the input data: each entry in filter_keep specifies a subset of the input data which is kept, while the subsets defined by filter_remove are removed from the input data.
For details, see the documentation of read_wide_csv_file_if in the pm2io module of PRIMAP2.
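To make the keep/remove semantics concrete, the following plain-Python sketch models them on a handful of made-up rows. It is not PRIMAP2's implementation: the helpers matches and apply_filters are hypothetical, and treating an empty filter_keep as "keep everything" is an assumption. Note that the filters in this notebook refer to the csv column names and raw codes (country, IPC1), not to the PRIMAP2 dimension names.

```python
# Illustrative only: a plain-Python model of keep/remove filtering.
# NOT PRIMAP2's implementation; helper names are hypothetical.


def matches(row, condition):
    """True if the row satisfies every column constraint of one filter entry."""
    for column, allowed in condition.items():
        # A constraint may be a single value or a list of accepted values.
        allowed_values = allowed if isinstance(allowed, list) else [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def apply_filters(rows, filter_keep, filter_remove):
    """Keep rows matching any keep entry, then drop rows matching any remove entry."""
    if filter_keep:  # assumption: an empty filter_keep keeps all rows
        rows = [r for r in rows if any(matches(r, c) for c in filter_keep.values())]
    return [r for r in rows if not any(matches(r, c) for c in filter_remove.values())]


# Made-up rows using the csv column names from this notebook.
rows = [
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "CO2", "category": "IPC1", "country": "AUS", "scenario": "HISTTP"},
    {"entity": "CH4", "category": "IPC1", "country": "AUS", "scenario": "HISTCR"},
    {"entity": "KYOTOGHG", "category": "IPC4", "country": "BRA", "scenario": "HISTCR"},
]
filter_keep = {
    "f1": {"entity": "CO2", "category": ["IPC2", "IPC1"]},
    "f2": {"entity": "KYOTOGHG", "category": ["IPCMAG", "IPC4"]},
}
filter_remove = {"f1": {"scenario": "HISTTP"}}

kept = apply_filters(rows, filter_keep, filter_remove)
print([(r["entity"], r["scenario"]) for r in kept])
# [('CO2', 'HISTCR'), ('KYOTOGHG', 'HISTCR')]
```

The CH4 row is dropped because no keep entry matches it, and the HISTTP row is dropped by filter_remove even though it matches the keep entry f1.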
[2]:
file = "PRIMAPHIST22__19-Jan-2021.csv"
coords_cols = {
    "unit": "unit",
    "entity": "entity",
    "area": "country",
    "scenario": "scenario",
    "category": "category",
}
coords_defaults = {
    "source": "PRIMAP-hist_v2.2",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP-hist",
}
coords_value_mapping = {
    "category": "PRIMAP1",
    "unit": "PRIMAP1",
    "entity": "PRIMAP1",
}
filter_keep = {
    "f1": {
        "entity": "CO2",
        "category": ["IPC2", "IPC1"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
    "f2": {
        "entity": "KYOTOGHG",
        "category": ["IPCMAG", "IPC4"],
        "country": ["AUS", "BRA", "CHN", "GBR", "AFG"],
    },
}
filter_remove = {"f1": {"scenario": "HISTTP"}}
# filter_keep = {"f1": {"entity": "KYOTOGHG", "category": ["IPC2", "IPC1"]},}
# filter_keep = {}
# filter_remove = {}
meta_data = {"references": "doi:10.5281/zenodo.4479172"}
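The effect of coords_terminologies can be seen in the column names of the interchange format further down, where area becomes area (ISO3). The sketch below reproduces that naming and lists the category codes as they appear before and after reading in this notebook. The helper column_label and the dict observed_category_mapping are hypothetical names used for illustration; the mapping covers only the four codes used here and makes no claim about how PRIMAP2 implements the PRIMAP1 conversion.

```python
# Illustrative sketch; the helper and the mapping are assumptions inferred
# from this notebook's inputs and outputs, not PRIMAP2 code.

coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP-hist",
}


def column_label(dimension, terminologies):
    """Append the terminology in parentheses when one is defined for the dimension."""
    terminology = terminologies.get(dimension)
    return f"{dimension} ({terminology})" if terminology else dimension


labels = [
    column_label(dimension, coords_terminologies)
    for dimension in ("source", "area", "category", "scenario")
]
print(labels)
# ['source', 'area (ISO3)', 'category (IPCC2006)', 'scenario (PRIMAP-hist)']

# Category codes as written in filter_keep (csv side) and as they show up
# in the data after reading (IPCC2006 side).
observed_category_mapping = {"IPC1": "1", "IPC2": "2", "IPC4": "4", "IPCMAG": "M.AG"}
print([observed_category_mapping[code] for code in ["IPC2", "IPC1"]])
# ['2', '1']
```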
Reading the data to interchange format
To enable wider use of the PRIMAP2 data reading functionality, we read into the PRIMAP2 interchange format, which is a wide-format pandas DataFrame with coordinates in columns that follows the PRIMAP2 specifications. Additional metadata not captured in this format is stored in DataFrame.attrs as a dictionary. Because the attrs functionality in pandas is experimental, the metadata is only attached to the DataFrame returned by the reading functions and should be stored separately before doing any processing with the DataFrame.
Here we read the data using the read_wide_csv_file_if() function. We have specified restrictive filters above to limit the data included in this notebook.
[3]:
PMH_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    coords_value_mapping=coords_value_mapping,
    filter_keep=filter_keep,
    filter_remove=filter_remove,
    meta_data=meta_data,
)
PMH_if.head()
[3]:
| | source | scenario (PRIMAP-hist) | area (ISO3) | entity | unit | category (IPCC2006) | 1850 | 1851 | 1852 | 1853 | ... | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PRIMAP-hist_v2.2 | HISTCR | AFG | CO2 | Gg CO2 / yr | 1 | 0.147 | 0.155 | 0.163 | 0.172 | ... | 6750.0 | 8440.0 | 12200.0 | 10700.0 | 9990.0 | 11000.0 | 11700.0 | 12700.0 | 13100.0 | 18600.0 |
| 1 | PRIMAP-hist_v2.2 | HISTCR | AFG | CO2 | Gg CO2 / yr | 2 | 0.169 | 0.178 | 0.188 | 0.198 | ... | 191.0 | 207.0 | 207.0 | 268.0 | 341.0 | 318.0 | 269.0 | 293.0 | 292.0 | 298.0 |
| 2 | PRIMAP-hist_v2.2 | HISTCR | AFG | KYOTOGHG (SARGWP100) | Gg CO2 / yr | 4 | 155.000 | 154.000 | 154.000 | 153.000 | ... | 3080.0 | 3160.0 | 3270.0 | 3400.0 | 3510.0 | 3620.0 | 3730.0 | 3800.0 | 3900.0 | 4010.0 |
| 3 | PRIMAP-hist_v2.2 | HISTCR | AFG | KYOTOGHG (SARGWP100) | Gg CO2 / yr | M.AG | 615.000 | 668.000 | 719.000 | 770.000 | ... | 12400.0 | 14100.0 | 14300.0 | 14200.0 | 14100.0 | 14600.0 | 13600.0 | 13900.0 | 13800.0 | 13300.0 |
| 4 | PRIMAP-hist_v2.2 | HISTCR | AUS | CO2 | Gg CO2 / yr | 1 | 0.000 | 0.000 | 0.000 | 0.000 | ... | 384000.0 | 380000.0 | 378000.0 | 383000.0 | 376000.0 | 372000.0 | 380000.0 | 389000.0 | 393000.0 | 393000.0 |
5 rows × 175 columns
[4]:
PMH_if.attrs
[4]:
{'attrs': {'references': 'doi:10.5281/zenodo.4479172',
  'area': 'area (ISO3)',
  'scen': 'scenario (PRIMAP-hist)',
  'cat': 'category (IPCC2006)'},
 'time_format': '%Y',
 'dimensions': {'CO2': ['unit',
   'entity',
   'area (ISO3)',
   'category (IPCC2006)',
   'source',
   'scenario (PRIMAP-hist)'],
  'KYOTOGHG (SARGWP100)': ['unit',
   'entity',
   'area (ISO3)',
   'category (IPCC2006)',
   'source',
   'scenario (PRIMAP-hist)']}}
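Since the metadata should be stored separately before any processing, a deep copy of the attrs dictionary is one simple way to do that. The sketch below works on a literal copy of the dictionary shown above so that it runs standalone; in the notebook one would start from PMH_if.attrs instead. It also uses the time_format entry to parse the year column labels, which is an illustration of what the entry describes and not a description of PRIMAP2 internals. The variable names are hypothetical.

```python
import copy
from datetime import datetime

# Literal copy of the attrs shown above (in the notebook: PMH_if.attrs).
dimension_names = [
    "unit",
    "entity",
    "area (ISO3)",
    "category (IPCC2006)",
    "source",
    "scenario (PRIMAP-hist)",
]
if_attrs = {
    "attrs": {
        "references": "doi:10.5281/zenodo.4479172",
        "area": "area (ISO3)",
        "scen": "scenario (PRIMAP-hist)",
        "cat": "category (IPCC2006)",
    },
    "time_format": "%Y",
    "dimensions": {
        "CO2": list(dimension_names),
        "KYOTOGHG (SARGWP100)": list(dimension_names),
    },
}

# Keep an independent copy so later DataFrame operations cannot affect it.
saved_attrs = copy.deepcopy(if_attrs)

# The time columns are labelled with years; time_format says how to parse them.
year_columns = [str(year) for year in range(1850, 2019)]
timestamps = [datetime.strptime(label, saved_attrs["time_format"]) for label in year_columns]
print(len(timestamps), timestamps[0].date(), timestamps[-1].date())
# 169 1850-01-01 2018-01-01

# Six coordinate columns plus 169 year columns give the 175 columns reported above.
n_columns = len(saved_attrs["dimensions"]["CO2"]) + len(year_columns)
print(n_columns)
# 175
```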
Transformation to PRIMAP2 xarray format
The transformation to the PRIMAP2 xarray format is done using the function from_interchange_format, which takes an interchange format DataFrame. The resulting xarray Dataset is already quantified, so its variables are pint arrays which include a unit.
[5]:
PMH_pm2 = pm2.pm2io.from_interchange_format(PMH_if)
PMH_pm2
2021-03-31 11:46:22.696 | DEBUG | primap2.pm2io._interchange_format:from_interchange_format:252 - Expected array shapes: [[2, 5, 4, 1, 1], [2, 5, 4, 1, 1]], resulting in size 80.
[5]:
<xarray.Dataset>
Dimensions:                 (area (ISO3): 5, category (IPCC2006): 4, scenario (PRIMAP-hist): 1, source: 1, time: 169)
Coordinates:
  * category (IPCC2006)     (category (IPCC2006)) object '1' '2' '4' 'M.AG'
  * time                    (time) datetime64[ns] 1850-01-01 ... 2018-01-01
  * area (ISO3)             (area (ISO3)) object 'AFG' 'AUS' 'BRA' 'CHN' 'GBR'
  * source                  (source) object 'PRIMAP-hist_v2.2'
  * scenario (PRIMAP-hist)  (scenario (PRIMAP-hist)) object 'HISTCR'
Data variables:
    CO2                     (time, area (ISO3), category (IPCC2006), source, scenario (PRIMAP-hist)) float64 [CO2·Gg/annum] ...
    KYOTOGHG (SARGWP100)    (time, area (ISO3), category (IPCC2006), source, scenario (PRIMAP-hist)) float64 [CO2·Gg/annum] ...
Attributes:
    references:  doi:10.5281/zenodo.4479172
    area:        area (ISO3)
    scen:        scenario (PRIMAP-hist)
    cat:         category (IPCC2006)
Variable attributes:
- CO2: entity: CO2
- KYOTOGHG (SARGWP100): gwp_context: SARGWP100, entity: KYOTOGHG