primap2.pm2io module¶
The io module contains functions to read data from different formats and convert it into the interchange format and the native primap2 format.
Data reading module of the PRIMAP2 climate policy analysis package.
- primap2.pm2io.convert_long_dataframe_if(data_long: DataFrame, *, coords_cols: dict[str, str], add_coords_cols: None | dict[str, list[str]] = None, coords_defaults: None | dict[str, Any] = None, coords_terminologies: dict[str, str], coords_value_mapping: None | dict[str, Any] = None, coords_value_filling: None | dict[str, dict[str, dict]] = None, filter_keep: None | dict[str, dict[str, Any]] = None, filter_remove: None | dict[str, dict[str, Any]] = None, meta_data: None | dict[str, Any] = None, time_format: str = '%Y-%m-%d', convert_str: bool | dict[str, float] = True, copy_df: bool = True) DataFrame [source]¶
Convert a DataFrame in long (tidy) format into the PRIMAP2 interchange format.
Columns can be renamed or filled with default values to match the PRIMAP2 structure. Where we refer to “dimensions” in the parameter description below we mean the basic dimension names without the added terminology (e.g. “area”, not “area (ISO3)”). The terminology information will be added by this function. You cannot use the short dimension names (e.g. “cat” instead of “category”) in the attributes.
- Parameters:
- data_long: pd.DataFrame
DataFrame in long (tidy) format which will be converted.
- coords_cols: dict
Dict where the keys are column names in the data to be converted and the values are the corresponding dimensions in PRIMAP2. To specify the data column containing the observable, use the “data” key. For secondary categories use a sec_cats__ prefix.
- add_coords_cols: dict, optional
Dict where the keys are PRIMAP2 additional coordinate names and the values are lists with two elements, where the first is the column in the dataframe to be converted and the second is the primap2 dimension for the coordinate (e.g. category for a category_name coordinate).
- coords_defaults: dict, optional
Dict for default values of coordinates / dimensions not given in the dataframe. The keys are the dimension names and the values are the values for the dimensions. For secondary categories use a sec_cats__ prefix.
- coords_terminologies: dict
Dict defining the terminologies used for the different coordinates (e.g. ISO3 for area). Only possible coordinates here are: area, category, scenario, entity, and secondary categories. For secondary categories use a sec_cats__ prefix. All entries different from “area”, “category”, “scenario”, “entity”, and sec_cats__<name> will raise a ValueError.
- coords_value_mapping: dict, optional
A dict with primap2 dimension names as keys. Values are dicts with input values as keys and output values as values. A standard use case is to map gas names from input data to the standardized names used in primap2. Alternatively a value can also be a function which transforms one CSV metadata value into the new metadata value. A third possibility is to give a string as a value, which defines a rule for translating metadata values. For the “category”, “entity”, and “unit” columns, the rule “PRIMAP1” is available, which translates from PRIMAP1 metadata to PRIMAP2 metadata.
- coords_value_filling: dict, optional
A dict with primap2 dimension names as keys. These are the target columns where values will be filled (or replaced). Values are dicts with primap2 dimension names as keys. These are the source columns. The values are dicts with source value to target value mappings. The value filling can do everything that the value mapping can, but while mapping can only replace values within a column using information from that column, the filling function can also fill or replace data based on values from a different column. This can be used e.g. to fill missing category codes based on category names, or to replace category codes which do not meet the terminology using the category names.
- filter_keep: dict, optional
Dict defining filters of data to keep. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the input data. Each entry in the dict defines an individual filter. The names of the filters have no relevance. Default: keep all data.
- filter_remove: dict, optional
Dict defining filters of data to remove. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the input data. Each entry in the dict defines an individual filter. The names of the filters have no relevance.
- meta_data: dict, optional
Meta data for the whole dataset. Will end up in the dataset-wide attrs. Allowed keys are “references”, “rights”, “contact”, “title”, “comment”, “institution”, and “history”. Documentation about the format and meaning of the meta data can be found in the data format documentation.
- time_format: str, optional (default: “%Y-%m-%d”)
strftime style format used to format the time information for the data columns in the interchange format. Default: “%Y-%m-%d” (equivalent to “%F”), i.e. the ISO 8601 date format.
- convert_str: bool or dict, optional (default: True)
If set to False, string values in the data columns will be kept. If set to True, they will be converted to np.nan or 0 following default rules. If a dict is given, values present in the dict are mapped as specified; all other values are handled by the default rules as in parse_code.
- copy_df: bool, optional (default: True)
If set to True, a copy of the input DataFrame is made to keep the input unchanged; this negatively impacts speed. If set to False, the input DataFrame will be altered, but performance will be better.
- Returns:
- obj: pd.DataFrame
pandas DataFrame with the converted data in PRIMAP2 interchange format
Examples
Example for coords_value_mapping:
coords_value_mapping = {
    'pyCPA_col_1': {
        'col_1_value_1_in': 'col_1_value_1_out',
        'col_1_value_2_in': 'col_1_value_2_out',
    },
    'pyCPA_col_2': {
        'col_2_value_1_in': 'col_2_value_1_out',
        'col_2_value_2_in': 'col_2_value_2_out',
    },
}
Example for filter_keep:
filter_keep = {
    'f_1': {'variable': ['CO2', 'CH4'], 'region': 'USA'},
    'f_2': {'variable': 'N2O'},
}
This example filter keeps all CO2 and CH4 data for the USA and N2O data for all countries.
Example for filter_remove:
filter_remove = {
    'f_1': {'scenario': 'HISTORY'},
}
This filter removes all data with ‘HISTORY’ as scenario.
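Putting the argument dicts together, a minimal sketch of a call might look like the following. All column names, terminology choices, and values here are illustrative assumptions, not taken from a real dataset; the conversion call itself is shown as a comment because it requires primap2 to be installed.

```python
# Sketch of the keyword arguments for convert_long_dataframe_if.
# Column and terminology names are made up for illustration.
coords_cols = {
    "region": "area",         # CSV/DataFrame column "region" -> PRIMAP2 dimension "area"
    "gas": "entity",
    "classification": "category",
    "emissions": "data",      # the observable itself uses the special "data" key
}
coords_defaults = {
    "scenario": "HISTORY",    # constant value for the whole dataset
    "source": "EXAMPLE2024",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "scenario": "PRIMAP",
}
coords_value_mapping = {
    "entity": {"carbon dioxide": "CO2", "methane": "CH4"},
}

# With primap2 installed and data_long a long-format pandas DataFrame:
# data_if = primap2.pm2io.convert_long_dataframe_if(
#     data_long,
#     coords_cols=coords_cols,
#     coords_defaults=coords_defaults,
#     coords_terminologies=coords_terminologies,
#     coords_value_mapping=coords_value_mapping,
# )

print(sorted(coords_cols.values()))  # ['area', 'category', 'data', 'entity']
```

Note that the dimensions in the dicts carry no terminology suffix; the function combines them with coords_terminologies (e.g. producing “area (ISO3)”) itself.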
- primap2.pm2io.convert_wide_dataframe_if(data_wide: DataFrame, *, coords_cols: dict[str, str], add_coords_cols: None | dict[str, list[str]] = None, coords_defaults: None | dict[str, Any] = None, coords_terminologies: dict[str, str], coords_value_mapping: None | dict[str, Any] = None, coords_value_filling: None | dict[str, dict[str, dict]] = None, filter_keep: None | dict[str, dict[str, Any]] = None, filter_remove: None | dict[str, dict[str, Any]] = None, meta_data: None | dict[str, Any] = None, time_format: str = '%Y', time_cols: None | list = None, convert_str: bool | dict[str, float] = True, copy_df: bool = False) DataFrame [source]¶
Convert a DataFrame in wide format into the PRIMAP2 interchange format.
Columns can be renamed or filled with default values to match the PRIMAP2 structure. Where we refer to “dimensions” in the parameter description below we mean the basic dimension names without the added terminology (e.g. “area”, not “area (ISO3)”). The terminology information will be added by this function. You cannot use the short dimension names (e.g. “cat” instead of “category”) in the attributes.
TODO: Currently duplicate data points will not be detected.
TODO: enable filtering through query strings
TODO: enable specification of the entity terminology
- Parameters:
- data_wide: pd.DataFrame
Wide DataFrame which will be converted.
- coords_cols: dict
Dict where the keys are PRIMAP2 dimension names and the values are column names in the dataframe to be converted. For secondary categories use a sec_cats__ prefix.
- add_coords_cols: dict, optional
Dict where the keys are PRIMAP2 additional coordinate names and the values are lists with two elements, where the first is the column in the dataframe to be converted and the second is the primap2 dimension for the coordinate (e.g. category for a category_name coordinate).
- coords_defaults: dict, optional
Dict for default values of coordinates / dimensions not given in the dataframe. The keys are the dimension names and the values are the values for the dimensions. For secondary categories use a sec_cats__ prefix.
- coords_terminologies: dict
Dict defining the terminologies used for the different coordinates (e.g. ISO3 for area). Only possible coordinates here are: area, category, scenario, entity, and secondary categories. For secondary categories use a sec_cats__ prefix. All entries different from “area”, “category”, “scenario”, “entity”, and sec_cats__<name> will raise a ValueError.
- coords_value_mapping: dict, optional
A dict with primap2 dimension names as keys. Values are dicts with input values as keys and output values as values. A standard use case is to map gas names from input data to the standardized names used in primap2. Alternatively a value can also be a function which transforms one CSV metadata value into the new metadata value. A third possibility is to give a string as a value, which defines a rule for translating metadata values. The only defined rule at the moment is “PRIMAP1” which can be used for the “category”, “entity”, and “unit” columns to translate from PRIMAP1 metadata to PRIMAP2 metadata.
- coords_value_filling: dict, optional
A dict with primap2 dimension names as keys. These are the target columns where values will be filled (or replaced). Values are dicts with primap2 dimension names as keys. These are the source columns. The values are dicts with source value to target value mappings. The value filling can do everything that the value mapping can, but while mapping can only replace values within a column using information from that column, the filling function can also fill or replace data based on values from a different column. This can be used e.g. to fill missing category codes based on category names, or to replace category codes which do not meet the terminology using the category names.
- filter_keep: dict, optional
Dict defining filters of data to keep. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the input data. Each entry in the dict defines an individual filter. The names of the filters have no relevance. Default: keep all data.
- filter_remove: dict, optional
Dict defining filters of data to remove. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the input data. Each entry in the dict defines an individual filter. The names of the filters have no relevance.
- meta_data: dict, optional
Meta data for the whole dataset. Will end up in the dataset-wide attrs. Allowed keys are “references”, “rights”, “contact”, “title”, “comment”, “institution”, and “history”. Documentation about the format and meaning of the meta data can be found in the data format documentation.
- time_format: str, optional (default: “%Y”)
strftime style format used to parse the time information for the data columns. Default: “%Y”, which will match years.
- time_cols: list, optional
List of column names which contain the data for each time point. If not given, the columns will be inferred using time_format.
- convert_str: bool or dict, optional (default: True)
If set to False, string values in the data columns will be kept. If set to True, they will be converted to np.nan or 0 following default rules. If a dict is given, values present in the dict are mapped as specified; all other values are handled by the default rules as in parse_code.
- copy_df: bool, optional (default: False)
If set to True, a copy of the input DataFrame is made to keep the input unchanged; this negatively impacts speed. If set to False, the input DataFrame will be altered, but performance will be better.
- Returns:
- obj: pd.DataFrame
pandas DataFrame with the converted data in PRIMAP2 interchange format
Examples
Example for coords_value_mapping:
coords_value_mapping = {
    'pyCPA_col_1': {
        'col_1_value_1_in': 'col_1_value_1_out',
        'col_1_value_2_in': 'col_1_value_2_out',
    },
    'pyCPA_col_2': {
        'col_2_value_1_in': 'col_2_value_1_out',
        'col_2_value_2_in': 'col_2_value_2_out',
    },
}
Example for filter_keep:
filter_keep = {
    'f_1': {'variable': ['CO2', 'CH4'], 'region': 'USA'},
    'f_2': {'variable': 'N2O'},
}
This example filter keeps all CO2 and CH4 data for the USA and N2O data for all countries.
Example for filter_remove:
filter_remove = {
    'f_1': {'scenario': 'HISTORY'},
}
This filter removes all data with ‘HISTORY’ as scenario.
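When time_cols is not given, data columns are inferred from the column names via time_format. That inference can be sketched with the standard library alone: treat a column as a data column if its name parses under the given strftime format. This mirrors the documented behaviour but is an illustration, not the package's actual implementation.

```python
from datetime import datetime

def infer_time_cols(columns, time_format="%Y"):
    """Return the column names that parse under the given strftime format."""
    time_cols = []
    for col in columns:
        try:
            datetime.strptime(col, time_format)
            time_cols.append(col)
        except ValueError:
            pass  # not a time column (e.g. a metadata column)
    return time_cols

columns = ["country", "category", "1990", "2000", "2021"]
print(infer_time_cols(columns))  # ['1990', '2000', '2021']
```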
- primap2.pm2io.from_interchange_format(data: DataFrame, attrs: dict | None = None, max_array_size: int = 1073741824) Dataset [source]¶
Convert dataset from the interchange format to the standard PRIMAP2 format.
Converts an interchange format DataFrame with added metadata to a PRIMAP2 xarray data structure. All column names and attrs are expected to be already in PRIMAP2 format as defined for the interchange format. The attrs dict is given explicitly as the attrs functionality in pandas is experimental.
- Parameters:
- data: pd.DataFrame
pandas DataFrame in PRIMAP2 interchange format.
- attrs: dict, optional
attrs dict as defined for the PRIMAP2 interchange format. Default: use data.attrs.
- max_array_size: int, optional
Maximum permitted projected array size. Larger sizes will raise an exception. Default: 2**30 (1073741824) entries, corresponding to about 4 GB of memory usage.
- Returns:
- obj: xr.Dataset
xr dataset with the converted data
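The size guard can be reasoned about up front: the projected array size is the product of the lengths of all coordinates, and the default limit of 2**30 entries is what the docstring's "1 G" refers to. The dimension lengths below are made up for illustration.

```python
from math import prod

MAX_ARRAY_SIZE = 2**30  # default limit: 1073741824 entries

# Hypothetical coordinate lengths: areas x categories x entities x time steps
dim_lengths = {"area": 200, "category": 100, "entity": 30, "time": 70}

projected = prod(dim_lengths.values())
print(projected, projected <= MAX_ARRAY_SIZE)  # 42000000 True
```

If the projected size exceeds the limit, either filter the data before conversion or raise max_array_size deliberately, accepting the memory cost.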
- primap2.pm2io.nir_add_unit_information(df_nir: DataFrame, *, unit_row: str | int, entity_row: int | str | None = None, regexp_entity: str, regexp_unit: str | None = None, manual_repl_unit: dict[str, str] | None = None, manual_repl_entity: dict[str, str] | None = None, default_unit: str) DataFrame [source]¶
Add unit information to a National Inventory Report (NIR) style DataFrame.
Add unit information to the header of an “entity-wide” file as present in the standard table format of National Inventory Reports (NIRs). The unit and entity information is extracted from the combined unit and entity information in the row defined by unit_row. The parameters regexp_unit and regexp_entity determine how this is done via regular expressions for unit and entity. Additionally, manual mappings can be defined in the manual_repl_unit and manual_repl_entity dicts. For each column the routine tries to extract a unit using the regular expression. If this fails, it looks in the manual_repl_unit dict for the unit and in manual_repl_entity for the entity information. If no information is found, the default unit given in default_unit is used; in this case the analyzed value is used unchanged as the entity.
- Parameters:
- df_nir: pd.DataFrame
Pandas DataFrame with the table to process.
- unit_row: str or int
String “header” to indicate that the column header should be used to derive the unit information, or an integer specifying the row to use for unit information. If entity and unit information are given in the same row, use only unit_row.
- entity_row: str or int, optional
String “header” to indicate that the column header should be used to derive the entity information, or an integer specifying the row to use for entity information. If entity and unit information are given in the same row, use only unit_row.
- regexp_entity: str
Regular expression that extracts the entity from the cell value.
- regexp_unit: str, optional
Regular expression that extracts the unit from the cell value.
- manual_repl_unit: dict, optional
Dict defining the unit for given cell values.
- manual_repl_entity: dict, optional
Dict defining the entity for given cell values.
- default_unit: str
Unit to be used if no unit can be extracted and no unit is given.
- Returns:
- pd.DataFrame
DataFrame with explicit unit information (as column header)
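The regular-expression step can be illustrated with the standard library. The header pattern assumed below ("ENTITY (UNIT)") and the fallback behaviour are illustrative; real NIR tables vary, which is exactly why the regexps and manual replacement dicts are parameters.

```python
import re

# Illustrative patterns for headers like "CO2 (kt)"
regexp_entity = r"^(.+?)\s*\("   # text before the opening parenthesis
regexp_unit = r"\(([^)]+)\)"     # text inside the parentheses

def split_header(cell, default_unit="Gg"):
    """Sketch of the extraction logic: regexps first, then the default unit."""
    entity_match = re.search(regexp_entity, cell)
    unit_match = re.search(regexp_unit, cell)
    if entity_match and unit_match:
        return entity_match.group(1), unit_match.group(1)
    # no unit found: use the default unit and keep the cell value as entity
    return cell, default_unit

print(split_header("CO2 (kt)"))  # ('CO2', 'kt')
print(split_header("NMVOC"))     # ('NMVOC', 'Gg')
```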
- primap2.pm2io.nir_convert_df_to_long(df_nir: DataFrame, year: int, header_long: list[str] | None = None) DataFrame [source]¶
Convert an entity-wide NIR table for a single year to a long format DataFrame.
The input DataFrame is required to have the following structure:
- Columns for category, original category name, and data, in this order, where category and original category name form a multiindex.
- A column header as a multiindex for entity and unit.
A column for the year is added during the conversion.
- Parameters:
- df_nir: pd.DataFrame
Pandas DataFrame with the NIR table to be converted
- year: int
Year of the given data
- header_long: list, optional
Specify a non-standard column header, e.g. with only the category code or orig_cat_name.
- Returns:
- pd.DataFrame
converted DataFrame
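The wide-to-long reshaping itself is a simple melt. A stdlib sketch with made-up rows (the real input is a pandas DataFrame with the multiindex structure described above):

```python
# Each row: (category, orig_cat_name, {(entity, unit): value, ...})
# mimicking the category/orig_cat_name index and the (entity, unit) header.
rows = [
    ("1.A", "Fuel combustion", {("CO2", "kt"): 100.0, ("CH4", "kt"): 2.5}),
]
year = 2021

# Melt: one long row per (category, entity) pair, with the year attached.
long_rows = [
    {"category": cat, "orig_cat_name": name, "entity": entity,
     "unit": unit, "time": year, "data": value}
    for cat, name, values in rows
    for (entity, unit), value in values.items()
]
print(len(long_rows))  # 2
```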
- primap2.pm2io.read_interchange_format(filepath: str | Path) DataFrame [source]¶
Read a dataset in the interchange format from disk into memory.
Reads an interchange format dataset from disk. The data is stored in a csv file, while the additional metadata is stored in a yaml file. This function takes the yaml file as its parameter; the data file is specified in the yaml file. If no file ending or a wrong file ending is given, the function tries to load a file with the same name and the ending .yaml.
- Parameters:
- filepath: str or pathlib.Path
path and filename for the dataset (the yaml file, not data file).
- Returns:
- data: pandas.DataFrame
DataFrame with the read data in PRIMAP2 interchange format
- primap2.pm2io.read_long_csv_file_if(filepath_or_buffer: str | Path | IO, *, coords_cols: dict[str, str], add_coords_cols: None | dict[str, list[str]] = None, coords_defaults: None | dict[str, Any] = None, coords_terminologies: dict[str, str], coords_value_mapping: None | dict[str, Any] = None, coords_value_filling: None | dict[str, dict[str, dict]] = None, filter_keep: None | dict[str, dict[str, Any]] = None, filter_remove: None | dict[str, dict[str, Any]] = None, meta_data: None | dict[str, Any] = None, time_format: str = '%Y-%m-%d', convert_str: bool | dict[str, float] = True) DataFrame [source]¶
Read a CSV file in long (tidy) format into the PRIMAP2 interchange format.
Columns can be renamed or filled with default values to match the PRIMAP2 structure. Where we refer to “dimensions” in the parameter description below we mean the basic dimension names without the added terminology (e.g. “area”, not “area (ISO3)”). The terminology information will be added by this function. You cannot use the short dimension names (e.g. “cat” instead of “category”) in the attributes.
- Parameters:
- filepath_or_buffer: str, pathlib.Path, or file-like
Long CSV file which will be read.
- coords_cols: dict
Dict where the keys are column names in the files to be read and the values are the corresponding dimensions in PRIMAP2. To specify the data column containing the observable, use the “data” key. For secondary categories use a sec_cats__ prefix.
- add_coords_cols: dict, optional
Dict where the keys are PRIMAP2 additional coordinate names and the values are lists with two elements, where the first is the column in the csv file to be read and the second is the primap2 dimension for the coordinate (e.g. category for a category_name coordinate).
- coords_defaults: dict, optional
Dict for default values of coordinates / dimensions not given in the csv files. The keys are the dimension names and the values are the values for the dimensions. For secondary categories use a sec_cats__ prefix.
- coords_terminologies: dict
Dict defining the terminologies used for the different coordinates (e.g. ISO3 for area). Only possible coordinates here are: area, category, scenario, entity, and secondary categories. For secondary categories use a sec_cats__ prefix. All entries different from “area”, “category”, “scenario”, “entity”, and sec_cats__<name> will raise a ValueError.
- coords_value_mapping: dict, optional
A dict with primap2 dimension names as keys. Values are dicts with input values as keys and output values as values. A standard use case is to map gas names from input data to the standardized names used in primap2. Alternatively a value can also be a function which transforms one CSV metadata value into the new metadata value. A third possibility is to give a string as a value, which defines a rule for translating metadata values. For the “category”, “entity”, and “unit” columns, the rule “PRIMAP1” is available, which translates from PRIMAP1 metadata to PRIMAP2 metadata.
- coords_value_filling: dict, optional
A dict with primap2 dimension names as keys. These are the target columns where values will be filled (or replaced). Values are dicts with primap2 dimension names as keys. These are the source columns. The values are dicts with source value to target value mappings. The value filling can do everything that the value mapping can, but while mapping can only replace values within a column using information from that column, the filling function can also fill or replace data based on values from a different column. This can be used e.g. to fill missing category codes based on category names, or to replace category codes which do not meet the terminology using the category names.
- filter_keep: dict, optional
Dict defining filters of data to keep. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the csv file. Each entry in the dict defines an individual filter. The names of the filters have no relevance. Default: keep all data.
- filter_remove: dict, optional
Dict defining filters of data to remove. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the csv file. Each entry in the dict defines an individual filter. The names of the filters have no relevance.
- meta_data: dict, optional
Meta data for the whole dataset. Will end up in the dataset-wide attrs. Allowed keys are “references”, “rights”, “contact”, “title”, “comment”, “institution”, and “history”. Documentation about the format and meaning of the meta data can be found in the data format documentation.
- time_format: str, optional
strftime style format used to format the time information for the data columns in the interchange format. Default: “%Y-%m-%d” (equivalent to “%F”), i.e. the ISO 8601 date format.
- convert_str: bool or dict, optional (default: True)
If set to False, string values in the data columns will be kept. If set to True, they will be converted to np.nan or 0 following default rules. If a dict is given, values present in the dict are mapped as specified; all other values are handled by the default rules as in parse_code.
- Returns:
- obj: pd.DataFrame
pandas DataFrame with the read data
Examples
Example for coords_value_mapping:
coords_value_mapping = {
    'pyCPA_col_1': {
        'col_1_value_1_in': 'col_1_value_1_out',
        'col_1_value_2_in': 'col_1_value_2_out',
    },
    'pyCPA_col_2': {
        'col_2_value_1_in': 'col_2_value_1_out',
        'col_2_value_2_in': 'col_2_value_2_out',
    },
}
Example for filter_keep:
filter_keep = {
    'f_1': {'variable': ['CO2', 'CH4'], 'region': 'USA'},
    'f_2': {'variable': 'N2O'},
}
This example filter keeps all CO2 and CH4 data for the USA and N2O data for all countries.
Example for filter_remove:
filter_remove = {
    'f_1': {'scenario': 'HISTORY'},
}
This filter removes all data with ‘HISTORY’ as scenario.
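The convert_str behaviour can be sketched as follows. The default rules in the sketch (e.g. “NO” to 0 and “NE” to NaN, as is common for inventory notation keys) are an illustrative assumption; the package's actual default rules live in parse_code.

```python
import math

# Illustrative default rules for notation keys; NOT the real parse_code rules.
DEFAULT_RULES = {"NO": 0.0, "NE": math.nan, "IE": math.nan, "NA": math.nan}

def convert_value(value, convert_str=True):
    """Sketch of the documented convert_str handling for one data cell."""
    if not isinstance(value, str):
        return value                      # numeric data passes through
    if convert_str is False:
        return value                      # keep string values as-is
    mapping = convert_str if isinstance(convert_str, dict) else {}
    if value in mapping:
        return mapping[value]             # explicit user mapping wins
    return DEFAULT_RULES.get(value, math.nan)

print(convert_value("NO"))                        # 0.0
print(convert_value("C", convert_str={"C": 0.0})) # 0.0 via the user dict
```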
- primap2.pm2io.read_wide_csv_file_if(filepath_or_buffer: str | Path | IO, *, coords_cols: dict[str, str], add_coords_cols: None | dict[str, list[str]] = None, coords_defaults: None | dict[str, Any] = None, coords_terminologies: dict[str, str], coords_value_mapping: None | dict[str, Any] = None, coords_value_filling: None | dict[str, dict[str, dict]] = None, filter_keep: None | dict[str, dict[str, Any]] = None, filter_remove: None | dict[str, dict[str, Any]] = None, meta_data: None | dict[str, Any] = None, time_format: str = '%Y', convert_str: bool | dict[str, float] = True) DataFrame [source]¶
Read a CSV file in wide format into the PRIMAP2 interchange format.
Columns can be renamed or filled with default values to match the PRIMAP2 structure. Where we refer to “dimensions” in the parameter description below we mean the basic dimension names without the added terminology (e.g. “area”, not “area (ISO3)”). The terminology information will be added by this function. You cannot use the short dimension names (e.g. “cat” instead of “category”) in the attributes.
TODO: Currently duplicate data points will not be detected.
TODO: enable filtering through query strings
TODO: enable specification of the entity terminology
- Parameters:
- filepath_or_buffer: str, pathlib.Path, or file-like
Wide CSV file which will be read.
- coords_cols: dict
Dict where the keys are PRIMAP2 dimensions and the values are column names in the files to be read. For secondary categories use a sec_cats__ prefix.
- add_coords_cols: dict, optional
Dict where the keys are PRIMAP2 additional coordinate names and the values are lists with two elements, where the first is the column in the csv file to be read and the second is the primap2 dimension for the coordinate (e.g. category for a category_name coordinate).
- coords_defaults: dict, optional
Dict for default values of coordinates / dimensions not given in the csv files. The keys are the dimension names and the values are the values for the dimensions. For secondary categories use a sec_cats__ prefix.
- coords_terminologies: dict
Dict defining the terminologies used for the different coordinates (e.g. ISO3 for area). Only possible coordinates here are: area, category, scenario, entity, and secondary categories. For secondary categories use a sec_cats__ prefix. All entries different from “area”, “category”, “scenario”, “entity”, and sec_cats__<name> will raise a ValueError.
- coords_value_mapping: dict, optional
A dict with primap2 dimension names as keys. Values are dicts with input values as keys and output values as values. A standard use case is to map gas names from input data to the standardized names used in primap2. Alternatively a value can also be a function which transforms one CSV metadata value into the new metadata value. A third possibility is to give a string as a value, which defines a rule for translating metadata values. The only defined rule at the moment is “PRIMAP1” which can be used for the “category”, “entity”, and “unit” columns to translate from PRIMAP1 metadata to PRIMAP2 metadata.
- coords_value_filling: dict, optional
A dict with primap2 dimension names as keys. These are the target columns where values will be filled (or replaced). Values are dicts with primap2 dimension names as keys. These are the source columns. The values are dicts with source value to target value mappings. The value filling can do everything that the value mapping can, but while mapping can only replace values within a column using information from that column, the filling function can also fill or replace data based on values from a different column. This can be used e.g. to fill missing category codes based on category names, or to replace category codes which do not meet the terminology using the category names.
- filter_keep: dict, optional
Dict defining filters of data to keep. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the csv file. Each entry in the dict defines an individual filter. The names of the filters have no relevance. Default: keep all data.
- filter_remove: dict, optional
Dict defining filters of data to remove. Filtering is done before metadata mapping, so use original metadata values to define the filter. Column names are as in the csv file. Each entry in the dict defines an individual filter. The names of the filters have no relevance.
- meta_data: dict, optional
Meta data for the whole dataset. Will end up in the dataset-wide attrs. Allowed keys are “references”, “rights”, “contact”, “title”, “comment”, “institution”, and “history”. Documentation about the format and meaning of the meta data can be found in the data format documentation.
- time_format: str, optional
strftime style format used to parse the time information for the data columns. Default: “%Y”, which will match years.
- convert_str: bool or dict, optional (default: True)
If set to False, string values in the data columns will be kept. If set to True, they will be converted to np.nan or 0 following default rules. If a dict is given, values present in the dict are mapped as specified; all other values are handled by the default rules as in parse_code.
- Returns:
- obj: pd.DataFrame
pandas DataFrame with the read data
Examples
Example for coords_value_mapping:
coords_value_mapping = {
    'pyCPA_col_1': {
        'col_1_value_1_in': 'col_1_value_1_out',
        'col_1_value_2_in': 'col_1_value_2_out',
    },
    'pyCPA_col_2': {
        'col_2_value_1_in': 'col_2_value_1_out',
        'col_2_value_2_in': 'col_2_value_2_out',
    },
}
Example for filter_keep:
filter_keep = {
    'f_1': {'variable': ['CO2', 'CH4'], 'region': 'USA'},
    'f_2': {'variable': 'N2O'},
}
This example filter keeps all CO2 and CH4 data for the USA and N2O data for all countries.
Example for filter_remove:
filter_remove = {
    'f_1': {'scenario': 'HISTORY'},
}
This filter removes all data with ‘HISTORY’ as scenario.
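The filter semantics described above can be sketched over a plain list of row dicts: within one filter, all column conditions must match (AND); across filters, a row matching any filter is kept (or removed). This is an illustration of the documented behaviour, not the package code.

```python
def matches(row, conditions):
    """True if the row satisfies every column condition of one filter."""
    for col, allowed in conditions.items():
        values = allowed if isinstance(allowed, list) else [allowed]
        if row[col] not in values:
            return False
    return True

rows = [
    {"variable": "CO2", "region": "USA"},
    {"variable": "CH4", "region": "DEU"},
    {"variable": "N2O", "region": "FRA"},
]
filter_keep = {
    "f_1": {"variable": ["CO2", "CH4"], "region": "USA"},
    "f_2": {"variable": "N2O"},
}

# A row survives if it matches at least one of the keep filters.
kept = [r for r in rows if any(matches(r, f) for f in filter_keep.values())]
print([r["variable"] for r in kept])  # ['CO2', 'N2O']
```

The CH4/DEU row is dropped: it fails f_1 on the region condition and f_2 on the variable condition.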
- primap2.pm2io.write_interchange_format(filepath: str | Path, data: DataFrame, attrs: dict | None = None) None [source]¶
Write dataset in interchange format to disk.
Writes an interchange format dataset, consisting of a pandas DataFrame and an additional meta data dict, to disk. The data is stored in a csv file, while the additional metadata is written to a yaml file.
- Parameters:
- filepath: str or pathlib.Path
Path and filename stem for the dataset. If a file ending is given, it will be ignored and replaced by .csv for the data and .yaml for the metadata.
- data: pandas.DataFrame
DataFrame in PRIMAP2 interchange format
- attrs: dict, optional
Interchange format meta data dict. Default: use data.attrs.
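The file-ending behaviour can be sketched with pathlib: any given suffix is dropped and the .csv/.yaml pair is derived from the stem. The suffix-replacement rule follows the description above; the exact package behaviour may differ in edge cases (e.g. filenames containing dots).

```python
from pathlib import Path

def interchange_paths(filepath):
    """Derive the data (.csv) and metadata (.yaml) paths from a filename stem."""
    stem = Path(filepath).with_suffix("")  # drop any given file ending
    return stem.with_suffix(".csv"), stem.with_suffix(".yaml")

data_path, meta_path = interchange_paths("out/my_dataset.nc")
print(data_path.name, meta_path.name)  # my_dataset.csv my_dataset.yaml
```

Note that read_interchange_format expects the yaml path, which is why the pair shares a common stem.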