primap2.pm2io.nir_add_unit_information


primap2.pm2io.nir_add_unit_information(df_nir: DataFrame, *, unit_row: str | int, entity_row: str | int | None = None, regexp_entity: str, regexp_unit: str | None = None, manual_repl_unit: dict[str, str] | None = None, manual_repl_entity: dict[str, str] | None = None, default_unit: str) -> DataFrame

Add unit information to a National Inventory Report (NIR) style DataFrame.

Add unit information to the header of an “entity-wide” file, as found in the standard table format of National Inventory Reports (NIRs). Unit and entity are extracted from the combined unit and entity information in the row given by unit_row. The parameters regexp_unit and regexp_entity determine how this is done, via regular expressions for the unit and the entity, respectively. Additionally, manual mappings can be defined in the manual_repl_unit and manual_repl_entity dicts. For each column, the routine first tries to extract a unit using the regular expression. If this fails, it looks up the unit in the manual_repl_unit dict and the entity in the manual_repl_entity dict. If no information is found there either, the unit given in default_unit is used, and the analyzed value is used unchanged as the entity.
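The per-column fallback order described above can be sketched as follows. split_entity_unit is a hypothetical helper, not part of primap2; it follows one reading of the description and assumes each regular expression contains exactly one capture group, so the library's actual implementation may differ in details.

```python
from __future__ import annotations

import re


def split_entity_unit(
    value: str,
    *,
    regexp_entity: str,
    regexp_unit: str | None,
    manual_repl_unit: dict[str, str],
    manual_repl_entity: dict[str, str],
    default_unit: str,
) -> tuple[str, str]:
    """Derive (entity, unit) for one column value (hypothetical helper).

    Assumption: both patterns contain exactly one capture group.
    """
    unit_match = re.search(regexp_unit, value) if regexp_unit else None
    if unit_match:
        # Combined cell such as "CO2 (Gg)": split it with both patterns.
        # If the entity pattern does not match, keep the value (assumption).
        entity_match = re.search(regexp_entity, value)
        entity = entity_match.group(1) if entity_match else value
        return entity, unit_match.group(1)
    # Unit regex failed: use the manual mappings, falling back to
    # default_unit for the unit and the unchanged value for the entity.
    unit = manual_repl_unit.get(value, default_unit)
    entity = manual_repl_entity.get(value, value)
    return entity, unit


# Illustrative settings; all values here are made up for the example.
kwargs = dict(
    regexp_entity=r"^(.*?)\s*\(",
    regexp_unit=r"\((.*)\)",
    manual_repl_unit={"HFCs": "kt CO2 / yr"},
    manual_repl_entity={"HFCs": "HFCS (AR4GWP100)"},
    default_unit="kt",
)
print(split_entity_unit("CO2 (Gg)", **kwargs))  # ('CO2', 'Gg')
print(split_entity_unit("HFCs", **kwargs))      # ('HFCS (AR4GWP100)', 'kt CO2 / yr')
print(split_entity_unit("N2O", **kwargs))       # ('N2O', 'kt')
```

The three calls exercise the three branches in turn: regular-expression match, manual mapping, and default unit with the value kept as the entity.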

Parameters:
df_nir : pd.DataFrame

Pandas DataFrame with the table to process

unit_row : str or int

String “header” to indicate that the column header should be used to derive the unit information, or an integer specifying the row to use for unit information. If entity and unit information are given in the same row, use only unit_row.

entity_row : str or int (optional)

String “header” to indicate that the column header should be used to derive the entity information, or an integer specifying the row to use for entity information. If entity and unit information are given in the same row, use only unit_row.

regexp_entity : str

regular expression that extracts the entity from the cell value

regexp_unit : str (optional)

regular expression that extracts the unit from the cell value

manual_repl_unit : dict (optional)

dict mapping cell values to the unit to use for them

manual_repl_entity : dict (optional)

dict mapping cell values to the entity to use for them

default_unit : str

unit to be used if no unit can be extracted with the regular expression and no unit is given in manual_repl_unit
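How unit_row (and, analogously, entity_row) selects the strings that are parsed for each column can be sketched with pandas as below. values_to_parse is a hypothetical helper, not part of primap2, and it assumes that an integer row is a positional index (as with DataFrame.iloc) rather than an index label.

```python
from __future__ import annotations

import pandas as pd


def values_to_parse(df: pd.DataFrame, row: str | int) -> list[str]:
    """Return one string per column to parse for unit/entity information.

    Hypothetical helper: "header" selects the column labels; an integer
    is assumed to be a positional row index.
    """
    if row == "header":
        return [str(col) for col in df.columns]
    if isinstance(row, int):
        return [str(val) for val in df.iloc[row]]
    raise ValueError(f"row must be 'header' or an int, got {row!r}")


# Combined "<entity> (<unit>)" strings in the column header ...
df_header = pd.DataFrame([[1.0, 2.0]], columns=["CO2 (Gg)", "CH4 (Gg)"])
# ... or in the first row of the table (row 0 in this sketch).
df_row = pd.DataFrame([["CO2 (Gg)", "CH4 (Gg)"], [1.0, 2.0]])

print(values_to_parse(df_header, "header"))  # ['CO2 (Gg)', 'CH4 (Gg)']
print(values_to_parse(df_row, 0))            # ['CO2 (Gg)', 'CH4 (Gg)']
```

Each returned string would then be split into entity and unit using the regular expressions, the manual mappings, and default_unit as described above.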

Returns:
pd.DataFrame

DataFrame with explicit unit information (as column header)