xarray.Dataset.pr.set

Dataset.pr.set(dim: Hashable, key: Any, value: Dataset, *, existing: str = 'fillna_empty', new: str = 'extend') -> Dataset

Set values, optionally expanding the given dimension as necessary.

All data variables which have the given dimension are modified; data variables without it are left unchanged. Each affected data variable is updated using DataArray.pr.set(dim, key, value[name], existing=existing, new=new).

Parameters:
dim: str

Dimension along which values should be set. Only data variables which have this dimension are modified.

key: scalar or list of scalars

Keys in the dimension which should be set. Key values which are missing in the dimension are inserted. The handling of key values which already exist in the dimension is determined by the existing parameter.

value: xr.Dataset

Values that will be inserted at the positions specified by key. value needs to contain all data variables which have the dimension. value has to be broadcastable to ds.pr.loc[{dim: key}].

existing: "fillna_empty", "error", "overwrite", or "fillna", optional

How to handle existing keys. If existing="fillna_empty" (default), new values overwrite existing values only if all existing values are NaN; otherwise a ValueError is raised. If existing="error", a ValueError is raised if any key already exists in the index. If existing="overwrite", new values overwrite current values for existing keys. If existing="fillna", the new values only overwrite NaN values for existing keys.

new: "extend" or "error", optional

How to handle new keys. If new="extend" (default), keys which do not exist so far are automatically inserted by extending the dimension. If new="error", a KeyError is raised if any key is not yet in the dimension.
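
The four existing modes can be pictured with a small pure-Python sketch. This is not the library's implementation: resolve_existing is a hypothetical helper that works on plain lists of floats for a single key that is already present, and it only mirrors the behaviour described above.

```python
import math


def resolve_existing(old, new, existing="fillna_empty"):
    """Combine old and new values for a key that already exists.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if existing == "overwrite":
        # New values always win.
        return list(new)
    if existing == "fillna":
        # Keep existing data, fill only the NaN positions.
        return [n if math.isnan(o) else o for o, n in zip(old, new)]
    if existing == "fillna_empty":
        # Accept new values only if nothing but NaN is stored so far.
        if all(math.isnan(o) for o in old):
            return list(new)
        raise ValueError("key already exists and contains data")
    if existing == "error":
        # Any existing key is rejected, whatever it contains.
        raise ValueError("key already exists")
    raise ValueError(f"unknown mode: {existing!r}")


# Values taken from the SF4 variable in the examples on this page.
mex_sf4 = [2.5, 3.5, math.nan, 5.5]
col_times_20 = [10.0, 30.0, 50.0, 70.0]

filled = resolve_existing(mex_sf4, col_times_20, existing="fillna")
overwritten = resolve_existing(mex_sf4, col_times_20, existing="overwrite")
```

With the default mode, the same call raises a ValueError because mex_sf4 already holds data.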

Returns:
ds: xr.Dataset

modified Dataset

Examples

>>> import pandas as pd
>>> import xarray as xr
>>> import numpy as np
>>> area = ("area (ISO3)", ["COL", "MEX"])
>>> time = ("time", pd.date_range("2000", "2003", freq="AS"))
>>> ds = xr.Dataset(
...     {
...         "CO2": xr.DataArray(
...             [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]],
...             coords=[area, time],
...         ),
...         "SF4": xr.DataArray(
...             [[0.5, 1.5, 2.5, 3.5], [2.5, 3.5, np.nan, 5.5]],
...             coords=[area, time],
...         ),
...     },
...     attrs={"area": "area (ISO3)"},
... )
>>> ds
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 1.0 2.0 3.0 2.0 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 0.5 1.5 2.5 3.5 2.5 3.5 nan 5.5
Attributes:
    area:     area (ISO3)

Setting an existing value fails by default if the key already contains data. Pass existing="overwrite" to replace it:

>>> ds.pr.set("area", "MEX", ds.pr.loc[{"area": "COL"}] * 20)
Traceback (most recent call last):
...
ValueError: Values {'MEX'} for 'area (ISO3)' already exist and contain data. ...
>>> ds.pr.set(
...     "area", "MEX", ds.pr.loc[{"area": "COL"}] * 20, existing="overwrite"
... )
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 1.0 2.0 3.0 0.0 20.0 40.0 60.0
    SF4          (area (ISO3), time) float64 0.5 1.5 2.5 3.5 10.0 30.0 50.0 70.0
Attributes:
    area:     area (ISO3)

Instead of overwriting existing values, you can also choose to fill only missing values:

>>> ds.pr.set("area", "MEX", ds.pr.loc[{"area": "COL"}] * 20, existing="fillna")
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 1.0 2.0 3.0 2.0 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 0.5 1.5 2.5 3.5 2.5 3.5 50.0 5.5
Attributes:
    area:     area (ISO3)

By default, existing values are only filled if all existing values are missing in all data variables:

>>> ds_partly_empty = ds.copy(deep=True)
>>> ds_partly_empty["CO2"].pr.loc[{"area": "COL"}] = np.nan
>>> ds_partly_empty["SF4"].pr.loc[{"area": "COL"}] = np.nan
>>> ds_partly_empty
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 nan nan nan nan 2.0 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 nan nan nan nan 2.5 3.5 nan 5.5
Attributes:
    area:     area (ISO3)
>>> ds_partly_empty.pr.set(
...     "area", "COL", ds_partly_empty.pr.loc[{"area": "MEX"}] * 10
... )
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 20.0 30.0 40.0 50.0 2.0 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 25.0 35.0 nan 55.0 2.5 3.5 nan 5.5
Attributes:
    area:     area (ISO3)
>>> # if even one value is non-nan, this fails by default
>>> ds_partly_empty["SF4"].pr.loc[{"area": "COL", "time": "2001"}] = 2
>>> ds_partly_empty.pr.set(
...     "area", "COL", ds_partly_empty.pr.loc[{"area": "MEX"}] * 10
... )
Traceback (most recent call last):
...
ValueError: Values {'COL'} for 'area (ISO3)' already exist and contain data. ...

Introducing a new value uses the same syntax:

>>> ds.pr.set("area", "BOL", ds.pr.loc[{"area": "COL"}] * 20)
<xarray.Dataset>
Dimensions:      (area (ISO3): 3, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'BOL' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 20.0 40.0 60.0 ... 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 10.0 30.0 50.0 70.0 ... 3.5 nan 5.5
Attributes:
    area:     area (ISO3)

If you don’t want to automatically extend the dimension with new values, you can request checking that all keys already exist using new="error":

>>> ds.pr.set("area", "BOL", ds.pr.loc[{"area": "COL"}] * 20, new="error")
Traceback (most recent call last):
...
KeyError: "Values {'BOL'} not in 'area (ISO3)', use new='extend' to automatic...
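
The difference between new="extend" and new="error" can be sketched on a plain list of keys. apply_new_policy is a hypothetical helper, not part of the library, and the sorted result is an assumption that merely mirrors the key order seen in the example output above.

```python
def apply_new_policy(index, keys, new="extend"):
    """Return the keys of the dimension after applying the `new` policy.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if new not in ("extend", "error"):
        raise ValueError(f"unknown mode: {new!r}")
    # Keys that are requested but not yet part of the dimension.
    missing = {k for k in keys if k not in index}
    if missing and new == "error":
        raise KeyError(f"Values {missing} not in dimension")
    # Assumption: sorting only reproduces the order shown in the examples.
    return sorted(set(index) | missing)


areas = ["COL", "MEX"]
extended = apply_new_policy(areas, ["BOL"])                 # adds "BOL"
unchanged = apply_new_policy(areas, ["MEX"], new="error")   # key exists, no error
```

Calling apply_new_policy(areas, ["BOL"], new="error") raises a KeyError, which matches the behaviour shown in the traceback above.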

Note that data variables which do not contain the specified dimension are ignored:

>>> ds["population"] = xr.DataArray([1e6, 1.2e6, 1.3e6, 1.4e6], coords=(time,))
>>> ds
<xarray.Dataset>
Dimensions:      (area (ISO3): 2, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 1.0 2.0 3.0 2.0 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 0.5 1.5 2.5 3.5 2.5 3.5 nan 5.5
    population   (time) float64 1e+06 1.2e+06 1.3e+06 1.4e+06
Attributes:
    area:     area (ISO3)
>>> ds.pr.set("area", "BOL", ds.pr.loc[{"area": "COL"}] * 20)
<xarray.Dataset>
Dimensions:      (area (ISO3): 3, time: 4)
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'BOL' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
Data variables:
    CO2          (area (ISO3), time) float64 0.0 20.0 40.0 60.0 ... 3.0 4.0 5.0
    SF4          (area (ISO3), time) float64 10.0 30.0 50.0 70.0 ... 3.5 nan 5.5
    population   (time) float64 1e+06 1.2e+06 1.3e+06 1.4e+06
Attributes:
    area:     area (ISO3)
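
Which data variables are touched depends only on whether they carry the dimension. The following minimal sketch shows that selection on a plain mapping from variable names to dimension tuples. split_by_dim is a hypothetical helper, and it uses the full dimension name "area (ISO3)" rather than the short "area" alias used in the calls above.

```python
def split_by_dim(var_dims, dim):
    """Split variable names into those that have `dim` and those that do not.

    Hypothetical helper for illustration; not the library's implementation.
    """
    affected = [name for name, dims in var_dims.items() if dim in dims]
    ignored = [name for name, dims in var_dims.items() if dim not in dims]
    return affected, ignored


# Dimensions of the data variables in the last example on this page.
var_dims = {
    "CO2": ("area (ISO3)", "time"),
    "SF4": ("area (ISO3)", "time"),
    "population": ("time",),
}

affected, ignored = split_by_dim(var_dims, "area (ISO3)")
```

Here affected contains CO2 and SF4, while population is ignored, so value only needs to provide CO2 and SF4.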