xarray.DataArray.pr.set

DataArray.pr.set(dim: Hashable, key: Any, value: DataArray | ndarray, *, value_dims: list[Hashable] | None = None, existing: str = 'fillna_empty', new: str = 'extend') → DataArray

Set values, optionally expanding the given dimension as necessary.

The handling of already existing key values can be selected using the existing parameter.

Parameters:
dim: str

Dimension along which values should be set.

key: scalar or list of scalars

Keys in the dimension which should be set. Key values which are missing in the dimension are inserted. The handling of key values which already exist in the dimension is determined by the existing parameter.

value: xr.DataArray or np.ndarray

Values that will be inserted at the positions specified by key. value needs to be broadcastable to da[{dim: key}].

value_dims: list of str, optional

Specifies the dimensions of value. If value is not a DataArray and da[{dim: key}] is higher-dimensional, it is necessary to specify the value dimensions.

existing: “fillna_empty”, “error”, “overwrite”, or “fillna”, optional

How to handle existing keys. If existing="fillna_empty" (default), new values overwrite existing values only if all existing values are NaN; if any existing value contains data, a ValueError is raised. If existing="error", a ValueError is raised if any key already exists in the index. If existing="overwrite", new values overwrite current values for existing keys. If existing="fillna", the new values only overwrite NaN values for existing keys.

new: “extend”, or “error”, optional

How to handle new keys. If new="extend" (default), keys which do not exist so far are automatically inserted by extending the dimension. If new="error", a KeyError is raised if any key is not yet in the dimension.

Returns:
da: xr.DataArray

The modified DataArray. As the examples below show, the DataArray on which set is called is left unchanged.

Examples

>>> import pandas as pd
>>> import xarray as xr
>>> import numpy as np
>>> da = xr.DataArray(
...     [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]],
...     coords=[
...         ("area (ISO3)", ["COL", "MEX"]),
...         ("time", pd.date_range("2000", "2003", freq="YS")),
...     ],
... )
>>> da
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[0., 1., 2., 3.],
       [2., 3., 4., 5.]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01

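Setting an existing value raises an error by default if the key already contains data; pass existing="overwrite" to replace it: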

>>> da.pr.set("area", "COL", np.array([0.5, 0.6, 0.7, 0.8]))
Traceback (most recent call last):
...
ValueError: Values {'COL'} for 'area (ISO3)' already exist and contain data. ...
>>> da.pr.set(
...     "area", "COL", np.array([0.5, 0.6, 0.7, 0.8]), existing="overwrite"
... )
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[0.5, 0.6, 0.7, 0.8],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'

By default, existing values are only overwritten if all existing values are NaN:

>>> da_partly_empty = da.copy(deep=True)
>>> da_partly_empty.pr.loc[{"area": "COL"}] = np.nan
>>> da_partly_empty
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[nan, nan, nan, nan],
       [ 2.,  3.,  4.,  5.]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
>>> da_partly_empty.pr.set("area", "COL", np.array([0.5, 0.6, 0.7, 0.8]))
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[0.5, 0.6, 0.7, 0.8],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
>>> # if even one value contains data, the default is to raise a ValueError
>>> da_partly_empty.pr.loc[{"area": "COL", "time": "2001"}] = 0.6
>>> da_partly_empty
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[nan, 0.6, nan, nan],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
>>> da_partly_empty.pr.set("area", "COL", np.array([0.5, 0.6, 0.7, 0.8]))
Traceback (most recent call last):
...
ValueError: Values {'COL'} for 'area (ISO3)' already exist and contain data. ...

Introducing a new value uses the same syntax as modifying existing values:

>>> da.pr.set("area", "ARG", np.array([0.5, 0.6, 0.7, 0.8]))
<xarray.DataArray (area (ISO3): 3, time: 4)>
array([[0.5, 0.6, 0.7, 0.8],
       [0. , 1. , 2. , 3. ],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'ARG' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01

You can also mix existing and new values:

>>> da.pr.set(
...     "area",
...     ["COL", "ARG"],
...     np.array([[0.5, 0.6, 0.7, 0.8], [5, 6, 7, 8]]),
...     existing="overwrite",
... )
<xarray.DataArray (area (ISO3): 3, time: 4)>
array([[5. , 6. , 7. , 8. ],
       [0.5, 0.6, 0.7, 0.8],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'ARG' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01

To prevent the dimension from being extended automatically, use new="error", which checks that all keys already exist:

>>> da.pr.set("area", "ARG", np.array([0.5, 0.6, 0.7, 0.8]), new="error")
Traceback (most recent call last):
...
KeyError: "Values {'ARG'} not in 'area (ISO3)', use new='extend' to automatic...
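
The difference between new="extend" and new="error" can be pictured in plain xarray. The following is only a rough sketch, not the implementation of pr.set: set_key is a hypothetical helper, the dimension is simply called "area", and the new key is sorted into the coordinate because that is what the outputs on this page show (an assumption about the general behaviour).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [[0.0, 1.0], [2.0, 3.0]],
    coords=[("area", ["COL", "MEX"]), ("time", [2000, 2001])],
)


def set_key(da, key, values, new="extend"):
    """Hypothetical helper mirroring the new= policy for a single key."""
    if key not in da["area"].values:
        if new == "error":
            raise KeyError(f"{key!r} not in 'area'")
        # Extend the dimension; the freshly inserted row starts out as NaN.
        # Sorting the keys is an assumption made to match the outputs above.
        da = da.reindex({"area": sorted([*da["area"].values, key])})
    else:
        # Work on a copy so the caller's array is not modified.
        da = da.copy()
    da.loc[{"area": key}] = values
    return da


extended = set_key(da, "ARG", [0.5, 0.6])
```

Calling set_key(da, "ARG", [0.5, 0.6], new="error") raises a KeyError instead, because "ARG" is not yet part of the dimension.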

If you want to use broadcasting or have more dimensions, the dimensions of a plain ndarray value can no longer be determined automatically. Use the value_dims parameter to supply this information:

>>> da.pr.set(
...     "area",
...     ["COL", "ARG"],
...     np.array([0.5, 0.6, 0.7, 0.8]),
...     existing="overwrite",
... )
Traceback (most recent call last):
...
ValueError: Could not automatically determine value dimensions, please use th...
>>> da.pr.set(
...     "area",
...     ["COL", "ARG"],
...     np.array([0.5, 0.6, 0.7, 0.8]),
...     value_dims=["time"],
...     existing="overwrite",
... )
<xarray.DataArray (area (ISO3): 3, time: 4)>
array([[0.5, 0.6, 0.7, 0.8],
       [0.5, 0.6, 0.7, 0.8],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
  * area (ISO3)  (area (ISO3)) <U3 'ARG' 'COL' 'MEX'
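
Why dimension names are needed can be illustrated with plain xarray. This is a minimal sketch under assumptions, not a description of how pr.set works internally: the target shape and the dimension names "area" and "time" are chosen for illustration only. A bare ndarray carries no dimension names, so labelling it is what makes broadcasting by name possible, and value_dims=["time"] supplies that same information.

```python
import numpy as np
import xarray as xr

# Target slice with two named dimensions (names chosen for illustration only).
target = xr.DataArray(np.zeros((2, 4)), dims=["area", "time"])

# A bare array: four numbers, but no statement about which dimension they span.
raw = np.array([0.5, 0.6, 0.7, 0.8])

# Attaching the dimension name removes the ambiguity.
labelled = xr.DataArray(raw, dims=["time"])

# Broadcasting by name repeats the values along the missing "area" dimension.
broadcasted = labelled.broadcast_like(target)
```

Every entry along "area" in broadcasted holds the same four values, which corresponds to the two identical rows in the output above.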

Instead of overwriting existing values, you can also choose to only fill missing values.

>>> da.pr.loc[{"area": "COL", "time": "2001"}] = np.nan
>>> da
<xarray.DataArray (area (ISO3): 2, time: 4)>
array([[ 0., nan,  2.,  3.],
       [ 2.,  3.,  4.,  5.]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
>>> da.pr.set(
...     "area",
...     ["COL", "ARG"],
...     np.array([0.5, 0.6, 0.7, 0.8]),
...     value_dims=["time"],
...     existing="fillna",
... )
<xarray.DataArray (area (ISO3): 3, time: 4)>
array([[0.5, 0.6, 0.7, 0.8],
       [0. , 0.6, 2. , 3. ],
       [2. , 3. , 4. , 5. ]])
Coordinates:
  * area (ISO3)  (area (ISO3)) <U3 'ARG' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01
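
The four existing modes described under Parameters can be summarized for the values of a single, already existing key. The sketch below uses only NumPy; merge_existing is a hypothetical helper written for illustration, and its error messages are placeholders rather than the messages pr.set produces.

```python
import numpy as np


def merge_existing(old, new, existing="fillna_empty"):
    """Hypothetical helper: combine new values with the values stored for one key."""
    old = np.asarray(old, dtype=float)
    new = np.broadcast_to(np.asarray(new, dtype=float), old.shape)
    if existing == "overwrite":
        # New values always win.
        return new.copy()
    if existing == "fillna":
        # Keep existing data and fill only the NaN gaps.
        return np.where(np.isnan(old), new, old)
    if existing == "fillna_empty":
        # Only accept the new values if nothing is stored yet.
        if np.isnan(old).all():
            return new.copy()
        raise ValueError("key already exists and contains data")
    if existing == "error":
        # Any existing key is rejected, even if it holds only NaN.
        raise ValueError("key already exists")
    raise ValueError(f"unknown mode: {existing!r}")


old = np.array([0.0, np.nan, 2.0, 3.0])
new = np.array([0.5, 0.6, 0.7, 0.8])
overwritten = merge_existing(old, new, existing="overwrite")
filled = merge_existing(old, new, existing="fillna")
```

Here filled keeps 0.0, 2.0 and 3.0 and takes only 0.6 from the new values, matching the "COL" row in the fillna example above.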

Because you can also supply a DataArray as a value, it is easy to define values from existing values using arithmetic:

>>> da.pr.set("area", "ARG", da.pr.loc[{"area": "COL"}] * 2)
<xarray.DataArray (area (ISO3): 3, time: 4)>
array([[ 0., nan,  4.,  6.],
       [ 0., nan,  2.,  3.],
       [ 2.,  3.,  4.,  5.]])
Coordinates:
  * area (ISO3)  (area (ISO3)) object 'ARG' 'COL' 'MEX'
  * time         (time) datetime64[ns] 2000-01-01 2001-01-01 ... 2003-01-01