Dealing with missing information
Aggregation
xarray provides robust aggregation functions such as xarray.DataArray.sum().
PRIMAP2 adds variants of these functions which skip a missing data point only if the
information is missing at all points along certain axes, for example for
a whole time series.
Let’s first create an example with missing information:
import pandas as pd
import numpy as np
import xarray as xr
import primap2
time = pd.date_range("2000-01-01", "2003-01-01", freq="YS")
area_iso3 = np.array(["COL", "ARG", "MEX"])
coords = [("area (ISO3)", area_iso3), ("time", time)]
da = xr.DataArray(
    data=[
        [1, 2, 3, 4],
        [np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, np.nan],
    ],
    coords=coords,
    name="test data",
)
da.pr.to_df()
area (ISO3) | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 |
---|---|---|---|---|
COL | 1.0 | 2.0 | 3.0 | 4.0 |
ARG | NaN | NaN | NaN | NaN |
MEX | 1.0 | 2.0 | 3.0 | NaN |
Now we can use the PRIMAP2 function xarray.DataArray.pr.sum() to sum over countries
while ignoring only those countries where the whole time series is missing, using the
skipna_evaluation_dims parameter:
da.pr.sum(dim="area", skipna_evaluation_dims="time").pr.to_df()
time
2000-01-01 2.0
2001-01-01 4.0
2002-01-01 6.0
2003-01-01 NaN
Freq: YS-JAN, Name: test data, dtype: float64
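To make the rule behind this result concrete, here is a minimal sketch that reproduces it with plain xarray only. It is an illustration of the semantics, not necessarily how PRIMAP2 implements them; the variable names `all_missing`, `kept` and `partial_sum` are made up for this example. ARG is dropped because its whole time series is missing, while the single missing MEX value in 2003 still propagates into the sum.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2003-01-01", freq="YS")
da = xr.DataArray(
    data=[
        [1, 2, 3, 4],
        [np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, np.nan],
    ],
    coords=[("area (ISO3)", ["COL", "ARG", "MEX"]), ("time", time)],
    name="test data",
)

# True for every country whose whole time series is missing (only ARG here).
all_missing = da.isnull().all(dim="time")

# Drop those countries entirely; partially missing series (MEX) are kept as-is.
kept = da.where(~all_missing, drop=True)

# Sum without skipping NaN, so the remaining gap in 2003 is still visible.
partial_sum = kept.sum(dim="area (ISO3)", skipna=False)
print(partial_sum.values)  # 2.0, 4.0, 6.0, nan
```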
If you instead want to skip all NA values, use the skipna parameter:
da.pr.sum(dim="area", skipna=True).pr.to_df()
time
2000-01-01 2.0
2001-01-01 4.0
2002-01-01 6.0
2003-01-01 4.0
Freq: YS-JAN, Name: test data, dtype: float64
# compare this to the result of the standard xarray sum - it also skips NA values by default:
da.sum(dim="area (ISO3)").pr.to_df()
time
2000-01-01 2.0
2001-01-01 4.0
2002-01-01 6.0
2003-01-01 4.0
Freq: YS-JAN, Name: test data, dtype: float64
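One consequence of skipping NA values by default is easy to miss: when every value along the summed dimension is missing, the standard xarray sum returns 0 rather than NaN. The sketch below shows this by summing over time instead of over countries, and uses xarray's own `min_count` argument as one way to get NaN back. It uses plain xarray only; the names `default_sum` and `strict_sum` are chosen for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2003-01-01", freq="YS")
da = xr.DataArray(
    data=[
        [1, 2, 3, 4],
        [np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, np.nan],
    ],
    coords=[("area (ISO3)", ["COL", "ARG", "MEX"]), ("time", time)],
    name="test data",
)

# Default behaviour: NaN values are skipped, so the all-NaN ARG series sums to 0.
default_sum = da.sum(dim="time")
print(default_sum.values)  # 10.0, 0.0, 6.0

# Requiring at least one valid value turns the all-NaN case into NaN again.
strict_sum = da.sum(dim="time", min_count=1)
print(strict_sum.values)  # 10.0, nan, 6.0
```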
Infilling
The same functionality is available for filling in missing information using the
xarray.DataArray.pr.fill_all_na() function.
In this example, we fill missing information only where the whole time series is missing.
da.pr.fill_all_na("time", value=10).pr.to_df()
area (ISO3) | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 |
---|---|---|---|---|
COL | 1.0 | 2.0 | 3.0 | 4.0 |
ARG | 10.0 | 10.0 | 10.0 | 10.0 |
MEX | 1.0 | 2.0 | 3.0 | NaN |
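The same fill rule can be sketched with plain xarray, which may help when checking what the function is expected to do. This is an illustrative equivalent under the assumption that "missing" means NaN, not PRIMAP2's actual implementation; the names `all_missing` and `filled` are made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2003-01-01", freq="YS")
da = xr.DataArray(
    data=[
        [1, 2, 3, 4],
        [np.nan, np.nan, np.nan, np.nan],
        [1, 2, 3, np.nan],
    ],
    coords=[("area (ISO3)", ["COL", "ARG", "MEX"]), ("time", time)],
    name="test data",
)

# True for every country whose whole time series is missing (only ARG here).
all_missing = da.isnull().all(dim="time")

# Keep the original values everywhere except in fully missing series,
# which are replaced by the fill value. Isolated gaps stay NaN.
filled = da.where(~all_missing, 10)
print(filled.transpose("area (ISO3)", "time").values)
```

The ARG row becomes 10 everywhere, while the single missing MEX value in 2003 is left untouched, matching the table above.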