primap2.csg.create_composite_source

primap2.csg.create_composite_source(input_ds: xarray.core.dataset.Dataset, priority_definition: primap2.csg._models.PriorityDefinition, strategy_definition: primap2.csg._models.StrategyDefinition, result_prio_coords: dict[str, dict[str, str]], limit_coords: dict[str, str | list[str]] | None = None, time_range: tuple[str | numpy.datetime64, str | numpy.datetime64] | pandas.DatetimeIndex | None = None, metadata: dict[str, str] | None = None, progress_bar: type[tqdm.std.tqdm] | None = tqdm.std.tqdm) -> xarray.core.dataset.Dataset

Create a composite data source.

This is a wrapper around primap2.csg.compose that prepares the input data and sets result values for the priority coordinates.

Parameters:
input_ds

Dataset containing all input data

priority_definition

Defines the priorities used to select timeseries from the input data. Priorities are formed by a list of selections and are applied "from left to right": the first matching selection has the highest priority. Each selection has to specify values for all priority dimensions (so that exactly one timeseries is selected from the input data), but it can also specify other dimensions. This makes it possible, e.g., to define a different priority for a specific country by listing it early (i.e. with high priority), before the more general rules that apply to all other countries. You can also specify the "entity" or "variable" in a selection, which limits the rule to a specific entity or variable, respectively. For each DataArray in the input_ds Dataset, the variable is its name and the entity is the value of the key entity in its attrs.
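The "first matching selection wins" rule can be illustrated with plain dictionaries. This is a hypothetical sketch, not primap2's implementation: the helper name `applicable_priorities`, the dimension names `area` and `source`, and the source names are all made up for illustration. Keys that are not priority dimensions act as restrictions on where a rule applies, so a country-specific rule listed first only affects that country.

```python
# Hypothetical sketch of "left to right, first match wins" priority resolution.
# Dimension names ("area", "source") and source names are illustrative only.


def applicable_priorities(
    priorities: list[dict[str, str]],
    priority_dims: list[str],
    context: dict[str, str],
) -> list[dict[str, str]]:
    """Return the priority-dimension values that apply to `context`, highest first."""
    ordered = []
    for selection in priorities:
        # Keys outside the priority dimensions restrict where a rule applies,
        # e.g. to a single country or a single entity.
        restrictions = {k: v for k, v in selection.items() if k not in priority_dims}
        if all(context.get(k) == v for k, v in restrictions.items()):
            # Keep only the priority-dimension values, which identify one source.
            ordered.append({k: selection[k] for k in priority_dims})
    return ordered


priorities = [
    {"area": "DEU", "source": "national"},  # country-specific rule listed first
    {"source": "global_a"},
    {"source": "global_b"},
]

print(applicable_priorities(priorities, ["source"], {"area": "DEU", "entity": "CO2"}))
# → [{'source': 'national'}, {'source': 'global_a'}, {'source': 'global_b'}]
print(applicable_priorities(priorities, ["source"], {"area": "FRA", "entity": "CO2"}))
# → [{'source': 'global_a'}, {'source': 'global_b'}]
```

A selection such as `{"entity": "CH4", "source": "national"}` would, under the same logic, only take part in the ordering for CH4 timeseries.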

strategy_definition

Defines the filling strategies to be used when filling timeseries with other timeseries. Again, the priority is defined by a list of selections and corresponding strategies, which are applied "from left to right". Selections can use any dimension and do not have to select exactly one timeseries. For example, to define a default strategy that is used for all timeseries unless something else is configured, add an empty selection as the last (rightmost) entry. You can also specify the "entity" or "variable" in a selection, which limits the rule to a specific entity or variable, respectively. For each DataArray in the input_ds Dataset, the variable is its name and the entity is the value of the key entity in its attrs.

result_prio_coords

Defines the values for the priority coordinates in the output dataset. Because the input sources differ in their priority coordinate values, there is no canonical value for the result, so it has to be defined explicitly.

limit_coords

Optional parameter to remove from the input data all coordinate values that are not needed for the composition. The time coordinate is treated separately (see time_range).

time_range

Optional parameter to limit the time coverage of the input data. Can be either a pandas DatetimeIndex or a tuple of str or np.datetime64 in the form (year_from, year_to), where both boundaries are included in the range. Only the overlap of the supplied index (or of the index created from the tuple) with the time coordinate of the input dataset is used.
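The effect of an inclusive `(year_from, year_to)` tuple combined with the "only the overlap is used" rule can be sketched with the standard library. This assumes yearly data stamped on January 1 and tuple entries given as `"YYYY"` strings; the helpers `yearly_index` and `restrict_time` are hypothetical and do not reflect how primap2 builds its index internally.

```python
from datetime import date


def yearly_index(year_from: str, year_to: str) -> list[date]:
    """Build a yearly index with both boundaries included (assumes "YYYY" strings)."""
    return [date(year, 1, 1) for year in range(int(year_from), int(year_to) + 1)]


def restrict_time(available: list[date], time_range: tuple[str, str]) -> list[date]:
    """Keep only the time points present in both the requested range and the data."""
    requested = set(yearly_index(*time_range))
    return [t for t in available if t in requested]


# Input data covers 1990-2020 and 2015-2025 is requested: only 2015-2020 remain,
# because years after 2020 do not exist in the input data.
available = yearly_index("1990", "2020")
kept = restrict_time(available, ("2015", "2025"))
print(kept[0], kept[-1], len(kept))  # 2015-01-01 2020-01-01 6
```

With pandas, the same idea could be expressed with `pandas.date_range(start, end, freq="YS")` and `DatetimeIndex.intersection`.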

metadata

Set metadata values such as title and references.

progress_bar

By default, show progress bars using the tqdm package during the operation. If None, don't show any progress bars. You can supply a class compatible with the tqdm.tqdm protocol if you want to customize the progress bar.

Returns:
xr.Dataset with composed data according to the given priority and strategy definitions