support runoff normalization for partial years with annual normalization data by fneum · Pull Request #434 · PyPSA/atlite

fneum · 2025-03-27T07:07:57Z

Changes proposed in this Pull Request

It makes cutout.runoff(normalize_using_yearly=...) more lenient by supporting proportional scaling of partially represented years in the cutout, assuming even distribution of runoff throughout the year. In this case, a warning is printed due to the strong assumption.

Relevant for PyPSA/pypsa-eur#1613 (e.g. July-June meteorological years, in case the original cutout does not span Jan-Dec for each year included)

This also means we can include hydro in the CI test of PyPSA-Eur, which uses only a week of weather data.

Checklist

Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
~~[] Unit tests for new features were added (if applicable).~~
~~[] Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).~~
A note for the release notes doc/release_notes.rst of the upcoming release is included.
I consent to the release of this PR's code under the MIT license.

…ion data

Copilot

Pull Request Overview

This PR enhances the runoff normalization function to support partial years by applying proportional scaling based on available data, while issuing a warning about the strong assumption of even runoff distribution.

Improved handling of date ranges for runoff and normalization data
Added partial year normalization logic with proportional scaling and a warning message
Updated reindexing to use a dynamic coordinate based on the altered dimension

Files not reviewed (1)

RELEASE_NOTES.rst: Language not supported

Comments suppressed due to low confidence (3)

atlite/convert.py:1026

[nitpick] Ensure that reindexing using the dynamic dimension (dim) aligns with the intended coordinate, especially if the original 'countries' coordinate was explicitly required in other parts of the code.

).reindex({dim: result.coords[dim]})

atlite/convert.py:1010

[nitpick] Verify that the proportional scaling logic correctly handles all edge cases, such as when the overlap period almost covers a full year, to ensure the computed fraction accurately reflects the intended partial data scaling.

fraction = (end - start).total_seconds() / ((year_end - year_start).total_seconds())

atlite/convert.py:965

[nitpick] Consider renaming 'result_time' to a more descriptive name like 'time_index' to more clearly indicate that this variable represents a DatetimeIndex.

result_time = pd.DatetimeIndex(result.coords["time"].values)

brynpickering

I think this could be considerably simplified if we assume normalize_using_yearly is always annual data. It's unclear if this is the case since the name suggests so, but the possibility of a datetime index suggests it could e.g. be monthly data.

If definitely annual, you could do something like:

result_time = result.coords["time"].to_series()
annual_grouper = result_time.groupby(result_timeindex.year)
result_annual_secs = annual_grouper.apply(lambda x: (x.iloc[-1] - x.iloc[0]).total_seconds())
full_annual_secs = annual_grouper.apply(lambda x: (pd.Timestamp(str(x.name + 1)) - pd.Timestamp(str(x.name))).total_seconds())
fractions = result_annual_secs / full_annual_secs
if (sub_annual := fractions < 0.99).any(): # 99% to handle cases where the result data has a coarse frequency and therefore `result_annual_secs < full_annual_secs` even when it covers the full year
    logger.warning(...)
overlap = normalize_using_yearly.index.intersection(fractions.index) # assuming you've previously normalised the index to integer years
if not overlap:
    raise ValueError("no overlap ...")
overlap_slice = slice(overlap.min(), overlap.max()) # for use later

norm_data = normalize_using_yearly.mul(fractions, fill_value=0)

Even if not definitely annual, the overall amount of code in this conditional can be significantly reduced to make it easier to maintain (I just haven't considered exactly what would be needed)

brynpickering · 2026-05-20T12:45:21Z

    cutout,
    smooth=None,
    lower_threshold_quantile=None,
    normalize_using_yearly=None,


normalize_using_yearly is super opaque as an argument. Can you type hint it and add a docstring? After more time than should be needed, I now understand it to be a pd.Series with a date range on the index and floats as values, but that should not have required several minutes of code inspection to work out 😅

brynpickering · 2026-05-20T12:56:09Z


    if normalize_using_yearly is not None:
        normalize_using_yearly_i = normalize_using_yearly.index
        if isinstance(normalize_using_yearly_i, pd.DatetimeIndex):


You're doing this check multiple times. I get the impression that it's because you want to leave open the option for normalize_using_yearly to have an arbitrary datetime index (e.g. monthly values)? My preference would be to accept only yearly values and then work on that basis from the start. This would be easiest done by formatting normalize_using_yearly to have an integer index (as datetimeindex will set the timestamp to <year>-01-01 00:00 which could easily lead to a mistake when doing slicing.

brynpickering · 2026-05-20T12:58:21Z

-        )
-        assert len(years), "Need at least a full year of data (more is better)"
-        years_overlap = slice(str(min(years)), str(max(years)))
+        result_time = pd.DatetimeIndex(result.coords["time"].values)


Won't "time" always be a datetime?

Suggested change

result_time = pd.DatetimeIndex(result.coords["time"].values)

result_time = result.coords["time"].to_index()

brynpickering · 2026-05-20T17:50:51Z

+            )
+
+        years_overlap = slice(str(overlap_start.date()), str(overlap_end.date()))
        dim = result.dims[1 - result.get_axis_num("time")]


Why expect only ever one dim? To be agnostic of this, you could do:

Suggested change

dim = result.dims[1 - result.get_axis_num("time")]

non_time_dims = list(set(result) - {"time"})

and then assume a list later on as well (so [dim] -> non_time_dims and reindex({dim: result.coords[dim]}) -> reindex({dim: result.coords[dim] for dim in dims}) or reindex_like(result.coords[dims]) (I think))

brynpickering · 2026-05-20T18:11:47Z

        result *= (
-            xr.DataArray(normalize_using_yearly.loc[years_overlap].sum(), dims=[dim])
+            xr.DataArray(norm_data, dims=[dim])
            / result.sel(time=years_overlap).sum("time")
-        ).reindex(countries=result.coords["countries"])
+        ).reindex({dim: result.coords[dim]})


Pretty sure this could just be result *= norm_data / result.sel(time=years_overlap).sum("time") and you wouldn't have to also do the reindexing etc.

support runoff normalization for partial years with annual normalizat…

d75abab

…ion data

fneum requested review from Copilot and lkstrp March 27, 2025 07:13

Copilot AI reviewed Mar 27, 2025

View reviewed changes

fneum requested review from euronion and removed request for lkstrp May 25, 2025 11:12

fneum requested a review from coroa August 23, 2025 06:29

euronion moved this to Usability in atlite vision and roadmap Oct 8, 2025

euronion added this to atlite vision and roadmap Oct 8, 2025

euronion moved this from Usability to Quick wins in atlite vision and roadmap Oct 8, 2025

euronion requested a review from brynpickering May 19, 2026 12:48

Merge branch 'master' into runoff-subyearly

9bce830

brynpickering reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support runoff normalization for partial years with annual normalization data#434

support runoff normalization for partial years with annual normalization data#434
fneum wants to merge 2 commits into
masterfrom
runoff-subyearly

fneum commented Mar 27, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

brynpickering left a comment

Uh oh!

brynpickering May 20, 2026

Uh oh!

brynpickering May 20, 2026

Uh oh!

brynpickering May 20, 2026

Uh oh!

brynpickering May 20, 2026

Uh oh!

brynpickering May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	result_time = pd.DatetimeIndex(result.coords["time"].values)
	result_time = result.coords["time"].to_index()

	dim = result.dims[1 - result.get_axis_num("time")]
	non_time_dims = list(set(result) - {"time"})

Conversation

fneum commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes proposed in this Pull Request

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

brynpickering left a comment

Choose a reason for hiding this comment

Uh oh!

brynpickering May 20, 2026

Choose a reason for hiding this comment

Uh oh!

brynpickering May 20, 2026

Choose a reason for hiding this comment

Uh oh!

brynpickering May 20, 2026

Choose a reason for hiding this comment

Uh oh!

brynpickering May 20, 2026

Choose a reason for hiding this comment

Uh oh!

brynpickering May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fneum commented Mar 27, 2025 •

edited

Loading