Skip to content

support runoff normalization for partial years with annual normalization data#434

Open
fneum wants to merge 2 commits into
masterfrom
runoff-subyearly
Open

support runoff normalization for partial years with annual normalization data#434
fneum wants to merge 2 commits into
masterfrom
runoff-subyearly

Conversation

@fneum
Copy link
Copy Markdown
Member

@fneum fneum commented Mar 27, 2025

Changes proposed in this Pull Request

It makes cutout.runoff(normalize_using_yearly=...) more lenient by supporting proportional scaling of partially represented years in the cutout, assuming even distribution of runoff throughout the year. In this case, a warning is printed due to the strong assumption.

Relevant for PyPSA/pypsa-eur#1613 (e.g. July-June meteorological years, in case the original cutout does not span Jan-Dec for each year included)

This also means we can include hydro in the CI test of PyPSA-Eur, which uses only a week of weather data.

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • [] Unit tests for new features were added (if applicable).
  • [] Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

@fneum fneum requested review from Copilot and lkstrp March 27, 2025 07:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the runoff normalization function to support partial years by applying proportional scaling based on available data, while issuing a warning about the strong assumption of even runoff distribution.

  • Improved handling of date ranges for runoff and normalization data
  • Added partial year normalization logic with proportional scaling and a warning message
  • Updated reindexing to use a dynamic coordinate based on the altered dimension
Files not reviewed (1)
  • RELEASE_NOTES.rst: Language not supported
Comments suppressed due to low confidence (3)

atlite/convert.py:1026

  • [nitpick] Ensure that reindexing using the dynamic dimension (dim) aligns with the intended coordinate, especially if the original 'countries' coordinate was explicitly required in other parts of the code.
).reindex({dim: result.coords[dim]})

atlite/convert.py:1010

  • [nitpick] Verify that the proportional scaling logic correctly handles all edge cases, such as when the overlap period almost covers a full year, to ensure the computed fraction accurately reflects the intended partial data scaling.
fraction = (end - start).total_seconds() / ((year_end - year_start).total_seconds())

atlite/convert.py:965

  • [nitpick] Consider renaming 'result_time' to a more descriptive name like 'time_index' to more clearly indicate that this variable represents a DatetimeIndex.
result_time = pd.DatetimeIndex(result.coords["time"].values)

@fneum fneum requested review from euronion and removed request for lkstrp May 25, 2025 11:12
@fneum fneum requested a review from coroa August 23, 2025 06:29
@euronion euronion moved this to Usability in atlite vision and roadmap Oct 8, 2025
@euronion euronion moved this from Usability to Quick wins in atlite vision and roadmap Oct 8, 2025
@euronion euronion requested a review from brynpickering May 19, 2026 12:48
Copy link
Copy Markdown
Contributor

@brynpickering brynpickering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be considerably simplified if we assume normalize_using_yearly is always annual data. It's unclear if this is the case since the name suggests so, but the possibility of a datetime index suggests it could e.g. be monthly data.

If definitely annual, you could do something like:

result_time = result.coords["time"].to_series()
annual_grouper = result_time.groupby(result_timeindex.year)
result_annual_secs = annual_grouper.apply(lambda x: (x.iloc[-1] - x.iloc[0]).total_seconds())
full_annual_secs = annual_grouper.apply(lambda x: (pd.Timestamp(str(x.name + 1)) - pd.Timestamp(str(x.name))).total_seconds())
fractions = result_annual_secs / full_annual_secs
if (sub_annual := fractions < 0.99).any(): # 99% to handle cases where the result data has a coarse frequency and therefore `result_annual_secs < full_annual_secs` even when it covers the full year
    logger.warning(...)
overlap = normalize_using_yearly.index.intersection(fractions.index) # assuming you've previously normalised the index to integer years
if not overlap:
    raise ValueError("no overlap ...")
overlap_slice = slice(overlap.min(), overlap.max()) # for use later

norm_data = normalize_using_yearly.mul(fractions, fill_value=0)

Even if not definitely annual, the overall amount of code in this conditional can be significantly reduced to make it easier to maintain (I just haven't considered exactly what would be needed)

Comment thread atlite/convert.py
cutout,
smooth=None,
lower_threshold_quantile=None,
normalize_using_yearly=None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize_using_yearly is super opaque as an argument. Can you type hint it and add a docstring? After more time than should be needed, I now understand it to be a pd.Series with a date range on the index and floats as values, but that should not have required several minutes of code inspection to work out 😅

Comment thread atlite/convert.py

if normalize_using_yearly is not None:
normalize_using_yearly_i = normalize_using_yearly.index
if isinstance(normalize_using_yearly_i, pd.DatetimeIndex):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing this check multiple times. I get the impression that it's because you want to leave open the option for normalize_using_yearly to have an arbitrary datetime index (e.g. monthly values)? My preference would be to accept only yearly values and then work on that basis from the start. This would be easiest done by formatting normalize_using_yearly to have an integer index (as datetimeindex will set the timestamp to <year>-01-01 00:00 which could easily lead to a mistake when doing slicing.

Comment thread atlite/convert.py
)
assert len(years), "Need at least a full year of data (more is better)"
years_overlap = slice(str(min(years)), str(max(years)))
result_time = pd.DatetimeIndex(result.coords["time"].values)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't "time" always be a datetime?

Suggested change
result_time = pd.DatetimeIndex(result.coords["time"].values)
result_time = result.coords["time"].to_index()

Comment thread atlite/convert.py
)

years_overlap = slice(str(overlap_start.date()), str(overlap_end.date()))
dim = result.dims[1 - result.get_axis_num("time")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why expect only ever one dim? To be agnostic of this, you could do:

Suggested change
dim = result.dims[1 - result.get_axis_num("time")]
non_time_dims = list(set(result) - {"time"})

and then assume a list later on as well (so [dim] -> non_time_dims and reindex({dim: result.coords[dim]}) -> reindex({dim: result.coords[dim] for dim in dims}) or reindex_like(result.coords[dims]) (I think))

Comment thread atlite/convert.py
Comment on lines 1023 to +1026
result *= (
xr.DataArray(normalize_using_yearly.loc[years_overlap].sum(), dims=[dim])
xr.DataArray(norm_data, dims=[dim])
/ result.sel(time=years_overlap).sum("time")
).reindex(countries=result.coords["countries"])
).reindex({dim: result.coords[dim]})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this could just be result *= norm_data / result.sel(time=years_overlap).sum("time") and you wouldn't have to also do the reindexing etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Quick wins

Development

Successfully merging this pull request may close these issues.

4 participants