Minor GHP updates #476

Open
ppinchuk wants to merge 43 commits into main from pp/ghp_updates

Conversation

@ppinchuk

Collaborator

Minor updates to GHP collection and parsing before validation

ppinchuk self-assigned this Jun 12, 2026
Copilot AI review requested due to automatic review settings Jun 12, 2026 23:59
ppinchuk requested a review from castelao as a code owner Jun 12, 2026 23:59
ppinchuk added the enhancement (Update to logic or general code improvements) and p-critical (Priority: critical) labels Jun 12, 2026

Copilot AI left a comment

Contributor

Pull request overview

Minor improvements to the GHP (geothermal heat pump) pipeline around document collection/parsing and pre-validation filtering, including tighter “substantive text” gating and optional limits on how many documents get parsed per jurisdiction.

Changes:

  • Add max_num_docs_to_parse_per_jurisdiction request setting and plumb it through extraction to cap the number of docs passed into plugin parsing.
  • Improve document-type validation prompts/graph to require substantive legal text (not just ToC/headings/citations) before treating content as legally binding.
  • Normalize Docling “missing” confidence values to None (instead of NaN/pd.NA) and add a unit test for that behavior.
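
The confidence normalization in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the name `none_if_missing` only mirrors the `_none_if_missing` helper named in the review, and the detection logic (self-inequality for NaN, a `TypeError` on truth-testing for `pd.NA`) is an assumption about one way to do this without importing pandas.

```python
def none_if_missing(value):
    """Return None for missing-like values, otherwise the value unchanged.

    Hypothetical stand-in for the PR's `_none_if_missing` helper; the real
    implementation may differ.
    """
    if value is None:
        return None
    try:
        # NaN is the only common value that compares unequal to itself.
        if value != value:
            return None
    except TypeError:
        # pd.NA propagates through comparisons and raises TypeError when the
        # result is truth-tested, so that case is treated as missing too.
        return None
    return value


# Illustrative confidence scores (made-up keys and values).
scores = {"layout": 0.93, "ocr": float("nan"), "table": None}
normalized = {name: none_if_missing(score) for name, score in scores.items()}
print(normalized)
```

Returning a plain `None` keeps downstream code to a single `is None` check instead of having to handle several different missing-value sentinels.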

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/python/unit/services/test_services_cpu.py | Adds coverage for _read_docling confidence normalization behavior. |
| tests/python/unit/pipeline/test_pipeline_orchestration.py | Updates pipeline orchestration tests to align with the new filter_docs signature. |
| tests/python/integration/test_integrated_pipeline_orchestration.py | Adjusts integration test plugin to match updated plugin interface (filter_docs). |
| docs/source/dev/advanced_plugin_development.rst | Updates plugin contract documentation for filter_docs(..., max_num_docs=None). |
| compass/validation/graphs.py | Adds a “substantive legal text” gate to the legal-text/document-type decision tree. |
| compass/services/cpu.py | Normalizes Docling confidence values via _none_if_missing helper. |
| compass/plugin/one_shot/components.py | Tightens collection prompt to avoid treating headings/ToC/citations as relevant by themselves. |
| compass/plugin/interface.py | Extends filter_docs to accept max_num_docs and implements slicing before parsing. |
| compass/plugin/base.py | Updates the abstract filter_docs contract to include max_num_docs. |
| compass/pipeline/jurisdiction.py | Adds progress-bar status update while loading pre-parsed documents. |
| compass/pipeline/extraction.py | Passes the per-jurisdiction parse cap into extractor.filter_docs(...). |
| compass/pipeline/data_classes.py | Introduces DocParsingParams and adds max_num_docs_to_parse_per_jurisdiction to requests. |
| compass/pipeline/coordinator.py | Adds manifest-loading logs for extraction mode. |
| compass/pipeline/collection/steps.py | Minor formatting change around local file loader kwargs. |
| compass/extraction/water/plugin.py | Updates filter_docs signature for the new interface (but currently uses __). |
| compass/extraction/ghp/plugin_config.yaml | Tweaks GHP query templates and heuristic keywords toward “private heat exchange wells”. |
| compass/extraction/ghp/geothermal_heat_pump_schema.json5 | Refines GHP schema scope/evidence rules to reject non-substantive headings/ToC-only evidence. |
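
The `max_num_docs` cap described for compass/plugin/interface.py amounts to slicing the document list before parsing. A rough sketch, under stated assumptions: `cap_docs` is a hypothetical standalone function, not the plugin's actual `filter_docs` method (whose full signature and filtering logic are not shown here), and rejecting negative values is a choice made for this example only.

```python
def cap_docs(docs, max_num_docs=None):
    """Keep at most `max_num_docs` documents, preserving their order.

    Hypothetical helper for illustration; `None` means "no cap", matching
    the `max_num_docs=None` default mentioned in the plugin docs.
    """
    docs = list(docs)
    if max_num_docs is None:
        return docs
    if max_num_docs < 0:
        # Assumption: a negative cap is treated as a configuration error
        # rather than being interpreted as a slice from the end.
        raise ValueError("max_num_docs must be non-negative or None")
    return docs[:max_num_docs]


# Made-up document names, assumed to be ordered by relevance already.
ranked = ["ordinance.pdf", "zoning_code.pdf", "meeting_minutes.pdf"]
print(cap_docs(ranked, max_num_docs=2))
print(cap_docs(ranked))
```

Because the cap is a plain prefix slice, it only makes sense if the documents are already ordered by priority before the cap is applied.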

Comment thread compass/extraction/water/plugin.py
Comment thread tests/python/integration/test_integrated_pipeline_orchestration.py
Comment thread compass/pipeline/data_classes.py
@codecov-commenter

codecov-commenter commented Jun 13, 2026

Codecov Report

❌ Patch coverage is 74.80315% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.59%. Comparing base (8290f5e) to head (09b6077).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| compass/services/cpu.py | 72.72% | 9 Missing and 6 partials ⚠️ |
| compass/pipeline/collection/steps.py | 16.66% | 10 Missing ⚠️ |
| compass/plugin/interface.py | 25.00% | 3 Missing ⚠️ |
| compass/validation/graphs.py | 0.00% | 3 Missing ⚠️ |
| compass/plugin/one_shot/components.py | 94.73% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (74.80%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
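
The patch-coverage figure can be cross-checked against the per-file table above. The sketch below sums the missing and partial lines from that table. The total of 127 changed lines is an inference (it is the count that reproduces 74.80315% with 32 uncovered lines) and is not stated anywhere in the report.

```python
# Missing + partial line counts per file, copied from the Codecov table.
uncovered = {
    "compass/services/cpu.py": 9 + 6,
    "compass/pipeline/collection/steps.py": 10,
    "compass/plugin/interface.py": 3,
    "compass/validation/graphs.py": 3,
    "compass/plugin/one_shot/components.py": 1,
}
total_uncovered = sum(uncovered.values())

# ASSUMPTION: 127 coverable changed lines, inferred from the reported
# percentage rather than read from the report.
patch_lines = 127
patch_coverage = 100 * (patch_lines - total_uncovered) / patch_lines

target = 80.0  # target coverage quoted in the failure message
print(total_uncovered, round(patch_coverage, 5), patch_coverage >= target)
```

Under that assumption, roughly seven more of the currently uncovered lines would need tests for the patch to reach the 80% target (102 of 127 lines is about 80.3%).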

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #476      +/-   ##
==========================================
+ Coverage   61.72%   62.59%   +0.86%     
==========================================
  Files          77       77              
  Lines        6937     7042     +105     
  Branches      690      704      +14     
==========================================
+ Hits         4282     4408     +126     
+ Misses       2535     2501      -34     
- Partials      120      133      +13     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 62.59% <74.80%> (+0.86%) ⬆️ |
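
The project-level percentages follow from the hit and line counts in the coverage diff. The sketch below recomputes them. Truncating to two decimals (rather than rounding) is an inference from the displayed values, not a documented Codecov rule, and `truncated_pct` is a helper invented for this example.

```python
import math


def truncated_pct(numerator, denominator):
    """Percentage truncated (not rounded) to two decimal places."""
    return math.floor(10_000 * numerator / denominator) / 100


# Counts copied from the coverage diff: (hits, misses, partials, lines).
base = {"hits": 4282, "misses": 2535, "partials": 120, "lines": 6937}
head = {"hits": 4408, "misses": 2501, "partials": 133, "lines": 7042}

# Sanity check: every line is a hit, a miss, or a partial.
for side in (base, head):
    assert side["hits"] + side["misses"] + side["partials"] == side["lines"]

base_pct = truncated_pct(base["hits"], base["lines"])
head_pct = truncated_pct(head["hits"], head["lines"])

# Delta computed from the unrounded ratios, then truncated the same way.
exact_delta = 100 * (head["hits"] / head["lines"] - base["hits"] / base["lines"])
delta = math.floor(100 * exact_delta) / 100

print(base_pct, head_pct, delta)
```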

Labels

enhancement (Update to logic or general code improvements), p-critical (Priority: critical)

3 participants