Add verbatim ord_text field to one-shot extraction schemas #487

Open
rajeee wants to merge 2 commits into main from quotes

Conversation

@rajeee (Collaborator) commented Jun 29, 2026

Adds a verbatim ord_text field to the one-shot extraction schemas (geothermal electricity, GHP, water rights demo). The field captures, without paraphrasing, the source sentence from which an ordinance value or requirement was extracted.

Threads ord_text through finalize.py so it routes into both the qualitative and quantitative output CSVs, and through the SchemaOrdinanceParser column list. Includes a regression test confirming the value round-trips into both output files.
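
As an illustration of the round-trip property described above, here is a minimal sketch using only the standard library. It is not the project's test: the column lists, the `write_rows`/`read_rows` helpers, and the sample sentence are all hypothetical stand-ins, and the real column definitions live in compass/utilities/finalize.py.

```python
import csv
import io

# Hypothetical column sets, for illustration only.
QUANT_COLS = ["feature", "value", "units", "ord_text"]
QUAL_COLS = ["feature", "summary", "ord_text"]


def write_rows(rows, columns):
    """Serialize rows to CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sentence containing a comma and quotes, which CSV must escape.
sentence = 'Wells shall be set back at least 100 feet from any "occupied" dwelling, measured horizontally.'
rows = [{
    "feature": "setback",
    "value": "100",
    "units": "feet",
    "summary": "100 ft setback from dwellings",
    "ord_text": sentence,
}]

quant_rows = read_rows(write_rows(rows, QUANT_COLS))
qual_rows = read_rows(write_rows(rows, QUAL_COLS))

# The verbatim sentence should come back unchanged from both outputs.
assert quant_rows[0]["ord_text"] == sentence
assert qual_rows[0]["ord_text"] == sentence
```

The sample sentence deliberately includes a comma and embedded quotes, since those are the characters most likely to be mangled if a verbatim field is written without proper CSV quoting.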

Copilot AI review requested due to automatic review settings June 29, 2026 13:54
@rajeee rajeee requested review from castelao and ppinchuk as code owners June 29, 2026 13:54
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.72%. Comparing base (eb7b6ae) to head (e30f347).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #487      +/-   ##
==========================================
+ Coverage   61.40%   61.72%   +0.32%     
==========================================
  Files          77       77              
  Lines        6868     6937      +69     
  Branches      676      690      +14     
==========================================
+ Hits         4217     4282      +65     
  Misses       2535     2535              
- Partials      116      120       +4     
Flag Coverage Δ
unittests 61.72% <100.00%> (+0.32%) ⬆️


Copilot AI (Contributor) left a comment

Pull request overview

This PR adds an ord_text field to one-shot extraction schemas and threads it through the one-shot parsing and finalization pipeline so the verbatim source sentence can be emitted into final CSV outputs, with a regression test ensuring the field round-trips.

Changes:

  • Add required ord_text to one-shot schemas (geothermal electricity, GHP, water rights demo) with instructions to keep it verbatim.
  • Include ord_text in one-shot parser DataFrame construction and in finalized output column sets.
  • Extend unit test coverage to assert ord_text is written to both quantitative and qualitative output CSVs.
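
The verbatim requirement on ord_text mentioned in the changes above could, in principle, be checked after extraction. The following is only a sketch under stated assumptions: `is_verbatim` and `find_non_verbatim` are hypothetical helpers that are not part of this PR, and the source text and rows are made up for illustration.

```python
def is_verbatim(ord_text, source_text):
    """Return True if ord_text is a non-empty, exact, contiguous substring of source_text."""
    return bool(ord_text) and ord_text in source_text


def find_non_verbatim(rows, source_text):
    """Return the rows whose ord_text is missing, empty, or not found verbatim in the source."""
    return [row for row in rows if not is_verbatim(row.get("ord_text"), source_text)]


# Made-up source document and extraction rows.
source = (
    "Section 4. Setbacks. Wells shall be set back at least 100 feet from any dwelling. "
    "Noise shall not exceed 55 dBA at the property line."
)
rows = [
    {"feature": "setback", "ord_text": "Wells shall be set back at least 100 feet from any dwelling."},
    {"feature": "noise", "ord_text": "Noise must not exceed 55 dBA..."},  # paraphrased, with ellipsis
    {"feature": "height", "ord_text": ""},  # empty
]

flagged = [row["feature"] for row in find_non_verbatim(rows, source)]
print(flagged)  # ['noise', 'height']
```

An exact substring test is strict by design: any normalization of whitespace, punctuation, or quotation marks in the extracted text would be flagged, which matches the "reproduced exactly as written" wording of the schema instruction.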

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| compass/utilities/finalize.py | Adds ord_text to parsed/output column definitions and qualitative output column selection. |
| compass/plugin/one_shot/components.py | Ensures one-shot parsing output includes ord_text when present in schema outputs. |
| tests/python/unit/utilities/test_utilities_finalize.py | Regression test for ord_text round-tripping into both output CSVs. |
| examples/water_rights_demo/one-shot/water_rights_schema.json5 | Adds required ord_text field, updates examples/instructions for verbatim extraction. |
| compass/extraction/ghp/geothermal_heat_pump_schema.json5 | Adds required ord_text field plus guidance/examples. |
| compass/extraction/geothermal_electricity/geothermal_schema.json | Adds required ord_text field plus guidance/examples. |

QUANT_OUT_COLS = _PARSED_COLS[:-1]
"""Output columns in quantitative ordinance file"""
- QUAL_OUT_COLS = _PARSED_COLS[:6] + _PARSED_COLS[-5:-1]
+ QUAL_OUT_COLS = _PARSED_COLS[:6] + _PARSED_COLS[-6:-1]
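
The two QUAL_OUT_COLS lines above differ only in the tail slice (`[-5:-1]` versus `[-6:-1]`). A small sketch of why a positional slice has to widen when a column is added near the end of the list follows. The column names here are hypothetical placeholders, and the position where ord_text lands is an assumption; the real `_PARSED_COLS` is defined in compass/utilities/finalize.py.

```python
# Hypothetical column names, for illustration only.
old_cols = ["id1", "id2", "id3", "id4", "id5", "id6", "value", "units",
            "t1", "t2", "t3", "t4", "last"]

# Assumption: ord_text is added just before the final column.
new_cols = old_cols[:-1] + ["ord_text", old_cols[-1]]

old_qual = old_cols[:6] + old_cols[-5:-1]
stale_qual = new_cols[:6] + new_cols[-5:-1]  # slice not widened after the addition
fixed_qual = new_cols[:6] + new_cols[-6:-1]  # slice widened by one

print(old_qual[6:])    # ['t1', 't2', 't3', 't4']
print(stale_qual[6:])  # ['t2', 't3', 't4', 'ord_text']
print(fixed_qual[6:])  # ['t1', 't2', 't3', 't4', 'ord_text']
```

With the unwidened slice, the new column silently displaces an existing one (`t1` here) instead of raising an error, which is the kind of regression a round-trip test on the output files is suited to catch.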
"Extract only enacted district requirements, not proposed language or general background text.",
"Use direct excerpts/quotes in summary whenever possible.",
"Whenever a feature row is emitted, ord_text must be the first full sentence from the source document corresponding to the requirement being extracted, copied verbatim: for quantitative features the full source sentence containing the extracted value, and for qualitative features the first full source sentence relating to the requirement. Unlike summary, ord_text must be a single contiguous sentence reproduced exactly as written, with no paraphrasing, normalization, ellipses, added context, or commentary. If a feature has no requirement, omit the feature row entirely rather than emitting a row with an empty ord_text.",
"If a feature has no requirement, set value, units, section, and summary to null or omit the feature row.",