Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #487 +/- ##
==========================================
+ Coverage 61.40% 61.72% +0.32%
==========================================
Files 77 77
Lines 6868 6937 +69
Branches 676 690 +14
==========================================
+ Hits 4217 4282 +65
Misses 2535 2535
- Partials 116 120 +4
Pull request overview
This PR adds an ord_text field to one-shot extraction schemas and threads it through the one-shot parsing and finalization pipeline so the verbatim source sentence can be emitted into final CSV outputs, with a regression test ensuring the field round-trips.
Changes:
- Add required ord_text to one-shot schemas (geothermal electricity, GHP, water rights demo) with instructions to keep it verbatim.
- Include ord_text in one-shot parser DataFrame construction and in finalized output column sets.
- Extend unit test coverage to assert ord_text is written to both quantitative and qualitative output CSVs.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| compass/utilities/finalize.py | Adds ord_text to parsed/output column definitions and qualitative output column selection. |
| compass/plugin/one_shot/components.py | Ensures one-shot parsing output includes ord_text when present in schema outputs. |
| tests/python/unit/utilities/test_utilities_finalize.py | Regression test for ord_text round-tripping into both output CSVs. |
| examples/water_rights_demo/one-shot/water_rights_schema.json5 | Adds required ord_text field, updates examples/instructions for verbatim extraction. |
| compass/extraction/ghp/geothermal_heat_pump_schema.json5 | Adds required ord_text field plus guidance/examples. |
| compass/extraction/geothermal_electricity/geothermal_schema.json | Adds required ord_text field plus guidance/examples. |
  QUANT_OUT_COLS = _PARSED_COLS[:-1]
  """Output columns in quantitative ordinance file"""
- QUAL_OUT_COLS = _PARSED_COLS[:6] + _PARSED_COLS[-5:-1]
+ QUAL_OUT_COLS = _PARSED_COLS[:6] + _PARSED_COLS[-6:-1]
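A minimal sketch of why a trailing negative slice has to widen when a column is added ahead of the final flag column. The column names and the position of ord_text below are hypothetical, chosen only for illustration; the actual contents of _PARSED_COLS are not shown in this review.

```python
# Hypothetical column names; NOT the actual contents of _PARSED_COLS.
ID_COLS = ["county", "state", "subdivision", "jurisdiction_type", "FIPS", "feature"]
NUMERIC_COLS = ["value", "units"]
TEXT_COLS_OLD = ["summary", "ord_year", "section", "source"]
# Assumption: ord_text joins the trailing text block, before the flag column.
TEXT_COLS_NEW = ["summary", "ord_text", "ord_year", "section", "source"]
FLAG_COL = ["quantitative"]

old_cols = ID_COLS + NUMERIC_COLS + TEXT_COLS_OLD + FLAG_COL
new_cols = ID_COLS + NUMERIC_COLS + TEXT_COLS_NEW + FLAG_COL

# Before the new column: four trailing text columns, flag column excluded.
old_qual = old_cols[:6] + old_cols[-5:-1]

# After the new column, the unchanged slice silently drops "summary".
stale_qual = new_cols[:6] + new_cols[-5:-1]

# Widening the slice by one picks up all five trailing text columns.
new_qual = new_cols[:6] + new_cols[-6:-1]

print(stale_qual)
print(new_qual)
```

Positional slices like these depend on column order, so any later insertion into the list needs the same kind of adjustment; selecting columns by name would avoid that coupling, at the cost of a longer definition.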
"Extract only enacted district requirements, not proposed language or general background text.",
"Use direct excerpts/quotes in summary whenever possible.",
"Whenever a feature row is emitted, ord_text must be the first full sentence from the source document corresponding to the requirement being extracted, copied verbatim: for quantitative features the full source sentence containing the extracted value, and for qualitative features the first full source sentence relating to the requirement. Unlike summary, ord_text must be a single contiguous sentence reproduced exactly as written, with no paraphrasing, normalization, ellipses, added context, or commentary. If a feature has no requirement, omit the feature row entirely rather than emitting a row with an empty ord_text.",
"If a feature has no requirement, set value, units, section, and summary to null or omit the feature row.",
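The verbatim rule in these instructions could be spot-checked after extraction. The following is a rough sketch under stated assumptions: ord_text_problems is a hypothetical helper, not part of the PR, and it only tests that ord_text is a non-empty contiguous excerpt of the source text.

```python
def ord_text_problems(ord_text, source_text):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration only.
    """
    if ord_text is None or not ord_text.strip():
        # The instructions say to omit the row rather than emit empty ord_text.
        return ["ord_text is empty; omit the feature row instead"]
    problems = []
    if ord_text not in source_text:
        # Paraphrasing, normalization, inserted ellipses, or added commentary
        # all break an exact substring match.
        problems.append("ord_text is not a contiguous verbatim excerpt")
    return problems


# Made-up source text and candidate values for the demonstration.
source = (
    "Section 4.2. Wells shall be set back at least 500 feet from any dwelling. "
    "Permits are reviewed annually."
)
verbatim = "Wells shall be set back at least 500 feet from any dwelling."
paraphrase = "Wells must be 500 ft from dwellings."
elided = "Wells shall be set back ... from any dwelling."

for candidate in (verbatim, paraphrase, elided, ""):
    print(repr(candidate), ord_text_problems(candidate, source))
```

A substring match does not confirm that the excerpt is a single full sentence or the first relevant one; those conditions would need sentence segmentation and are left out of this sketch.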
Adds a verbatim ord_text field to the one-shot extraction schemas (geothermal electricity, GHP, water rights demo), capturing the source sentence an ordinance value/requirement was extracted from, unparaphrased.
Threads ord_text through finalize.py so it routes into both the qualitative and quantitative output CSVs, and through the SchemaOrdinanceParser column list. Includes a regression test confirming the value round-trips into both output files.
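The round-trip idea behind the regression test can be illustrated with the standard library alone. This is a sketch, not the test in the PR: the column lists, file names, rows, and the write_outputs and read_column helpers are all hypothetical, and the real test presumably exercises the functions in compass/utilities/finalize.py instead.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical output columns; the real lists live in finalize.py.
QUANT_COLS = ["feature", "value", "units", "ord_text"]
QUAL_COLS = ["feature", "summary", "ord_text"]


def write_outputs(rows, out_dir):
    """Split rows on a hypothetical 'quantitative' flag and write two CSVs."""
    out_dir = Path(out_dir)
    targets = {
        True: (out_dir / "quantitative.csv", QUANT_COLS),
        False: (out_dir / "qualitative.csv", QUAL_COLS),
    }
    for flag, (path, cols) in targets.items():
        with path.open("w", newline="", encoding="utf-8") as fh:
            # extrasaction="ignore" drops keys that are not output columns.
            writer = csv.DictWriter(fh, fieldnames=cols, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(r for r in rows if r["quantitative"] is flag)
    return targets[True][0], targets[False][0]


def read_column(path, column):
    """Read one column back from a CSV file as a list of strings."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return [row[column] for row in csv.DictReader(fh)]


# Made-up rows for the demonstration.
rows = [
    {
        "feature": "setback",
        "value": 500,
        "units": "feet",
        "summary": "500 ft setback from dwellings",
        "ord_text": "Wells shall be set back at least 500 feet from any dwelling.",
        "quantitative": True,
    },
    {
        "feature": "permit",
        "summary": "Annual permit review",
        "ord_text": "Permits are reviewed annually, subject to board approval.",
        "quantitative": False,
    },
]

with tempfile.TemporaryDirectory() as tmp:
    quant_path, qual_path = write_outputs(rows, tmp)
    quant_text = read_column(quant_path, "ord_text")
    qual_text = read_column(qual_path, "ord_text")

print(quant_text)
print(qual_text)
```

Comparing the strings read back against the originals character for character is the point of the check, since a verbatim field loses its value if any stage trims, re-quotes, or drops it.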