feat: LEARNREADORIENTATIONMODEL and CALCULATE_CONTAMINATION by emmcauley · Pull Request #88 · fulcrumgenomics/twistcgp

emmcauley · 2026-03-24T16:49:46Z

This PR adds GATK4_CALCULATECONTAMINATION, GATK4_GETPILEUPSUMMARIES, and GATK4_LEARNREADORIENTATIONMODEL. These modules are in line with best practices, especially if the submitted samples are FFPE (which are particularly sensitive to artifacts).

github-actions · 2026-03-24T16:51:53Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 087110a

✅ 99 tests passed
❔ 57 tests were ignored
❗ 14 tests had warnings

❗ Test warnings:

files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
files_exist - File not found: ro-crate-metadata.json
readme - README did not have an nf-core template version badge.
readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
pipeline_todos - TODO string in main.nf.test: Once you have added the required tests, please run the following command to build this file:

❔ Tests ignored:

files_exist - File is ignored: .editorconfig
files_exist - File is ignored: .github/.dockstore.yml
files_exist - File is ignored: .github/CONTRIBUTING.md
files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File is ignored: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File is ignored: .github/actions/get-shards/action.yml
files_exist - File is ignored: .github/actions/nf-test/action.yml
files_exist - File is ignored: .github/workflows/branch.yml
files_exist - File is ignored: .github/workflows/ci.yml
files_exist - File is ignored: .github/workflows/linting.yml
files_exist - File is ignored: .github/workflows/linting_comment.yml
files_exist - File is ignored: .github/workflows/nf-test.yml
files_exist - File is ignored: .prettierignore
files_exist - File is ignored: .prettierrc.yml
files_exist - File is ignored: CHANGELOG.md
files_exist - File is ignored: CITATIONS.md
files_exist - File is ignored: CODE_OF_CONDUCT.md
files_exist - File is ignored: LICENSE
files_exist - File is ignored: assets/email_template.html
files_exist - File is ignored: assets/email_template.txt
files_exist - File is ignored: assets/nf-core-twistcgp_logo_light.png
files_exist - File is ignored: assets/sendmail_template.txt
files_exist - File is ignored: conf/igenomes.config
files_exist - File is ignored: conf/igenomes_ignored.config
files_exist - File is ignored: conf/test_full.config
files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_dark.png
files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_light.png
files_exist - File is ignored: docs/output.md
files_exist - File is ignored: docs/README.md
files_exist - File is ignored: docs/usage.md
nextflow_config - nextflow_config
nf_test_content - nf_test_content
files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/.dockstore.yml
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/config.yml
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
files_unchanged - File does not exist: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File does not exist: assets/sendmail_template.txt
files_unchanged - File ignored due to lint config: assets/nf-core-twistcgp_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_nf_test - actions_nf_test
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/twistcgp/twistcgp/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
rocrate_readme_sync - rocrate_readme_sync

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: nf-test.config
files_exist - File found: tests/default.nf.test
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-twistcgp_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowTwistcgp.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
pipeline_if_empty_null - No ifEmpty(null) strings found
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: twistgp_ci.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - ALIGNBAM found in conf/modules.config and Nextflow scripts.
modules_config - BCFTOOLS_VIEW found in conf/modules.config and Nextflow scripts.
modules_config - BWAMEM2_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - CIVICPY_UPDATE_CACHE found in conf/modules.config and Nextflow scripts.
modules_config - CIVICPY_ANNOTATE_VCF found in conf/modules.config and Nextflow scripts.
modules_config - CNVKIT_BATCH found in conf/modules.config and Nextflow scripts.
modules_config - ENSEMBLVEP_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - ENSEMBLVEP_VEP found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_MUTECT2 found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_FILTERMUTECTCALLS found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_CALCULATECONTAMINATION found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_GETPILEUPSUMMARIES found in conf/modules.config and Nextflow scripts.
modules_config - GATK4_LEARNREADORIENTATIONMODEL found in conf/modules.config and Nextflow scripts.
modules_config - FASTP found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - FGBIO_FASTQTOBAM found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSORPRO_SCAN found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSOR2_MSI found in conf/modules.config and Nextflow scripts.
modules_config - MSISENSORPRO_PRO found in conf/modules.config and Nextflow scripts.
modules_config - PERBASE found in conf/modules.config and Nextflow scripts.
modules_config - PICARD found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_COLLECTHSMETRICS found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_COLLECTMULTIPLEMETRICS found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_INTERVALLISTTOBED found in conf/modules.config and Nextflow scripts.
modules_config - PICARD_MARKDUPLICATES found in conf/modules.config and Nextflow scripts.
modules_config - SAMTOOLS_FAIDX found in conf/modules.config and Nextflow scripts.
modules_config - SAMTOOLS_DICT found in conf/modules.config and Nextflow scripts.
modules_config - SNPEFF_DOWNLOAD found in conf/modules.config and Nextflow scripts.
modules_config - SNPEFF_SNPEFF found in conf/modules.config and Nextflow scripts.
modules_config - TMB found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - TWISTCGP found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_POPULATION_GERMLINE found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_PON found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_COSMIC found in conf/modules.config and Nextflow scripts.
modules_config - TABIX_GNOMAD found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.3.2

Run details

nf-core/tools version 3.3.2
Run at 2026-05-06 23:51:47

znorgaard · 2026-04-01T22:10:07Z

@coderabbitai review

coderabbitai · 2026-04-01T22:10:13Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-04-01T22:10:16Z

📝 Walkthrough

This pull request introduces three new GATK4 modules to the pipeline: GATK4_CALCULATECONTAMINATION, GATK4_GETPILEUPSUMMARIES, and GATK4_LEARNREADORIENTATIONMODEL. Each module includes configuration files, Conda environment definitions, Nextflow process implementations, metadata declarations, and test specifications. The workflow integrates these modules sequentially: GATK4_MUTECT2 output feeds into GATK4_LEARNREADORIENTATIONMODEL, which produces orientation bias artifacts; GATK4_GETPILEUPSUMMARIES generates pileup summaries from BAM/CRAM files; GATK4_CALCULATECONTAMINATION processes pileup summaries to estimate contamination. These outputs then flow into GATK4_FILTERMUTECTCALLS for variant filtering.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title references two of the three modules being added (LEARNREADORIENTATIONMODEL and CALCULATECONTAMINATION) but omits GETPILEUPSUMMARIES, which is equally important to the changeset.
Description check	✅ Passed	The description accurately relates to the changeset by naming all three GATK4 modules being added and explaining their relevance for FFPE samples.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

znorgaard · 2026-04-01T22:52:35Z

@coderabbitai review

coderabbitai · 2026-04-01T22:52:41Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (2)

workflows/twistcgp.nf (1)

133-165: Wire the new GATK steps into ch_versions.

These three process calls never feed their versions_gatk4 channels into ch_versions, so twistcgp_software_mqc_versions.yml will omit them.

Possible fix

+    ch_versions = ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first())
+    ch_versions = ch_versions.mix(GATK4_GETPILEUPSUMMARIES.out.versions_gatk4.first())
+    ch_versions = ch_versions.mix(GATK4_CALCULATECONTAMINATION.out.versions_gatk4.first())

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflows/twistcgp.nf` around lines 133 - 165, The GATK steps
GATK4_LEARNREADORIENTATIONMODEL, GATK4_GETPILEUPSUMMARIES, and
GATK4_CALCULATECONTAMINATION are not contributing their versions_gatk4 outputs
into ch_versions, so their versions are omitted from
twistcgp_software_mqc_versions.yml; update the workflow to mix each process'
versions_gatk4 channel into ch_versions (for example, after calling
GATK4_LEARNREADORIENTATIONMODEL(...), assign ch_versions =
ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first()), and
do the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and
GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator
collects them.

modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test (1)

19-20: Use the real upstream f1r2 shape in the test.

The workflow passes GATK4_MUTECT2.out.f1r2 directly here, and a path() output without arity emits a single file when one match is produced. These cases wrap that file in a one-element list, so the test is exercising a different binding shape than production. (nextflow.io)

Also applies to: 43-44
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around
lines 19 - 20, The test is wrapping the f1r2 artifact in an extra one-element
list, which differs from the real upstream binding shape
(GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second
element is the file path itself instead of a list (i.e., change the
[file(params.modules_testdata_base_path + ...)] entry to
file(params.modules_testdata_base_path + ...) in the input[0] assignment), and
make the same change at the other occurrence (lines noted as also applies to
43-44).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@conf/modules.config`:
- Around line 125-132: The publishDir for withName: GATK4_GETPILEUPSUMMARIES is
using a wrong pattern and a misleading ext.prefix; change ext.prefix from
"mutect2.artifactprior" to something like "mutect2.pileups" or
"mutect2.pileup_summaries" and update publishDir.pattern to match the actual
outputs (e.g. use "*.pileups.table*" or a stricter glob like
"*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and
optional gzipped variants) are picked up and published.

In `@modules/nf-core/gatk4/calculatecontamination/main.nf`:
- Around line 43-48: The stub currently always creates both
"${prefix}.contamination.table" and "${prefix}.segmentation.table" which
mismatches the process' optional segmentation output; update the stub block so
it always touches "${prefix}.contamination.table" but only touches
"${prefix}.segmentation.table" when the tumor segmentation option was requested
(e.g. check an extension flag such as task.ext.tumor_segmentation or
task.ext.tumorSegmentation that corresponds to the --tumor-segmentation flag),
leaving the rest of the stub and the prefix assignment unchanged.

In `@modules/nf-core/gatk4/calculatecontamination/meta.yml`:
- Around line 41-63: Update the outputs' meta entries so they are Groovy maps
instead of files and fix the segmentation meta text/pattern: change the meta
block under both contamination and segmentation from "type: file" to "type: map"
and adjust the segmentation meta description and pattern to reference the
segmentation output (e.g., describe "segmentation of tumor minor allele
fractions" and use the "*.segmentation.table" pattern) so the "meta" map for
contamination refers to contamination metadata and the "meta" map for
segmentation refers to segmentation metadata.

In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml`:
- Line 7: The workflow metadata contains a typo: replace the incorrect keyword
string "getpileupsumaries" with the correct "getpileupsummaries" in meta.yml so
the tool name matches expected naming; update any occurrences of
getpileupsumaries (e.g., the entry currently present in meta.yml) to
getpileupsummaries to ensure consistency.
- Around line 72-83: The inputs `variants` and `variants_tbi` in meta.yml are
defined as single-item entries but should use the nested list/grouped format
used elsewhere (the "- - " pattern) to match nf-core tooling expectations;
update both `variants` and `variants_tbi` to be arrays of arrays (convert their
current single mapping into a nested list entry), preserving their keys (`type`,
`description`, `pattern`, `ontologies`) and values so the structure matches
other grouped inputs and avoids parsing errors.
- Line 16: The documentation URL value in meta.yml appears malformed (the
documentation:
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the
documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the
correct URL (likely remove the trailing 's' to
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace
with the verified, intended documentation link), and run a quick curl check to
confirm the corrected URL returns a 200 before committing.

In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf`:
- Line 14: The tuple declaration tuple val(meta), path("*.tar.gz"), emit:
artifactprior is too broad and catches staged files like *.f1r2.tar.gz; narrow
the glob to only the intended artifact filenames (for example replace "*.tar.gz"
with a more specific pattern such as "*.prior.tar.gz" or the exact artifact
basename used downstream) so that only the correct prior artifact files are
emitted as artifactprior.
- Line 15: The pipeline uses the Nextflow "topic:" syntax in the tuple emission
(the tuple line with topic: versions / emit: versions_gatk4) which requires
Nextflow 25.04+ or the preview flag; update the project constraint
(nextflowVersion) to allow >=25.04.0, or alternatively enable the preview
feature by adding the config key nextflow.preview.topic = true in the pipeline
configuration so the tuple with topic: versions works on older Nextflow
releases; locate the tuple emission (topic: versions) in
learnreadorientationmodel/main.nf and adjust either the nextflowVersion
constraint or the config to satisfy the requirement.
- Line 23: input_list is built by calling .collect() on f1r2 which is a single
Path (not a List) and will throw NoSuchMethodError at runtime; change the
construction of input_list in the learnreadorientationmodel process to handle
both single Path and list cases (e.g., detect if f1r2 is a List and join via
.collect(...).join(' '), otherwise format a single "--input ${f1r2}" string) so
it works when MUTECT2 binds a single file as a Path.

In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml`:
- Around line 31-43: The artifactprior output block is incorrect: change the
nested meta tuple entry from "type: file" to "type: map" and give it a generic
metadata description, and remove file-specific keys (pattern, ontologies) from
that meta map; ensure the file entry for "*.tar.gz" (the second tuple element)
retains the file-specific attributes (type: file, pattern: "*.tar.gz",
description: ..., ontologies: [...]) so all file attributes live only on the
"*.tar.gz" file entry while meta becomes a simple map describing the metadata.

In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 28-31: The test is incorrectly using linesGzip on a .tar.gz
tarball; change the assertion to snapshot the archive path itself or validate
extracted contents instead: replace the snapshot call that references
path(process.out.artifactprior[0][1]).linesGzip[3..7] with either
snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or
extract the tarball in the test and assert on the extracted files' contents
(e.g., list or specific file contents) when calling snapshot; update the test
that constructs the snapshot (the assertion block containing
process.out.artifactprior and process.out.findAll) accordingly.
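
The linesGzip concern above can be reproduced in a plain shell. This is a minimal sketch, not the module's test: it builds a throwaway `.tar.gz` whose member name `example-artifact-prior.txt` is made up (it is not what GATK writes), then contrasts gzip-only decompression, which yields a raw tar stream, with `tar`, which exposes the archived members.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small archive with one hypothetical member.
printf 'line one\nline two\n' > example-artifact-prior.txt
tar -czf demo.tar.gz example-artifact-prior.txt

# gzip -dc strips only the gzip layer. What remains is a tar stream made of
# 512-byte blocks (headers plus padded data), not the member's text lines.
raw_size=$(gzip -dc demo.tar.gz | wc -c)
echo "decompressed stream: ${raw_size} bytes"

# tar understands both layers: list the members, then print one to stdout.
members=$(tar -tzf demo.tar.gz)
content=$(tar -xzOf demo.tar.gz example-artifact-prior.txt)
echo "$members"
echo "$content"
```

Snapshotting a slice of gzip-decompressed lines therefore captures tar header bytes rather than file content, which is why asserting on the extracted members (or on the archive as a whole) is the more meaningful check.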

In `@workflows/twistcgp.nf`:
- Around line 176-179: The current inner joins drop samples when
GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted; change the first
join to be a remainder join (use remainder: true on the join between
ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation) and ensure
missing segmentation values are coalesced to an empty list before the subsequent
join to contamination/consumers (e.g., when unpacking the joined tuple for
ch_filtermutect_in replace a null/undefined segmentation with []), so no samples
are silently lost before GATK4_FILTERMUTECTCALLS.
- Around line 171-174: The current chain uses an inner join that drops samples
missing artifact priors: replace the final
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) with a left-join
variant so ch_mutect2_samples retains all GATK4_MUTECT2 entries and unmatched
artifactprior values are passed as empty/null; specifically change the join
operation used to combine GATK4_MUTECT2.out.vcf/.tbi/.stats with
GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior (e.g., use leftJoin or the
DSL's left outer join API) and ensure GATK4_FILTERMUTECTCALLS consumes a
fallback empty orientation-bias input when the artifactprior is missing.
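
The difference between the inner joins flagged above and a join that keeps unmatched samples can be illustrated with the POSIX `join` utility. This is only an analogy for the Nextflow `join` operator, not pipeline code: the sample IDs and file names are invented, and `[]` is used here as the filler for a missing segmentation table.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two keyed tables, sorted by sample ID (join requires sorted input).
printf 's1 s1.vcf.gz\ns2 s2.vcf.gz\n' > calls.txt
printf 's1 s1.segmentation.table\n' > segmentation.txt

# Inner join: s2 has no segmentation row, so it is silently dropped.
inner=$(join calls.txt segmentation.txt)

# Left outer join: keep every row of calls.txt and fill the gap with [].
outer=$(join -a 1 -e '[]' -o 0,1.2,2.2 calls.txt segmentation.txt)

echo "inner:"; echo "$inner"
echo "outer:"; echo "$outer"
```

In the inner result only `s1` survives; in the outer result `s2` is still present with a placeholder in the segmentation column, which is the behaviour the review asks for before GATK4_FILTERMUTECTCALLS.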

---

Nitpick comments:
In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 19-20: The test is wrapping the f1r2 artifact in an extra
one-element list, which differs from the real upstream binding shape
(GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second
element is the file path itself instead of a list (i.e., change the
[file(params.modules_testdata_base_path + ...)] entry to
file(params.modules_testdata_base_path + ...) in the input[0] assignment), and
make the same change at the other occurrence (lines noted as also applies to
43-44).

In `@workflows/twistcgp.nf`:
- Around line 133-165: The GATK steps GATK4_LEARNREADORIENTATIONMODEL,
GATK4_GETPILEUPSUMMARIES, and GATK4_CALCULATECONTAMINATION are not contributing
their versions_gatk4 outputs into ch_versions, so their versions are omitted
from twistcgp_software_mqc_versions.yml; update the workflow to mix each
process' versions_gatk4 channel into ch_versions (for example, after calling
GATK4_LEARNREADORIENTATIONMODEL(...), assign ch_versions =
ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first()), and
do the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and
GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator
collects them.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c66e0a79-57e1-460b-b099-58641d4811bb

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8f494 and 879ea26.

⛔ Files ignored due to path filters (3)

modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test.snap is excluded by !**/*.snap
modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test.snap is excluded by !**/*.snap
modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test.snap is excluded by !**/*.snap

📒 Files selected for processing (17)

conf/modules.config
modules.json
modules/nf-core/gatk4/calculatecontamination/environment.yml
modules/nf-core/gatk4/calculatecontamination/main.nf
modules/nf-core/gatk4/calculatecontamination/meta.yml
modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test
modules/nf-core/gatk4/calculatecontamination/tests/nextflow.config
modules/nf-core/gatk4/getpileupsummaries/environment.yml
modules/nf-core/gatk4/getpileupsummaries/main.nf
modules/nf-core/gatk4/getpileupsummaries/meta.yml
modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test
modules/nf-core/gatk4/learnreadorientationmodel/environment.yml
modules/nf-core/gatk4/learnreadorientationmodel/main.nf
modules/nf-core/gatk4/learnreadorientationmodel/meta.yml
modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test
modules/nf-core/gatk4/learnreadorientationmodel/tests/nextflow.config
workflows/twistcgp.nf

coderabbitai · 2026-04-01T23:06:42Z

+    withName: GATK4_GETPILEUPSUMMARIES {
+        ext.prefix = { "${meta.id}.mutect2.artifactprior" }
+        publishDir = [
+            path: { "${params.outdir}/${meta.id}" },
+            mode: params.publish_dir_mode,
+            pattern: "*{gz}",
+        ]
+    }


⚠️ Potential issue | 🟠 Major

PublishDir pattern won't match output files.

GATK4_GETPILEUPSUMMARIES outputs *.pileups.table (per context snippet 3), but pattern *{gz} won't match. Nothing will be published.

Also, the prefix mutect2.artifactprior seems misnamed — this module generates pileup summaries, not artifact priors.

Proposed fix

     withName: GATK4_GETPILEUPSUMMARIES {
-        ext.prefix = { "${meta.id}.mutect2.artifactprior" }
+        ext.prefix = { "${meta.id}.mutect2" }
         publishDir = [
             path: { "${params.outdir}/${meta.id}" },
             mode: params.publish_dir_mode,
-            pattern: "*{gz}",
+            pattern: "*{table}",
         ]
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@conf/modules.config` around lines 125 - 132, The publishDir for withName: GATK4_GETPILEUPSUMMARIES is using a wrong pattern and a misleading ext.prefix; change ext.prefix from "mutect2.artifactprior" to something like "mutect2.pileups" or "mutect2.pileup_summaries" and update publishDir.pattern to match the actual outputs (e.g. use "*.pileups.table*" or a stricter glob like "*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and optional gzipped variants) are picked up and published.
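
The mismatch can be sanity-checked outside Nextflow. The sketch below uses shell `case` patterns as a stand-in for the publishDir glob. It assumes that `*{gz}` behaves like `*gz` (a brace group with a single alternative), and the file name `sample1.mutect2.pileups.table` is a hypothetical example rather than a name taken from the pipeline.

```shell
#!/bin/sh
# Hypothetical output name; the real prefix comes from ext.prefix.
file="sample1.mutect2.pileups.table"

# Report whether a file name matches a shell glob pattern.
matches() {
    # $1 = pattern, $2 = file name. The pattern is left unquoted inside
    # the case statement so that it is interpreted as a glob.
    case "$2" in
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

matches '*gz' "$file"               # stand-in for the current "*{gz}"
matches '*.pileups.table*' "$file"  # pattern suggested in the review
```

Under these assumptions the first call reports no match and the second reports a match, consistent with the finding that nothing would be published with the current pattern.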

coderabbitai · 2026-04-01T23:06:42Z

+    stub:
+    prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    touch ${prefix}.contamination.table
+    touch ${prefix}.segmentation.table
+    """


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n modules/nf-core/gatk4/calculatecontamination/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 2163

Align stub output with optional segmentation behavior.

The real process declares segmentation as optional: true (line 15) because GATK only creates it with --tumor-segmentation. The stub unconditionally touches both files, which doesn't match and can mask missing-output bugs in stub tests.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/calculatecontamination/main.nf` around lines 43 - 48, The stub currently always creates both "${prefix}.contamination.table" and "${prefix}.segmentation.table" which mismatches the process' optional segmentation output; update the stub block so it always touches "${prefix}.contamination.table" but only touches "${prefix}.segmentation.table" when the tumor segmentation option was requested (e.g. check an extension flag such as task.ext.tumor_segmentation or task.ext.tumorSegmentation that corresponds to the --tumor-segmentation flag), leaving the rest of the stub and the prefix assignment unchanged.
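
A minimal sketch of the shell body such a stub could run. The variable `tumor_segmentation` is hypothetical: it stands in for whatever flag the module would use to signal that `--tumor-segmentation` was requested, and `prefix` is set by hand here rather than derived from `task.ext.prefix`.

```shell
#!/bin/sh
# Hypothetical stand-ins for values Nextflow would interpolate into the stub.
prefix="sample1"
tumor_segmentation="false"

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The contamination table is a required output, so always create it.
touch "${prefix}.contamination.table"

# The segmentation table is optional: create it only when requested.
if [ "$tumor_segmentation" = "true" ]; then
    touch "${prefix}.segmentation.table"
fi

ls
```

With the flag left at `false`, only the contamination table is created, so a stub run exercises the same optional-output path as a real run without `--tumor-segmentation`.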

coderabbitai · 2026-04-01T23:06:42Z

+  contamination:
+    - - meta:
+          type: file
+          description: File containing the contamination table.
+          pattern: "*.contamination.table"
+          ontologies: []
+      - "*.contamination.table":
+          type: file
+          description: File containing the contamination table.
+          pattern: "*.contamination.table"
+          ontologies: []
+  segmentation:
+    - - meta:
+          type: file
+          description: File containing the contamination table.
+          pattern: "*.contamination.table"
+          ontologies: []
+      - "*.segmentation.table":
+          type: file
+          description: output table containing segmentation of tumor minor allele fractions
+            (optional)
+          pattern: "*.segmentation.table"
+          ontologies: []


⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Script executed:

# First, find the file and read the specified lines
fd -type f -name "meta.yml" | grep gatk4/calculatecontamination

Repository: fulcrumgenomics/twistcgp
Exit code: 1

stderr:

error: 'ype' isn't a valid value for '--type <filetype>'
  [possible values: file, directory, symlink, executable, empty, socket, pipe]

For more information try '--help'

Script executed:

# Read the specific file content at lines 41-63
cat -n modules/nf-core/gatk4/calculatecontamination/meta.yml | sed -n '41,63p'

Repository: fulcrumgenomics/twistcgp

Exit code: 0

stdout:

41    contamination:
42      - - meta:
43            type: file
44            description: File containing the contamination table.
45            pattern: "*.contamination.table"
46            ontologies: []
47        - "*.contamination.table":
48            type: file
49            description: File containing the contamination table.
50            pattern: "*.contamination.table"
51            ontologies: []
52    segmentation:
53      - - meta:
54            type: file
55            description: File containing the contamination table.
56            pattern: "*.contamination.table"
57            ontologies: []
58        - "*.segmentation.table":
59            type: file
60            description: output table containing segmentation of tumor minor allele fractions
61              (optional)
62            pattern: "*.segmentation.table"
63            ontologies: []

nf-core meta.yml output metadata structure conventions tuple format

The nf-core meta.yml file in modules documents module metadata including tool name, description, keywords, input/output channels, and authors. It follows a JSON schema defined in nf-core/modules repository at modules/meta-schema.json (linted by nf-core/tools). Updated structure (as of 2025) groups inputs/outputs by channel, distinguishing tuples (lists) from single elements.

Tuple format: Lists represent tuple channels matching Nextflow input/output like tuple val(meta), path(files). Single channels are direct mappings.

Example from bwa/mem (2025 structure):

input:
  - # Tuple channel 1
    - meta:
        type: map
        description: Groovy Map containing sample information
    - reads:
        type: file
        description: List of input FastQ files...
  - # Tuple channel 2
    - meta2:
        type: map
    - index:
        type: file
        description: BWA genome index files
        pattern: ".{amb,ann,bwt,pac,sa}"
  - sort_bam: # Single val channel
      type: boolean
      description: use samtools sort (true) or samtools view (false)
      pattern: "true or false"
output:
  - bam: # Tuple output
      - - meta:
            type: file # Note: likely typo, should be map
        - ".bam":
            type: file
  - versions: # Single
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"

Channels use type (map/file/boolean etc.), description, pattern, optional ontologies. Linting expects consistency with main.nf (e.g., tuple val(meta) path(bar) -> list with meta map + bar file). Use nf-core modules lint --fix for updates. For pipelines, modules.json tracks versions.

Citations:

1: https://nf-co.re/docs/nf-core-tools/api_reference/dev/module_lint_tests/meta_yml

2: https://nf-co.re/blog/2025/modules-meta-yml

3: https://github.com/nf-core/modules/

Fix output metadata structure.

Both meta entries are incorrectly marked as type: file (should be type: map—meta is a Groovy Map). Additionally, the segmentation output's meta block still references the contamination table description and pattern; it should describe the segmentation metadata instead.

Example of issues at lines 42-46 and 53-57

contamination:
  - - meta:
        type: file # Should be type: map
        description: File containing the contamination table.
        pattern: "*.contamination.table"
        ontologies: []
segmentation:
  - - meta:
        type: file # Should be type: map
        description: File containing the contamination table. # Wrong: still says contamination
        pattern: "*.contamination.table" # Wrong: still says contamination pattern
        ontologies: []

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/calculatecontamination/meta.yml` around lines 41 - 63, Update the outputs' meta entries so they are Groovy maps instead of files and fix the segmentation meta text/pattern: change the meta block under both contamination and segmentation from "type: file" to "type: map" and adjust the segmentation meta description and pattern to reference the segmentation output (e.g., describe "segmentation of tumor minor allele fractions" and use the "*.segmentation.table" pattern) so the "meta" map for contamination refers to contamination metadata and the "meta" map for segmentation refers to segmentation metadata.
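
The two findings above (a `meta` slot typed as `file`, and a `meta` pattern copied from another output) are mechanical enough to check in code. The sketch below is an illustration under stated assumptions, not an nf-core lint rule. It takes the `output` section already parsed into Python dicts and lists, and the function name `find_meta_problems` is hypothetical.

```python
def find_meta_problems(outputs: dict) -> list[str]:
    """Report suspicious `meta` entries in a parsed meta.yml output section.

    `outputs` maps each emit name to a list of tuples, and each tuple is a
    list of single-key dicts, mirroring the `- - meta:` YAML layout.
    """
    problems = []
    for channel, tuples in outputs.items():
        for entries in tuples:
            # Patterns declared on the file entries of this tuple.
            file_patterns = {
                spec.get("pattern")
                for entry in entries
                if "meta" not in entry
                for spec in entry.values()
            }
            for entry in entries:
                meta = entry.get("meta")
                if meta is None:
                    continue  # not the meta slot
                if meta.get("type") != "map":
                    problems.append(
                        f"{channel}: meta type is {meta.get('type')!r}, expected 'map'"
                    )
                pattern = meta.get("pattern")
                if pattern is not None and pattern not in file_patterns:
                    problems.append(
                        f"{channel}: meta pattern {pattern!r} matches no file entry in this tuple"
                    )
    return problems
```

Run against the structure quoted in this comment, it would flag both `meta` types and the contamination pattern left on the segmentation output.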

coderabbitai · 2026-04-01T23:06:42Z

+keywords:
+  - gatk4
+  - germlinevariantsites
+  - getpileupsumaries


⚠️ Potential issue | 🟡 Minor

Typo in keyword.

getpileupsumaries → getpileupsummaries

Proposed fix

-  - getpileupsumaries
+  - getpileupsummaries

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- getpileupsumaries

- getpileupsummaries

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 7, The workflow metadata contains a typo: replace the incorrect keyword string "getpileupsumaries" with the correct "getpileupsummaries" in meta.yml so the tool name matches expected naming; update any occurrences of getpileupsumaries (e.g., the entry currently present in meta.yml) to getpileupsummaries to ensure consistency.

coderabbitai · 2026-04-01T23:06:42Z

+        with a primary focus on variant discovery and genotyping. Its powerful processing engine
+        and high-performance computing features make it capable of taking on projects of any size.
+      homepage: https://gatk.broadinstitute.org/hc/en-us
+      documentation: https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s


⚠️ Potential issue | 🟡 Minor

Potentially malformed documentation URL.

URL ends with 360002369672s — trailing 's' looks like a typo.

#!/bin/bash
# Verify if the documentation URL is valid
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"
echo ""
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 16, The documentation URL value in meta.yml appears malformed (the documentation: "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the correct URL (likely remove the trailing 's' to "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace with the verified, intended documentation link), and run a quick curl check to confirm the corrected URL returns a 200 before committing.

coderabbitai · 2026-04-01T23:06:43Z

+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== Locate producer/consumer contracts for f1r2 payload =="
rg -nP 'f1r2|LEARNREADORIENTATIONMODEL|GATK4_MUTECT2' --type-add 'nf:*.nf' -t nf -C3

Repository: fulcrumgenomics/twistcgp

Length of output: 6053

🏁 Script executed:

# Check if there are tests for this module
fd -type f -name '*test*' -o -name '*spec*' | grep -i learnreadorientationmodel
# Look for actual workflow test execution
find . -type f -name '*.nf' -path '*/test*' | head -20
# Check if there's nextflow config showing how f1r2 is passed
fd -type f -name 'nextflow.config' -o -name 'conf' | xargs grep -l f1r2 2>/dev/null || echo "No config found"

Repository: fulcrumgenomics/twistcgp

Length of output: 442

🏁 Script executed:

# Check the full LEARNREADORIENTATIONMODEL module to see entire script context
cat -n modules/nf-core/gatk4/learnreadorientationmodel/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1956

🏁 Script executed:

# Verify Groovy Path behavior - check if there's any wrapper or extension
rg -n 'metaClass|extensions|Path\.' modules/nf-core/gatk4/learnreadorientationmodel/ -A2 -B2

Repository: fulcrumgenomics/twistcgp

Length of output: 50

🏁 Script executed:

# Check for CI configuration
find . -type f \( -name '*.yml' -o -name '*.yaml' \) -path '*/.github/*' | head -10

Repository: fulcrumgenomics/twistcgp

Length of output: 174

🏁 Script executed:

# Look for test execution or sample invocation
find . -type f -name '*.nf' | xargs grep -l 'GATK4_LEARNREADORIENTATIONMODEL' | head -5

Repository: fulcrumgenomics/twistcgp

Length of output: 149

🏁 Script executed:

# Check if there's documentation about input handling in Nextflow for this pattern
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '1,30p'

Repository: fulcrumgenomics/twistcgp

Length of output: 1378

🏁 Script executed:

# Check the CI workflow to see if it tests the module
cat -n ./.github/workflows/twistgp_ci.yml | head -50

Repository: fulcrumgenomics/twistcgp

Length of output: 1971

🏁 Script executed:

# Check if there are any Nextflow docs or comments about path() binding behavior
rg -n 'path$' modules/nf-core/gatk4/learnreadorientationmodel/ -B2 -A2

Repository: fulcrumgenomics/twistcgp

Length of output: 1376

🏁 Script executed:

# Search for similar patterns in the codebase - other modules using path() with .collect()
rg -n 'path\([^)]*$.*\.collect' --type nf -A1 -B1

Repository: fulcrumgenomics/twistcgp

Length of output: 94

🏁 Script executed:

# Look at the test file to understand what input is being passed
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test

Repository: fulcrumgenomics/twistcgp

Length of output: 2012

🏁 Script executed:

# Check what the test data looks like
find modules/nf-core/gatk4/learnreadorientationmodel/tests -type f | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 276

🏁 Script executed:

# Look for Nextflow documentation or similar modules that handle optional multi-file inputs
rg -n 'collect\s*{' modules/nf-core/gatk4/ --type nf | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 153

🏁 Script executed:

# Look for .collect patterns in nf files
rg -n '\.collect' modules/nf-core/gatk4/ -A1 -B1

Repository: fulcrumgenomics/twistcgp

Length of output: 4504

🏁 Script executed:

# Check how the test is actually structured - look at lines 19-20 more carefully
# The test passes a LIST with a single file element wrapped in square brackets
python3 << 'EOF'
test_input = "[ [ id:'test' ], [file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.f1r2.tar.gz', checkIfExists: true)] ]"
print("Test Input Structure:")
print("Outer tuple: [meta_map, f1r2_binding]")
print("f1r2_binding: [file(...)] <- This is a LIST with one element")
print("\nSo f1r2 receives a LIST, not a single Path object")
print("Therefore .collect() on a List WILL work in Groovy")
EOF

Repository: fulcrumgenomics/twistcgp

Length of output: 291

🏁 Script executed:

# Check the actual workflow invocation and what MUTECT2 emits
cat -n workflows/twistcgp.nf | sed -n '130,140p'

Repository: fulcrumgenomics/twistcgp

Length of output: 694

🏁 Script executed:

# Verify MUTECT2 output definition
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '20,30p'

Repository: fulcrumgenomics/twistcgp

Length of output: 468

🏁 Script executed:

# Look at Nextflow semantics: does path() binding with a list of files vs single file differ?
# Search for other modules that handle this pattern
rg -n 'tuple val\(meta\), path\(' modules/nf-core/gatk4/ -A3 | grep -A3 'learnreadorientationmodel'

Repository: fulcrumgenomics/twistcgp

Length of output: 938

🏁 Script executed:

# Key question: in the test, why is the f1r2 wrapped in brackets [file(...)]?
# Let's check if this is the actual expected input structure or a test artifact
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test | sed -n '14,25p'

Repository: fulcrumgenomics/twistcgp

Length of output: 550

🏁 Script executed:

# Understand Nextflow's path() binding behavior by checking Nextflow documentation patterns
# Look at how filtermutectcalls handles multiple file inputs
cat -n modules/nf-core/gatk4/filtermutectcalls/main.nf | sed -n '1,50p'

Repository: fulcrumgenomics/twistcgp

Length of output: 2571

🏁 Script executed:

# Check if there's a difference between how the test constructs input vs actual workflow
# The test wraps in brackets [file(...)] but MUTECT2 emits single file
# Verify what MUTECT2.out.f1r2 actually is at runtime
rg -n 'f1r2' workflows/twistcgp.nf -B2 -A2

Repository: fulcrumgenomics/twistcgp

Length of output: 339

🏁 Script executed:

# Critical check: Verify Nextflow path() binding behavior with single vs multiple files
# Look at the actual MUTECT2 output pattern and how it's bound
rg -n 'path\(".*\*' modules/nf-core/gatk4/mutect2/main.nf | grep -i f1r2

Repository: fulcrumgenomics/twistcgp

Length of output: 141

🏁 Script executed:

# Check if there are runtime tests that actually execute the workflow
cat -n ./.github/workflows/twistgp_ci.yml | sed -n '40,80p'

Repository: fulcrumgenomics/twistcgp

Length of output: 1719

🏁 Script executed:

# Verify the semantic difference: test wraps in brackets but workflow passes bare channel
# Look for similar patterns in filtermutectcalls test
cat -n modules/nf-core/gatk4/filtermutectcalls/tests/main.nf.test | head -60

Repository: fulcrumgenomics/twistcgp

Length of output: 2982

🏁 Script executed:

# Critical finding: compare how the test constructs input vs workflow reality
# In filtermutectcalls, optional files that could be lists are passed as [] or [file]
# Let's check the actual Nextflow semantics documentation pattern
# Look at how LEARNREADORIENTATIONMODEL is actually invoked vs how test constructs it
python3 << 'EOF'
print("=== CRITICAL MISMATCH ===\n")
print("MUTECT2 Output:")
print(' tuple val(meta), path("*.f1r2.tar.gz"), optional: true, emit: f1r2')
print(" -> Emits SINGLE file per sample\n")
print("LEARNREADORIENTATIONMODEL Input:")
print(' tuple val(meta), path(f1r2)')
print(" -> Expects SINGLE Path object\n")
print("Workflow invocation:")
print(" GATK4_LEARNREADORIENTATIONMODEL(GATK4_MUTECT2.out.f1r2)")
print(" -> Passes single file directly\n")
print("Test construction:")
print(' input[0] = [ [ id:"test" ], [file(...)] ]')
print(" -> MANUALLY WRAPS in brackets to create a List\n")
print("=== ISSUE ===")
print("In workflow: f1r2 is a Path object")
print("In test: f1r2 is a List with one Path object")
print("\nPath.collect() -> NoSuchMethodError (Path lacks collect method)")
print("List.collect() -> Works fine\n")
print("Conclusion: Code is BROKEN in workflow, only works in test due to wrapping")
EOF

Repository: fulcrumgenomics/twistcgp

Length of output: 800

Fix .collect() call on f1r2 — it receives a single Path object, not a List.

MUTECT2 emits a single file per sample (path("*.f1r2.tar.gz")), which binds as a Path object in this process. Calling .collect() directly on a Path will fail with NoSuchMethodError at runtime. The test passes only because it manually wraps the input in brackets, unlike the actual workflow.

Proposed fix

- def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
+ def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])
+ def input_list = f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')

def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])

def input_list = f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf` at line 23, input_list is built by calling .collect() on f1r2 which is a single Path (not a List) and will throw NoSuchMethodError at runtime; change the construction of input_list in the learnreadorientationmodel process to handle both single Path and list cases (e.g., detect if f1r2 is a List and join via .collect(...).join(' '), otherwise format a single "--input ${f1r2}" string) so it works when MUTECT2 binds a single file as a Path.
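
The proposed fix is a "scalar or list" normalization: wrap a single value in a list, then build one `--input` argument per element. The Python sketch below illustrates only that pattern. It is not Nextflow or Groovy code, the function name `build_input_args` is hypothetical, and it does not verify whether `.collect()` on a single staged path actually fails at runtime.

```python
def build_input_args(f1r2) -> str:
    """Build repeated --input arguments from one path or a list of paths."""
    # Normalize: a single path becomes a one-element list.
    files = f1r2 if isinstance(f1r2, (list, tuple)) else [f1r2]
    return " ".join(f"--input {path}" for path in files)
```

Normalizing first keeps the argument-building line identical for both shapes of input, so the module test (list input) and the workflow (single file) exercise the same code path.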

coderabbitai · 2026-04-01T23:06:43Z

+  artifactprior:
+    - - meta:
+          type: file
+          description: file containing artifact-priors to be used by filtermutectcalls
+          pattern: "*.tar.gz"
+          ontologies:
+            - edam: http://edamontology.org/format_3989 # GZIP format
+      - "*.tar.gz":
+          type: file
+          description: file containing artifact-priors to be used by filtermutectcalls
+          pattern: "*.tar.gz"
+          ontologies:
+            - edam: http://edamontology.org/format_3989 # GZIP format


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and examine full context
find . -name "meta.yml" -path "*/gatk4/learnreadorientationmodel/*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 126

🏁 Script executed:

# Search for other similar tuple output patterns in the codebase to understand conventions
rg -A 10 "^\s+- -\s*meta:" --type yaml | head -100

Repository: fulcrumgenomics/twistcgp

Length of output: 6704

🌐 Web query:

nf-core meta.yml tuple output conventions file metadata

💡 Result:

In nf-core modules, the meta.yml file documents the module's inputs and outputs, with specific conventions for tuple-based channels commonly used in Nextflow processes. Tuple channels in main.nf, like tuple val(meta), path(reads), are represented in meta.yml as YAML lists. Each list item describes a tuple element:
- First item: meta (type: map) - Groovy map with sample info (e.g., id, single_end).
- Subsequent items: file paths (type: file), with description, pattern (e.g., ".fastq"), optional EDAM ontology.

Example from bwa/mem module (updated structure, 2025):

input:
  - - meta:
        type: map
        description: Groovy Map containing sample information
    - reads:
        type: file
        description: List of input FastQ files...
        pattern: ".{fastq.gz}"

Single-element channels (e.g., val sort_bam or path "versions.yml") are not lists. Outputs follow the same pattern, grouped by emit name (e.g., bam: [...tuple elements...]). This structure groups by channel, distinguishes tuples (lists) from singles, validates against nf-core/modules meta-schema.json, and aids discoverability on nf-core site. Use nf-core modules lint --fix to update old flat structures. Schema at https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json (referenced in lint tests).

Citations:

1: Fix meta.yml structure for input and output tuples nf-core/modules#4983

2: https://nf-co.re/blog/2025/modules-meta-yml

3: https://nf-co.re/docs/nf-core-tools/api_reference/dev/module_lint_tests/meta_yml

4: https://nf-co.re/docs/tutorials/nf-core_training/writing-nf-core-modules/chapter-5-writing-your-module

5: http://nf-co.re/docs/tutorials/nf-core_training/writing-nf-core-modules/chapter-5-writing-your-module

6: https://github.com/nf-core/modules/

Fix meta entry in artifactprior output block.

The meta tuple slot must be type: map, not type: file. File attributes (pattern, ontologies) belong on the file entry only, not on meta. Use type: map with a generic description for the metadata element and move all file-specific attributes to the file entry below it, per nf-core conventions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml` around lines 31 - 43, The artifactprior output block is incorrect: change the nested meta tuple entry from "type: file" to "type: map" and give it a generic metadata description, and remove file-specific keys (pattern, ontologies) from that meta map; ensure the file entry for "*.tar.gz" (the second tuple element) retains the file-specific attributes (type: file, pattern: "*.tar.gz", description: ..., ontologies: [...]) so all file attributes live only on the "*.tar.gz" file entry while meta becomes a simple map describing the metadata.

coderabbitai · 2026-04-01T23:06:43Z

+                { assert snapshot(
+					path(process.out.artifactprior[0][1]).linesGzip[3..7],
+                    process.out.findAll { key, val -> key.startsWith("versions") }
+                ).match() }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "main.nf.test" -path "*learnreadorientationmodel*"

Repository: fulcrumgenomics/twistcgp

Length of output: 136

🏁 Script executed:

cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test

Repository: fulcrumgenomics/twistcgp

Length of output: 2012

🏁 Script executed:

find . -name "main.nf" -path "*learnreadorientationmodel*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 125

🏁 Script executed:

cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1956

🏁 Script executed:

web_search nf-test linesGzip tarball tar.gz

Repository: fulcrumgenomics/twistcgp

Length of output: 164

🌐 Web query:

nf-test linesGzip documentation tarball

💡 Result:

linesGzip is a property in nf-test (Nextflow testing framework) that extends the Path object to read lines from gzip-compressed files without decompressing them to disk. It returns an array of lines for assertions in tests.

Key features from official nf-test documentation:
- Read all lines: path(process.out.out_ch.get(0)).linesGzip
- Check size: assert path(...).linesGzip.size == 5
- Check contains: assert path(...).linesGzip.contains("Line Content")
- Slice/filter: def lines = path(...).linesGzip[0..5]; assert lines.size == 6
- Grep specific lines efficiently (for large files): path(...).grepLinesGzip(0,5) or path(...).grepLineGzip(0)
- Snapshot support: assert snapshot(path(...).linesGzip[0])

Note: For gzip files, the md5 property computes checksum after gunzipping, unlike non-gz files. No specific "tarball" is mentioned; tests typically use individual .gz files from repositories like nf-core/test-datasets. Tarballs may appear in Nextflow workflows (e.g., for bundled data), but linesGzip handles single .gz files.

Citations:

1: https://www.nf-test.com/docs/assertions/files/

2: https://nf-co.re/docs/contributing/nf-test/assertions

Don't snapshot a tarball with linesGzip.

The process outputs a .tar.gz archive, but linesGzip is designed for gzip-compressed text files. Using it on a tarball reads binary tar stream bytes as "lines," which is meaningless and brittle. Snapshot the path itself instead, or assert on extracted contents if validation of archive structure is needed.

Suggested fix

- { assert snapshot(
-     path(process.out.artifactprior[0][1]).linesGzip[3..7],
-     process.out.findAll { key, val -> key.startsWith("versions") }
- ).match() }
+ { assert snapshot(
+     path(process.out.artifactprior[0][1]),
+     process.out.findAll { key, val -> key.startsWith("versions") }
+ ).match() }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

{ assert snapshot(

path(process.out.artifactprior[0][1]).linesGzip[3..7],

process.out.findAll { key, val -> key.startsWith("versions") }

).match() }

{ assert snapshot(

path(process.out.artifactprior[0][1]),

process.out.findAll { key, val -> key.startsWith("versions") }

).match() }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around lines 28 - 31, The test is incorrectly using linesGzip on a .tar.gz tarball; change the assertion to snapshot the archive path itself or validate extracted contents instead: replace the snapshot call that references path(process.out.artifactprior[0][1]).linesGzip[3..7] with either snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or extract the tarball in the test and assert on the extracted files' contents (e.g., list or specific file contents) when calling snapshot; update the test that constructs the snapshot (the assertion block containing process.out.artifactprior and process.out.findAll) accordingly.
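
The distinction behind this finding is that gunzipping a `.tar.gz` yields a tar stream (headers, NUL padding, then file data), not the text of the archived file. The Python sketch below builds a small archive in memory and reads it both ways. It illustrates the file formats only, not nf-test itself, and the helper names and member name are hypothetical.

```python
import gzip
import io
import tarfile


def make_tarball(members: dict[str, bytes]) -> bytes:
    """Create an in-memory .tar.gz holding the given name -> content pairs."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def gunzip_only(blob: bytes) -> bytes:
    """Strip only the gzip layer, as a gzip line reader would."""
    return gzip.decompress(blob)


def read_member(blob: bytes, name: str) -> str:
    """Extract one archived file's text by going through the tar layer."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.extractfile(name).read().decode()
```

Asserting on `read_member(...)` output, or on the member list, validates the archive contents. The result of `gunzip_only(...)` still contains tar headers and padding bytes, so line slices taken from it depend on archive layout rather than on the model output.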

coderabbitai · 2026-04-01T23:06:43Z

+    ch_mutect2_samples = GATK4_MUTECT2.out.vcf
        .join(GATK4_MUTECT2.out.tbi)
        .join(GATK4_MUTECT2.out.stats)
-        .map { meta, vcf, tbi, stats ->
+        .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and read the relevant file
find . -name "twistcgp.nf" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 91

🏁 Script executed:

# Read the file around lines 171-174
if [ -f "workflows/twistcgp.nf" ]; then
  sed -n '165,180p' "workflows/twistcgp.nf"
fi

Repository: fulcrumgenomics/twistcgp

Length of output: 713

🏁 Script executed:

# Search for GATK4_LEARNREADORIENTATIONMODEL output definitions
rg "GATK4_LEARNREADORIENTATIONMODEL" -A 5 -B 5 --max-count=20

Repository: fulcrumgenomics/twistcgp

Length of output: 8446

🏁 Script executed:

# Check if there are other join patterns in the file with remainder or optional handling
rg "\.join\(" "workflows/twistcgp.nf" -A 2 -B 2

Repository: fulcrumgenomics/twistcgp

Length of output: 1708

🏁 Script executed:

# Check the GATK4_LEARNREADORIENTATIONMODEL module definition
find . -path "*gatk4/learnreadorientationmodel/main.nf" -exec cat {} \;

Repository: fulcrumgenomics/twistcgp

Length of output: 1634

🏁 Script executed:

# Check if GATK4_MUTECT2 has conditional f1r2 output handling
find . -path "*gatk4/mutect2/main.nf" -exec head -100 {} \;

Repository: fulcrumgenomics/twistcgp

Length of output: 2695

🏁 Script executed:

# Search for any conditional logic around GATK4_LEARNREADORIENTATIONMODEL execution
rg "GATK4_LEARNREADORIENTATIONMODEL" -B 10 -A 5 "workflows/twistcgp.nf"

Repository: fulcrumgenomics/twistcgp

Length of output: 2605

🏁 Script executed:

# Check Nextflow join documentation and usage with remainder in the codebase
rg "remainder\s*:" -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 50

🏁 Script executed:

# Check if there are any examples of handling optional channels with join in the workflow
rg "join.*optional" -B 3 -A 3

Repository: fulcrumgenomics/twistcgp

Length of output: 50

Fix inner join on optional artifact priors.

Mutect2 only produces f1r2 (marked optional: true) when enabled, so GATK4_LEARNREADORIENTATIONMODEL won't run for all samples. The default inner join drops unmatched items, causing samples without artifactprior to disappear before GATK4_FILTERMUTECTCALLS instead of falling back to empty orientation-bias input.

Suggested fix

  ch_mutect2_samples = GATK4_MUTECT2.out.vcf
      .join(GATK4_MUTECT2.out.tbi)
      .join(GATK4_MUTECT2.out.stats)
-     .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)
+     .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, remainder: true)
+     .map { meta, vcf, tbi, stats, artifactprior -> tuple(meta, vcf, tbi, stats, artifactprior ?: []) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ch_mutect2_samples = GATK4_MUTECT2.out.vcf

.join(GATK4_MUTECT2.out.tbi)

.join(GATK4_MUTECT2.out.stats)

.map { meta, vcf, tbi, stats ->

.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)

ch_mutect2_samples = GATK4_MUTECT2.out.vcf

.join(GATK4_MUTECT2.out.tbi)

.join(GATK4_MUTECT2.out.stats)

.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, remainder: true)

.map { meta, vcf, tbi, stats, artifactprior -> tuple(meta, vcf, tbi, stats, artifactprior ?: []) }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@workflows/twistcgp.nf` around lines 171 - 174, The current chain uses an inner join that drops samples missing artifact priors: replace the final .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) with a join that keeps unmatched items so ch_mutect2_samples retains all GATK4_MUTECT2 entries and unmatched artifactprior values are passed as empty/null; specifically change the join operation used to combine GATK4_MUTECT2.out.vcf/.tbi/.stats with GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior (e.g., pass remainder: true to join, as in the suggested fix, and map a missing artifactprior to []) and ensure GATK4_FILTERMUTECTCALLS consumes a fallback empty orientation-bias input when the artifactprior is missing.
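
The difference between the default join and the `remainder: true` variant can be modeled with plain dictionaries keyed by sample. This Python sketch is a simplified model, not Nextflow's implementation. It only covers unmatched items on the left side, and the names `join_by_key` and `coalesce_last` are hypothetical.

```python
def join_by_key(left: dict, right: dict, remainder: bool = False) -> dict:
    """Join per-sample tuples from `left` with one extra value from `right`.

    Without `remainder`, samples missing from `right` are dropped (inner join).
    With `remainder`, they are kept and the missing value is None.
    """
    joined = {}
    for sample, values in left.items():
        if sample in right:
            joined[sample] = (*values, right[sample])
        elif remainder:
            joined[sample] = (*values, None)
    return joined


def coalesce_last(joined: dict) -> dict:
    """Replace a missing last element with an empty list, like `x ?: []`."""
    return {
        sample: (*values[:-1], values[-1] if values[-1] is not None else [])
        for sample, values in joined.items()
    }
```

In this model a sample without an artifact prior disappears from the inner join, but survives the remainder join with an empty placeholder that a downstream step could treat as "no orientation-bias input".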

coderabbitai · 2026-04-01T23:06:43Z

+    ch_filtermutect_in = params.population_germline_vcf
+        ? ch_mutect2_samples
+            .join(GATK4_CALCULATECONTAMINATION.out.segmentation)
+            .join(GATK4_CALCULATECONTAMINATION.out.contamination)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the workflow file
cd workflows
ls -la | head -20
wc -l twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 262

🏁 Script executed:

# Read the specific lines in context (lines 170-185 to get surrounding context)
sed -n '170,185p' twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 122

🏁 Script executed:

# Search for GATK4_CALCULATECONTAMINATION definition to understand its outputs
rg "GATK4_CALCULATECONTAMINATION" twistcgp.nf -B 2 -A 5 | head -50

Repository: fulcrumgenomics/twistcgp

Length of output: 119

🏁 Script executed:

# Look for the output definitions to understand if segmentation is optional
rg "segmentation|contamination" twistcgp.nf -B 1 -A 1 | head -100

Repository: fulcrumgenomics/twistcgp

Length of output: 119

🏁 Script executed:

# Read the specific lines with full path from repo root
sed -n '170,185p' workflows/twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 842

🏁 Script executed:

# Search for GATK4_CALCULATECONTAMINATION and its outputs
rg "GATK4_CALCULATECONTAMINATION" workflows/ -B 2 -A 5

Repository: fulcrumgenomics/twistcgp

Length of output: 2159

🏁 Script executed:

# Look for the module definition or output structure
rg "out\.(segmentation|contamination)" workflows/ -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 557

🏁 Script executed:

# Find the GATK4_CALCULATECONTAMINATION module definition
find . -name "*calculatecontamination*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 50

🏁 Script executed:

# Check the modules directory structure
find modules -name "*calculatecontamination*" -o -name "*CALCULATECONTAMINATION*" | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 112

🏁 Script executed:

# Search for output definitions in the module
rg "segmentation" modules/ -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 10333

🏁 Script executed:

# Verify the exact main.nf file for GATK4_CALCULATECONTAMINATION
cat modules/nf-core/gatk4/calculatecontamination/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1820

Use remainder: true and coalesce missing segmentation to prevent silent sample loss.

The segmentation output is optional (marked in the module), but the join operations here are inner joins by default. When segmentation isn't emitted for a sample, that sample has no match in the first join, so it is silently dropped and never reaches GATK4_FILTERMUTECTCALLS. Apply remainder: true to the first join and coalesce missing segmentation to [].

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@workflows/twistcgp.nf` around lines 176 - 179, The current inner joins drop samples when GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted; change the first join to be a remainder join (use remainder: true on the join between ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation) and ensure missing segmentation values are coalesced to an empty list before the subsequent join to contamination/consumers (e.g., when unpacking the joined tuple for ch_filtermutect_in replace a null/undefined segmentation with []), so no samples are silently lost before GATK4_FILTERMUTECTCALLS.
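
The remainder join described in this review comment can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `workflows/twistcgp.nf`: the channel names `ch_samples` and `ch_segmentation`, the sample IDs, and the two-field tuples are hypothetical stand-ins, and the real tuples likely carry more fields.

```nextflow
// Minimal sketch with hypothetical channel contents (not taken from the PR).
workflow {
    // Two samples were called, but only s1 produced a segmentation table.
    ch_samples      = Channel.of(['s1', 's1.vcf.gz'], ['s2', 's2.vcf.gz'])
    ch_segmentation = Channel.of(['s1', 's1.segmentation.table'])

    ch_samples
        // remainder: true emits unmatched keys instead of discarding them
        .join(ch_segmentation, remainder: true)
        // an unmatched sample arrives with segmentation == null; coalesce it to []
        .map { id, vcf, segmentation -> [id, vcf, segmentation ?: []] }
        .view()
}
```

Without `remainder: true`, `s2` would not appear in the output at all. Note that a remainder join also emits keys that exist only on the right-hand side, with `null` in the left-hand position, so a real workflow may want to filter those out if they can occur.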

emmcauley mentioned this pull request Mar 24, 2026

feature branch: nice-to-have features #84

Draft

4 tasks

feat: LEARNREADORIENTATIONMODEL and CALCULATE_CONTAMINATION

879ea26

emmcauley force-pushed the em_learn_model branch from 98ba159 to 879ea26 on March 24, 2026 19:53

coderabbitai Bot requested changes Apr 1, 2026

emmcauley added 2 commits May 6, 2026 19:49

fix: coderabbit review comments

087110a

chore: full git hash

462cf2b

emmcauley force-pushed the em_learn_model branch from 7008546 to 462cf2b on May 6, 2026 23:53

-	def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
+	def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])
+	def input_list = f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
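
The normalization in this change presumably guards against the case where a single file is staged: Nextflow then passes `f1r2` as one object rather than a list, and Groovy's `collect` on a lone path or string iterates over its elements (path components or characters) instead of treating it as one item. The sketch below isolates the pattern in a hypothetical helper, `toInputArgs`, with plain strings standing in for staged files; neither the helper name nor the file names come from the PR.

```nextflow
// Hypothetical helper wrapping the normalization shown in the change above.
def toInputArgs(f1r2) {
    // Wrap a lone value in a list so collect always sees one item per file
    def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])
    return f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
}

workflow {
    // Single value: yields one --input argument
    println(toInputArgs('a.f1r2.tar.gz'))
    // List of values: yields one --input argument per element
    println(toInputArgs(['a.f1r2.tar.gz', 'b.f1r2.tar.gz']))
}
```

Both calls go through the same `collect`, so the command line is built the same way whether one or several inputs arrive.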

znorgaard commented Apr 1, 2026

coderabbitai Bot commented Apr 1, 2026