feat: LEARNREADORIENTATIONMODEL and CALCULATE_CONTAMINATION #88

Draft
emmcauley wants to merge 3 commits into em_nice_to_haves from em_learn_model

@emmcauley
Collaborator

This PR adds GATK4_CALCULATECONTAMINATION, GATK4_GETPILEUPSUMMARIES, and GATK4_LEARNREADORIENTATIONMODEL. These modules are in line with GATK best practices, especially when the submitted samples are FFPE, which are particularly prone to artifacts.

@github-actions

github-actions Bot commented Mar 24, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 087110a

✅ 99 tests passed
❔ 57 tests were ignored
❗ 14 tests had warnings

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • files_exist - File not found: ro-crate-metadata.json
  • readme - README did not have an nf-core template version badge.
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in main.nf.test: Once you have added the required tests, please run the following command to build this file:

❔ Tests ignored:

  • files_exist - File is ignored: .editorconfig
  • files_exist - File is ignored: .github/.dockstore.yml
  • files_exist - File is ignored: .github/CONTRIBUTING.md
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_exist - File is ignored: .github/PULL_REQUEST_TEMPLATE.md
  • files_exist - File is ignored: .github/actions/get-shards/action.yml
  • files_exist - File is ignored: .github/actions/nf-test/action.yml
  • files_exist - File is ignored: .github/workflows/branch.yml
  • files_exist - File is ignored: .github/workflows/ci.yml
  • files_exist - File is ignored: .github/workflows/linting.yml
  • files_exist - File is ignored: .github/workflows/linting_comment.yml
  • files_exist - File is ignored: .github/workflows/nf-test.yml
  • files_exist - File is ignored: .prettierignore
  • files_exist - File is ignored: .prettierrc.yml
  • files_exist - File is ignored: CHANGELOG.md
  • files_exist - File is ignored: CITATIONS.md
  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: LICENSE
  • files_exist - File is ignored: assets/email_template.html
  • files_exist - File is ignored: assets/email_template.txt
  • files_exist - File is ignored: assets/nf-core-twistcgp_logo_light.png
  • files_exist - File is ignored: assets/sendmail_template.txt
  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config
  • files_exist - File is ignored: conf/test_full.config
  • files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_dark.png
  • files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_light.png
  • files_exist - File is ignored: docs/output.md
  • files_exist - File is ignored: docs/README.md
  • files_exist - File is ignored: docs/usage.md
  • nextflow_config - nextflow_config
  • nf_test_content - nf_test_content
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/.dockstore.yml
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File does not exist: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File does not exist: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-twistcgp_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_nf_test - actions_nf_test
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/twistcgp/twistcgp/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • rocrate_readme_sync - rocrate_readme_sync

✅ Tests passed:

  • files_exist - File found: .gitattributes
  • files_exist - File found: .gitignore
  • files_exist - File found: .nf-core.yml
  • files_exist - File found: nextflow_schema.json
  • files_exist - File found: nextflow.config
  • files_exist - File found: README.md
  • files_exist - File found: conf/modules.config
  • files_exist - File found: conf/test.config
  • files_exist - File found: nf-test.config
  • files_exist - File found: tests/default.nf.test
  • files_exist - File found: main.nf
  • files_exist - File found: assets/multiqc_config.yml
  • files_exist - File found: conf/base.config
  • files_exist - File found: modules.json
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
  • files_exist - File not found check: .github/workflows/push_dockerhub.yml
  • files_exist - File not found check: .markdownlint.yml
  • files_exist - File not found check: .nf-core.yaml
  • files_exist - File not found check: .yamllint.yml
  • files_exist - File not found check: bin/markdown_to_html.r
  • files_exist - File not found check: conf/aws.config
  • files_exist - File not found check: docs/images/nf-core-twistcgp_logo.png
  • files_exist - File not found check: lib/Checks.groovy
  • files_exist - File not found check: lib/Completion.groovy
  • files_exist - File not found check: lib/NfcoreTemplate.groovy
  • files_exist - File not found check: lib/Utils.groovy
  • files_exist - File not found check: lib/Workflow.groovy
  • files_exist - File not found check: lib/WorkflowMain.groovy
  • files_exist - File not found check: lib/WorkflowTwistcgp.groovy
  • files_exist - File not found check: parameters.settings.json
  • files_exist - File not found check: pipeline_template.yml
  • files_exist - File not found check: Singularity
  • files_exist - File not found check: lib/nfcore_external_java_deps.jar
  • files_exist - File not found check: .travis.yml
  • files_unchanged - .gitattributes matches the template
  • files_unchanged - .prettierrc.yml matches the template
  • pipeline_if_empty_null - No ifEmpty(null) strings found
  • plugin_includes - No wrong validation plugin imports have been found
  • pipeline_name_conventions - Name adheres to nf-core convention
  • template_strings - Did not find any Jinja template strings (0 files)
  • schema_lint - Schema lint passed
  • schema_lint - Schema title + description lint passed
  • schema_lint - Input mimetype lint passed: 'text/csv'
  • schema_params - Schema matched params returned from nextflow config
  • system_exit - No System.exit calls found
  • actions_schema_validation - Workflow validation passed: linting_comment.yml
  • actions_schema_validation - Workflow validation passed: linting.yml
  • actions_schema_validation - Workflow validation passed: twistgp_ci.yml
  • merge_markers - No merge markers found in pipeline files
  • modules_json - Only installed modules found in modules.json
  • multiqc_config - assets/multiqc_config.yml found and not ignored.
  • multiqc_config - assets/multiqc_config.yml contains report_section_order
  • multiqc_config - assets/multiqc_config.yml contains export_plots
  • multiqc_config - assets/multiqc_config.yml contains report_comment
  • multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
  • multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
  • modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
  • local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
  • base_config - conf/base.config found and not ignored.
  • modules_config - conf/modules.config found and not ignored.
  • modules_config - ALIGNBAM found in conf/modules.config and Nextflow scripts.
  • modules_config - BCFTOOLS_VIEW found in conf/modules.config and Nextflow scripts.
  • modules_config - BWAMEM2_INDEX found in conf/modules.config and Nextflow scripts.
  • modules_config - CIVICPY_UPDATE_CACHE found in conf/modules.config and Nextflow scripts.
  • modules_config - CIVICPY_ANNOTATE_VCF found in conf/modules.config and Nextflow scripts.
  • modules_config - CNVKIT_BATCH found in conf/modules.config and Nextflow scripts.
  • modules_config - ENSEMBLVEP_DOWNLOAD found in conf/modules.config and Nextflow scripts.
  • modules_config - ENSEMBLVEP_VEP found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_MUTECT2 found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_FILTERMUTECTCALLS found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_CALCULATECONTAMINATION found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_GETPILEUPSUMMARIES found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_LEARNREADORIENTATIONMODEL found in conf/modules.config and Nextflow scripts.
  • modules_config - FASTP found in conf/modules.config and Nextflow scripts.
  • modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
  • modules_config - FGBIO_FASTQTOBAM found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSORPRO_SCAN found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSOR2_MSI found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSORPRO_PRO found in conf/modules.config and Nextflow scripts.
  • modules_config - PERBASE found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_COLLECTHSMETRICS found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_COLLECTMULTIPLEMETRICS found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_INTERVALLISTTOBED found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_MARKDUPLICATES found in conf/modules.config and Nextflow scripts.
  • modules_config - SAMTOOLS_FAIDX found in conf/modules.config and Nextflow scripts.
  • modules_config - SAMTOOLS_DICT found in conf/modules.config and Nextflow scripts.
  • modules_config - SNPEFF_DOWNLOAD found in conf/modules.config and Nextflow scripts.
  • modules_config - SNPEFF_SNPEFF found in conf/modules.config and Nextflow scripts.
  • modules_config - TMB found in conf/modules.config and Nextflow scripts.
  • modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
  • modules_config - TWISTCGP found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_POPULATION_GERMLINE found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_PON found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_COSMIC found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_GNOMAD found in conf/modules.config and Nextflow scripts.
  • nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
  • nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.3.2

Run details

  • nf-core/tools version 3.3.2
  • Run at 2026-05-06 23:51:47

@znorgaard
Collaborator

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 1, 2026

📝 Walkthrough

This pull request introduces three new GATK4 modules to the pipeline: GATK4_CALCULATECONTAMINATION, GATK4_GETPILEUPSUMMARIES, and GATK4_LEARNREADORIENTATIONMODEL. Each module includes configuration files, Conda environment definitions, Nextflow process implementations, metadata declarations, and test specifications. The workflow integrates these modules sequentially: the F1R2 counts from GATK4_MUTECT2 feed into GATK4_LEARNREADORIENTATIONMODEL, which learns orientation-bias artifact priors; GATK4_GETPILEUPSUMMARIES generates pileup summaries from BAM/CRAM files; and GATK4_CALCULATECONTAMINATION processes the pileup summaries to estimate contamination. These outputs then flow into GATK4_FILTERMUTECTCALLS for variant filtering.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title references two of the three modules being added (LEARNREADORIENTATIONMODEL and CALCULATECONTAMINATION) but omits GETPILEUPSUMMARIES, which is equally important to the changeset.
  • Description check: ✅ Passed. The description accurately relates to the changeset by naming all three GATK4 modules being added and explaining their relevance for FFPE samples.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@znorgaard
Collaborator

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (2)
workflows/twistcgp.nf (1)

133-165: Wire the new GATK steps into ch_versions.

These three process calls never feed their versions_gatk4 channels into ch_versions, so twistcgp_software_mqc_versions.yml will omit them.

Possible fix
+    ch_versions = ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first())
+    ch_versions = ch_versions.mix(GATK4_GETPILEUPSUMMARIES.out.versions_gatk4.first())
+    ch_versions = ch_versions.mix(GATK4_CALCULATECONTAMINATION.out.versions_gatk4.first())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/twistcgp.nf` around lines 133 - 165, The GATK steps
GATK4_LEARNREADORIENTATIONMODEL, GATK4_GETPILEUPSUMMARIES, and
GATK4_CALCULATECONTAMINATION are not contributing their versions_gatk4 outputs
into ch_versions, so their versions are omitted from
twistcgp_software_mqc_versions.yml; update the workflow to mix each
process' versions_gatk4 channel into ch_versions (for example, after calling
GATK4_LEARNREADORIENTATIONMODEL(...), set ch_versions =
ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4), and do
the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and
GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator
collects them.
modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test (1)

19-20: Use the real upstream f1r2 shape in the test.

The workflow passes GATK4_MUTECT2.out.f1r2 directly here, and a path() output without arity emits a single file when one match is produced. These cases wrap that file in a one-element list, so the test is exercising a different binding shape than production. (nextflow.io)

Also applies to: 43-44

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around
lines 19 - 20, The test is wrapping the f1r2 artifact in an extra one-element
list, which differs from the real upstream binding shape
(GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second
element is the file path itself instead of a list (i.e., change the
[file(params.modules_testdata_base_path + ...)] entry to
file(params.modules_testdata_base_path + ...) in the input[0] assignment), and
make the same change at the other occurrence (lines noted as also applies to
43-44).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@conf/modules.config`:
- Around line 125-132: The publishDir for withName: GATK4_GETPILEUPSUMMARIES is
using a wrong pattern and a misleading ext.prefix; change ext.prefix from
"mutect2.artifactprior" to something like "mutect2.pileups" or
"mutect2.pileup_summaries" and update publishDir.pattern to match the actual
outputs (e.g. use "*.pileups.table*" or a stricter glob like
"*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and
optional gzipped variants) are picked up and published.

In `@modules/nf-core/gatk4/calculatecontamination/main.nf`:
- Around line 43-48: The stub currently always creates both
"${prefix}.contamination.table" and "${prefix}.segmentation.table" which
mismatches the process' optional segmentation output; update the stub block so
it always touches "${prefix}.contamination.table" but only touches
"${prefix}.segmentation.table" when the tumor segmentation option was requested
(e.g. check an extension flag such as task.ext.tumor_segmentation or
task.ext.tumorSegmentation that corresponds to the --tumor-segmentation flag),
leaving the rest of the stub and the prefix assignment unchanged.

In `@modules/nf-core/gatk4/calculatecontamination/meta.yml`:
- Around line 41-63: Update the outputs' meta entries so they are Groovy maps
instead of files and fix the segmentation meta text/pattern: change the meta
block under both contamination and segmentation from "type: file" to "type: map"
and adjust the segmentation meta description and pattern to reference the
segmentation output (e.g., describe "segmentation of tumor minor allele
fractions" and use the "*.segmentation.table" pattern) so the "meta" map for
contamination refers to contamination metadata and the "meta" map for
segmentation refers to segmentation metadata.

In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml`:
- Line 7: The module metadata contains a typo: replace the incorrect keyword
string "getpileupsumaries" with the correct "getpileupsummaries" in meta.yml so
the tool name matches expected naming; update any occurrences of
getpileupsumaries (e.g., the entry currently present in meta.yml) to
getpileupsummaries to ensure consistency.
- Around line 72-83: The inputs `variants` and `variants_tbi` in meta.yml are
defined as single-item entries but should use the nested list/grouped format
used elsewhere (the "- - " pattern) to match nf-core tooling expectations;
update both `variants` and `variants_tbi` to be arrays of arrays (convert their
current single mapping into a nested list entry), preserving their keys (`type`,
`description`, `pattern`, `ontologies`) and values so the structure matches
other grouped inputs and avoids parsing errors.
- Line 16: The documentation URL value in meta.yml appears malformed (the
documentation:
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the
documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the
correct URL (likely remove the trailing 's' to
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace
with the verified, intended documentation link), and run a quick curl check to
confirm the corrected URL returns a 200 before committing.

In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf`:
- Line 14: The tuple declaration tuple val(meta), path("*.tar.gz"), emit:
artifactprior is too broad and catches staged files like *.f1r2.tar.gz; narrow
the glob to only the intended artifact filenames (for example replace "*.tar.gz"
with a more specific pattern such as "*.prior.tar.gz" or the exact artifact
basename used downstream) so that only the correct prior artifact files are
emitted as artifactprior.
- Line 15: The pipeline uses the Nextflow "topic:" syntax in the tuple emission
(the tuple line with topic: versions / emit: versions_gatk4) which requires
Nextflow 25.04+ or the preview flag; update the project constraint
(nextflowVersion) to allow >=25.04.0, or alternatively enable the preview
feature by adding the config key nextflow.preview.topic = true in the pipeline
configuration so the tuple with topic: versions works on older Nextflow
releases; locate the tuple emission (topic: versions) in
learnreadorientationmodel/main.nf and adjust either the nextflowVersion
constraint or the config to satisfy the requirement.
- Line 23: input_list is built by calling .collect() on f1r2 which is a single
Path (not a List) and will throw NoSuchMethodError at runtime; change the
construction of input_list in the learnreadorientationmodel process to handle
both single Path and list cases (e.g., detect if f1r2 is a List and join via
.collect(...).join(' '), otherwise format a single "--input ${f1r2}" string) so
it works when MUTECT2 binds a single file as a Path.
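The last bullet comes down to normalizing "one file or many" before building the `--input` flags. The Python model below illustrates only that logic; it is not the module's Groovy, and the function name `build_input_args` and the file names are hypothetical. Inside the process script, one common Groovy idiom for the same normalization is `[f1r2].flatten()`.

```python
from pathlib import Path
from typing import Sequence, Union

PathLike = Union[str, Path]


def build_input_args(f1r2: Union[PathLike, Sequence[PathLike]]) -> str:
    """Build one `--input <file>` flag per f1r2 archive.

    Accepts either a single path or a list of paths, mirroring how a
    path() input can arrive as one file or as a collection.
    """
    # Check for a single path first: a bare str is itself iterable, so
    # looping over it directly would emit one flag per character.
    if isinstance(f1r2, (str, Path)):
        files = [f1r2]
    else:
        files = list(f1r2)
    return " ".join(f"--input {f}" for f in files)


print(build_input_args("S1.f1r2.tar.gz"))
print(build_input_args(["S1_L1.f1r2.tar.gz", "S1_L2.f1r2.tar.gz"]))
```

Handling both shapes in the process also makes the related test nitpick less brittle: the test can pass either a single file or a one-element list and exercise the same code path.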

In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml`:
- Around line 31-43: The artifactprior output block is incorrect: change the
nested meta tuple entry from "type: file" to "type: map" and give it a generic
metadata description, and remove file-specific keys (pattern, ontologies) from
that meta map; ensure the file entry for "*.tar.gz" (the second tuple element)
retains the file-specific attributes (type: file, pattern: "*.tar.gz",
description: ..., ontologies: [...]) so all file attributes live only on the
"*.tar.gz" file entry while meta becomes a simple map describing the metadata.

In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 28-31: The test is incorrectly using linesGzip on a .tar.gz
tarball; change the assertion to snapshot the archive path itself or validate
extracted contents instead: replace the snapshot call that references
path(process.out.artifactprior[0][1]).linesGzip[3..7] with either
snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or
extract the tarball in the test and assert on the extracted files' contents
(e.g., list or specific file contents) when calling snapshot; update the test
that constructs the snapshot (the assertion block containing
process.out.artifactprior and process.out.findAll) accordingly.
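The tarball finding can be reproduced in miniature: gunzipping a `.tar.gz` yields a tar stream (512-byte headers followed by member data), not the table rows. This Python sketch builds a one-member archive in memory with the standard `gzip` and `tarfile` modules. `gunzip_lines` is only an approximation of what a line-oriented gzip reader such as nf-test's `linesGzip` would see, and the member name `table.tsv` and its contents are made up.

```python
import gzip
import io
import tarfile
from typing import List


def make_tar_gz(name: str, text: str) -> bytes:
    """Build an in-memory .tar.gz archive holding a single text file."""
    data = text.encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def gunzip_lines(blob: bytes) -> List[str]:
    """Decompress and split on newlines: yields the raw tar stream, headers included."""
    return gzip.decompress(blob).decode("latin-1").splitlines()


def tar_member_lines(blob: bytes, name: str) -> List[str]:
    """Open the archive properly and return the text lines of one member."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise KeyError(f"{name} is not a regular file in the archive")
        return member.read().decode().splitlines()


blob = make_tar_gz("table.tsv", "contig\tcount\nchr1\t10\n")
print(repr(gunzip_lines(blob)[0][:12]))  # begins with the tar header (member name + NULs)
print(tar_member_lines(blob, "table.tsv"))
```

Asserting on extracted members, as `tar_member_lines` does, avoids depending on header bytes. A checksum of the whole archive may still vary if the tool embeds timestamps.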

In `@workflows/twistcgp.nf`:
- Around line 176-179: The current inner joins drop samples when
GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted; change the first
join to be a remainder join (use remainder: true on the join between
ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation) and ensure
missing segmentation values are coalesced to an empty list before the subsequent
join to contamination/consumers (e.g., when unpacking the joined tuple for
ch_filtermutect_in replace a null/undefined segmentation with []), so no samples
are silently lost before GATK4_FILTERMUTECTCALLS.
- Around line 171-174: The current chain uses an inner join that drops samples
missing artifact priors: replace the final
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) with a left-join
variant so ch_mutect2_samples retains all GATK4_MUTECT2 entries and unmatched
artifactprior values are passed as empty/null; specifically change the join
operation used to combine GATK4_MUTECT2.out.vcf/.tbi/.stats with
GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior (Nextflow has no leftJoin
operator; use join(..., remainder: true), which fills the missing side with
null) and ensure GATK4_FILTERMUTECTCALLS consumes a fallback empty
orientation-bias input when the artifactprior is missing.
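Both workflow findings are the same inner-join pitfall: a sample whose segmentation table or artifact prior is never emitted silently disappears. The Python model below compares an inner join with a left-preserving join followed by a `[]` fallback. It models the semantics only; in Nextflow the left-preserving form would be `join(..., remainder: true)`, which fills the missing side with null. The sample IDs and file names are made up.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def inner_join(left: Sequence[Row], right: Sequence[Row]) -> List[Row]:
    """Join on the first element, keeping only keys present on both sides."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [(*row, *right_by_key[row[0]]) for row in left if row[0] in right_by_key]


def left_join(left: Sequence[Row], right: Sequence[Row]) -> List[Row]:
    """Keep every left row; a missing right value is filled with None.

    Assumes each right row carries exactly one value after its key.
    """
    right_by_key = {row[0]: row[1:] for row in right}
    return [(*row, *right_by_key.get(row[0], (None,))) for row in left]


def none_to_empty(rows: Sequence[Row]) -> List[Row]:
    """Replace None with an empty list, the `[]` fallback for an optional input."""
    return [tuple([] if value is None else value for value in row) for row in rows]


mutect2 = [("S1", "S1.vcf.gz"), ("S2", "S2.vcf.gz")]
segmentation = [("S1", "S1.segmentation.table")]

print(inner_join(mutect2, segmentation))  # S2 is silently dropped
print(none_to_empty(left_join(mutect2, segmentation)))  # S2 kept, with [] for segmentation
```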

---

Nitpick comments:
In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 19-20: The test is wrapping the f1r2 artifact in an extra
one-element list, which differs from the real upstream binding shape
(GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second
element is the file path itself instead of a list (i.e., change the
[file(params.modules_testdata_base_path + ...)] entry to
file(params.modules_testdata_base_path + ...) in the input[0] assignment), and
make the same change at the other occurrence (lines noted as also applies to
43-44).

In `@workflows/twistcgp.nf`:
- Around line 133-165: The GATK steps GATK4_LEARNREADORIENTATIONMODEL,
GATK4_GETPILEUPSUMMARIES, and GATK4_CALCULATECONTAMINATION are not contributing
their versions_gatk4 outputs into ch_versions, so their versions are omitted
from twistcgp_software_mqc_versions.yml; update the workflow to mix each
process' versions_gatk4 channel into ch_versions (for example, after calling
GATK4_LEARNREADORIENTATIONMODEL(...), set ch_versions =
ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4), and do
the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and
GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator
collects them.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c66e0a79-57e1-460b-b099-58641d4811bb

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8f494 and 879ea26.

⛔ Files ignored due to path filters (3)
  • modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test.snap is excluded by !**/*.snap
📒 Files selected for processing (17)
  • conf/modules.config
  • modules.json
  • modules/nf-core/gatk4/calculatecontamination/environment.yml
  • modules/nf-core/gatk4/calculatecontamination/main.nf
  • modules/nf-core/gatk4/calculatecontamination/meta.yml
  • modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test
  • modules/nf-core/gatk4/calculatecontamination/tests/nextflow.config
  • modules/nf-core/gatk4/getpileupsummaries/environment.yml
  • modules/nf-core/gatk4/getpileupsummaries/main.nf
  • modules/nf-core/gatk4/getpileupsummaries/meta.yml
  • modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test
  • modules/nf-core/gatk4/learnreadorientationmodel/environment.yml
  • modules/nf-core/gatk4/learnreadorientationmodel/main.nf
  • modules/nf-core/gatk4/learnreadorientationmodel/meta.yml
  • modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test
  • modules/nf-core/gatk4/learnreadorientationmodel/tests/nextflow.config
  • workflows/twistcgp.nf

Comment thread conf/modules.config
Comment on lines +125 to +132
withName: GATK4_GETPILEUPSUMMARIES {
ext.prefix = { "${meta.id}.mutect2.artifactprior" }
publishDir = [
path: { "${params.outdir}/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*{gz}",
]
}

⚠️ Potential issue | 🟠 Major

PublishDir pattern won't match output files.

GATK4_GETPILEUPSUMMARIES outputs *.pileups.table, but the pattern *{gz} (equivalent to *gz) won't match it, so nothing will be published.

Also, the prefix mutect2.artifactprior seems misnamed: this module generates pileup summaries, not artifact priors.

Proposed fix
     withName: GATK4_GETPILEUPSUMMARIES {
-        ext.prefix = { "${meta.id}.mutect2.artifactprior" }
+        ext.prefix = { "${meta.id}.mutect2" }
         publishDir = [
             path: { "${params.outdir}/${meta.id}" },
             mode: params.publish_dir_mode,
-            pattern: "*{gz}",
+            pattern: "*{table}",
         ]
     }
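To see why nothing gets published: the publishDir `pattern` is a glob in which `{...}` is alternation, so `*{gz}` reduces to `*gz`, and `S1.mutect2.pileups.table` does not end in `gz`. Below is a simplified Python model: stdlib `fnmatch` plus a hypothetical `expand_braces` helper stand in for Nextflow's own glob matching, and the file name assumes the `${meta.id}.mutect2` prefix from the proposed fix and a sample called `S1`.

```python
import fnmatch
import re
from typing import Iterable, List


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b} alternation into plain glob patterns.

    Simplified model: escaped braces and other glob edge cases are not handled.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    return [
        expanded
        for option in match.group(1).split(",")
        for expanded in expand_braces(head + option + tail)
    ]


def published(pattern: str, filenames: Iterable[str]) -> List[str]:
    """Return the file names a publishDir-style glob would select."""
    globs = expand_braces(pattern)
    return [name for name in filenames if any(fnmatch.fnmatchcase(name, g) for g in globs)]


outputs = ["S1.mutect2.pileups.table"]
for pattern in ["*{gz}", "*{table}", "*.pileups.table*"]:
    print(pattern, "->", published(pattern, outputs))
```

Both proposed patterns select the table. The stricter `*.{pileups.table,pileups.table.gz}` from the AI prompt would also pick up a gzipped copy if one were ever produced.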
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/modules.config` around lines 125 - 132, The publishDir for withName:
GATK4_GETPILEUPSUMMARIES is using a wrong pattern and a misleading ext.prefix;
change ext.prefix from "mutect2.artifactprior" to something like
"mutect2.pileups" or "mutect2.pileup_summaries" and update publishDir.pattern to
match the actual outputs (e.g. use "*.pileups.table*" or a stricter glob like
"*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and
optional gzipped variants) are picked up and published.

Comment on lines +43 to +48
stub:
prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.contamination.table
touch ${prefix}.segmentation.table
"""

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n modules/nf-core/gatk4/calculatecontamination/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 2163


Align stub output with optional segmentation behavior.

The real process declares segmentation as optional: true (line 15) because GATK only creates it with --tumor-segmentation. The stub unconditionally touches both files, which doesn't match and can mask missing-output bugs in stub tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/calculatecontamination/main.nf` around lines 43 - 48,
The stub currently always creates both "${prefix}.contamination.table" and
"${prefix}.segmentation.table" which mismatches the process' optional
segmentation output; update the stub block so it always touches
"${prefix}.contamination.table" but only touches "${prefix}.segmentation.table"
when the tumor segmentation option was requested (e.g. check an extension flag
such as task.ext.tumor_segmentation or task.ext.tumorSegmentation that
corresponds to the --tumor-segmentation flag), leaving the rest of the stub and
the prefix assignment unchanged.
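A Python sketch of the intended stub logic follows. It assumes that segmentation is requested by passing --tumor-segmentation through ext.args. That is a hypothetical choice: the review above suggests a dedicated task.ext option instead. The helper name stub_outputs is also illustrative only.

```python
import shlex


def stub_outputs(prefix, ext_args=""):
    """Return the file names a stub should touch for CALCULATECONTAMINATION.

    Assumption: segmentation is requested by passing --tumor-segmentation via
    ext.args (hypothetical wiring). The contamination table is always created;
    the optional segmentation table only when the flag is present.
    """
    files = [f"{prefix}.contamination.table"]
    if "--tumor-segmentation" in shlex.split(ext_args):
        files.append(f"{prefix}.segmentation.table")
    return files
```

Because the stub keys off the same input that controls the real command, a stub test will catch a missing segmentation output whenever the flag is set.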

Comment on lines +41 to +63
contamination:
- - meta:
type: file
description: File containing the contamination table.
pattern: "*.contamination.table"
ontologies: []
- "*.contamination.table":
type: file
description: File containing the contamination table.
pattern: "*.contamination.table"
ontologies: []
segmentation:
- - meta:
type: file
description: File containing the contamination table.
pattern: "*.contamination.table"
ontologies: []
- "*.segmentation.table":
type: file
description: output table containing segmentation of tumor minor allele fractions
(optional)
pattern: "*.segmentation.table"
ontologies: []

⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Script executed:

# First, find the file and read the specified lines
fd -type f -name "meta.yml" | grep gatk4/calculatecontamination

Repository: fulcrumgenomics/twistcgp


Exit code: 1

stderr:

error: 'ype' isn't a valid value for '--type <filetype>'
  [possible values: file, directory, symlink, executable, empty, socket, pipe]

For more information try '--help'

Script executed:

# Read the specific file content at lines 41-63
cat -n modules/nf-core/gatk4/calculatecontamination/meta.yml | sed -n '41,63p'

Repository: fulcrumgenomics/twistcgp


Exit code: 0

stdout:

    41	  contamination:
    42	    - - meta:
    43	          type: file
    44	          description: File containing the contamination table.
    45	          pattern: "*.contamination.table"
    46	          ontologies: []
    47	      - "*.contamination.table":
    48	          type: file
    49	          description: File containing the contamination table.
    50	          pattern: "*.contamination.table"
    51	          ontologies: []
    52	  segmentation:
    53	    - - meta:
    54	          type: file
    55	          description: File containing the contamination table.
    56	          pattern: "*.contamination.table"
    57	          ontologies: []
    58	      - "*.segmentation.table":
    59	          type: file
    60	          description: output table containing segmentation of tumor minor allele fractions
    61	            (optional)
    62	          pattern: "*.segmentation.table"
    63	          ontologies: []

nf-core meta.yml output metadata structure conventions tuple format


The nf-core meta.yml file in modules documents module metadata, including tool name, description, keywords, input/output channels, and authors. It follows a JSON schema defined in the nf-core/modules repository at modules/meta-schema.json (linted by nf-core/tools). The updated structure (as of 2025) groups inputs/outputs by channel, distinguishing tuples (lists) from single elements. Tuple format: lists represent tuple channels matching Nextflow input/output such as tuple val(meta), path(files); single channels are direct mappings. Example from bwa/mem (2025 structure):

    input:
      - # Tuple channel 1
        - meta:
            type: map
            description: Groovy Map containing sample information
        - reads:
            type: file
            description: List of input FastQ files...
      - # Tuple channel 2
        - meta2:
            type: map
        - index:
            type: file
            description: BWA genome index files
            pattern: ".{amb,ann,bwt,pac,sa}"
      - sort_bam: # Single val channel
          type: boolean
          description: use samtools sort (true) or samtools view (false)
          pattern: "true or false"
    output:
      - bam: # Tuple output
          - - meta:
                type: file # Note: likely typo, should be map
            - ".bam":
                type: file
      - versions: # Single
          - versions.yml:
              type: file
              description: File containing software versions
              pattern: "versions.yml"

Channels use type (map/file/boolean etc.), description, pattern, and optional ontologies. Linting expects consistency with main.nf (e.g., tuple val(meta), path(bar) -> a list with a meta map plus a bar file). Use nf-core modules lint --fix for updates. For pipelines, modules.json tracks versions.


Fix output metadata structure.

Both meta entries are incorrectly marked as type: file (should be type: map—meta is a Groovy Map). Additionally, the segmentation output's meta block still references the contamination table description and pattern; it should describe the segmentation metadata instead.

Example of issues at lines 42-46 and 53-57
contamination:
  - - meta:
        type: file  # Should be type: map
        description: File containing the contamination table.
        pattern: "*.contamination.table"
        ontologies: []

segmentation:
  - - meta:
        type: file  # Should be type: map
        description: File containing the contamination table.  # Wrong: still says contamination
        pattern: "*.contamination.table"  # Wrong: still says contamination pattern
        ontologies: []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/calculatecontamination/meta.yml` around lines 41 - 63,
Update the outputs' meta entries so they are Groovy maps instead of files and
fix the segmentation meta text/pattern: change the meta block under both
contamination and segmentation from "type: file" to "type: map" and adjust the
segmentation meta description and pattern to reference the segmentation output
(e.g., describe "segmentation of tumor minor allele fractions" and use the
"*.segmentation.table" pattern) so the "meta" map for contamination refers to
contamination metadata and the "meta" map for segmentation refers to
segmentation metadata.

keywords:
- gatk4
- germlinevariantsites
- getpileupsumaries

⚠️ Potential issue | 🟡 Minor

Typo in keyword.

getpileupsumaries → getpileupsummaries

Proposed fix
-  - getpileupsumaries
+  - getpileupsummaries
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 7, The module
metadata contains a typo: replace the keyword "getpileupsumaries" with
"getpileupsummaries" so the keyword matches the tool name, and update any other
occurrences of getpileupsumaries in meta.yml for consistency.

with a primary focus on variant discovery and genotyping. Its powerful processing engine
and high-performance computing features make it capable of taking on projects of any size.
homepage: https://gatk.broadinstitute.org/hc/en-us
documentation: https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s

⚠️ Potential issue | 🟡 Minor

Potentially malformed documentation URL.

URL ends with 360002369672s — trailing 's' looks like a typo.

#!/bin/bash
# Verify if the documentation URL is valid
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"
echo ""
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 16, The
documentation URL value in meta.yml appears malformed (the documentation:
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the
documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the
correct URL (likely remove the trailing 's' to
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace
with the verified, intended documentation link), and run a quick curl check to
confirm the corrected URL returns a 200 before committing.

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Locate producer/consumer contracts for f1r2 payload =="
rg -nP 'f1r2|LEARNREADORIENTATIONMODEL|GATK4_MUTECT2' --type-add 'nf:*.nf' -t nf -C3

Repository: fulcrumgenomics/twistcgp

Length of output: 6053


🏁 Script executed:

# Check if there are tests for this module
fd -type f -name '*test*' -o -name '*spec*' | grep -i learnreadorientationmodel

# Look for actual workflow test execution
find . -type f -name '*.nf' -path '*/test*' | head -20

# Check if there's nextflow config showing how f1r2 is passed
fd -type f -name 'nextflow.config' -o -name 'conf' | xargs grep -l f1r2 2>/dev/null || echo "No config found"

Repository: fulcrumgenomics/twistcgp

Length of output: 442


🏁 Script executed:

# Check the full LEARNREADORIENTATIONMODEL module to see entire script context
cat -n modules/nf-core/gatk4/learnreadorientationmodel/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1956


🏁 Script executed:

# Verify Groovy Path behavior - check if there's any wrapper or extension
rg -n 'metaClass|extensions|Path\.' modules/nf-core/gatk4/learnreadorientationmodel/ -A2 -B2

Repository: fulcrumgenomics/twistcgp

Length of output: 50


🏁 Script executed:

# Check for CI configuration
find . -type f \( -name '*.yml' -o -name '*.yaml' \) -path '*/.github/*' | head -10

Repository: fulcrumgenomics/twistcgp

Length of output: 174


🏁 Script executed:

# Look for test execution or sample invocation
find . -type f -name '*.nf' | xargs grep -l 'GATK4_LEARNREADORIENTATIONMODEL' | head -5

Repository: fulcrumgenomics/twistcgp

Length of output: 149


🏁 Script executed:

# Check if there's documentation about input handling in Nextflow for this pattern
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '1,30p'

Repository: fulcrumgenomics/twistcgp

Length of output: 1378


🏁 Script executed:

# Check the CI workflow to see if it tests the module
cat -n ./.github/workflows/twistgp_ci.yml | head -50

Repository: fulcrumgenomics/twistcgp

Length of output: 1971


🏁 Script executed:

# Check if there are any Nextflow docs or comments about path() binding behavior
rg -n 'path\(' modules/nf-core/gatk4/learnreadorientationmodel/ -B2 -A2

Repository: fulcrumgenomics/twistcgp

Length of output: 1376


🏁 Script executed:

# Search for similar patterns in the codebase - other modules using path() with .collect()
rg -n 'path\([^)]*\).*\.collect' --type nf -A1 -B1

Repository: fulcrumgenomics/twistcgp

Length of output: 94


🏁 Script executed:

# Look at the test file to understand what input is being passed
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test

Repository: fulcrumgenomics/twistcgp

Length of output: 2012


🏁 Script executed:

# Check what the test data looks like
find modules/nf-core/gatk4/learnreadorientationmodel/tests -type f | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 276


🏁 Script executed:

# Look for Nextflow documentation or similar modules that handle optional multi-file inputs
rg -n 'collect\s*{' modules/nf-core/gatk4/ --type nf | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 153


🏁 Script executed:

# Look for .collect patterns in nf files
rg -n '\.collect' modules/nf-core/gatk4/ -A1 -B1

Repository: fulcrumgenomics/twistcgp

Length of output: 4504


🏁 Script executed:

# Check how the test is actually structured - look at lines 19-20 more carefully
# The test passes a LIST with a single file element wrapped in square brackets
python3 << 'EOF'
test_input = "[ [ id:'test' ], [file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.f1r2.tar.gz', checkIfExists: true)] ]"
print("Test Input Structure:")
print("Outer tuple: [meta_map, f1r2_binding]")
print("f1r2_binding: [file(...)]  <- This is a LIST with one element")
print("\nSo f1r2 receives a LIST, not a single Path object")
print("Therefore .collect() on a List WILL work in Groovy")
EOF

Repository: fulcrumgenomics/twistcgp

Length of output: 291


🏁 Script executed:

# Check the actual workflow invocation and what MUTECT2 emits
cat -n workflows/twistcgp.nf | sed -n '130,140p'

Repository: fulcrumgenomics/twistcgp

Length of output: 694


🏁 Script executed:

# Verify MUTECT2 output definition
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '20,30p'

Repository: fulcrumgenomics/twistcgp

Length of output: 468


🏁 Script executed:

# Look at Nextflow semantics: does path() binding with a list of files vs single file differ?
# Search for other modules that handle this pattern
rg -n 'tuple val\(meta\), path\(' modules/nf-core/gatk4/ -A3 | grep -A3 'learnreadorientationmodel'

Repository: fulcrumgenomics/twistcgp

Length of output: 938


🏁 Script executed:

# Key question: in the test, why is the f1r2 wrapped in brackets [file(...)]?
# Let's check if this is the actual expected input structure or a test artifact
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test | sed -n '14,25p'

Repository: fulcrumgenomics/twistcgp

Length of output: 550


🏁 Script executed:

# Understand Nextflow's path() binding behavior by checking Nextflow documentation patterns
# Look at how filtermutectcalls handles multiple file inputs
cat -n modules/nf-core/gatk4/filtermutectcalls/main.nf | sed -n '1,50p'

Repository: fulcrumgenomics/twistcgp

Length of output: 2571


🏁 Script executed:

# Check if there's a difference between how the test constructs input vs actual workflow
# The test wraps in brackets [file(...)] but MUTECT2 emits single file
# Verify what MUTECT2.out.f1r2 actually is at runtime
rg -n 'f1r2' workflows/twistcgp.nf -B2 -A2

Repository: fulcrumgenomics/twistcgp

Length of output: 339


🏁 Script executed:

# Critical check: Verify Nextflow path() binding behavior with single vs multiple files
# Look at the actual MUTECT2 output pattern and how it's bound
rg -n 'path\(".*\*' modules/nf-core/gatk4/mutect2/main.nf | grep -i f1r2

Repository: fulcrumgenomics/twistcgp

Length of output: 141


🏁 Script executed:

# Check if there are runtime tests that actually execute the workflow
cat -n ./.github/workflows/twistgp_ci.yml | sed -n '40,80p'

Repository: fulcrumgenomics/twistcgp

Length of output: 1719


🏁 Script executed:

# Verify the semantic difference: test wraps in brackets but workflow passes bare channel
# Look for similar patterns in filtermutectcalls test
cat -n modules/nf-core/gatk4/filtermutectcalls/tests/main.nf.test | head -60

Repository: fulcrumgenomics/twistcgp

Length of output: 2982


🏁 Script executed:

# Critical finding: compare how the test constructs input vs workflow reality
# In filtermutectcalls, optional files that could be lists are passed as [] or [file]
# Let's check the actual Nextflow semantics documentation pattern
# Look at how LEARNREADORIENTATIONMODEL is actually invoked vs how test constructs it

python3 << 'EOF'
print("=== CRITICAL MISMATCH ===\n")
print("MUTECT2 Output:")
print('  tuple val(meta), path("*.f1r2.tar.gz"), optional: true, emit: f1r2')
print("  -> Emits SINGLE file per sample\n")

print("LEARNREADORIENTATIONMODEL Input:")
print('  tuple val(meta), path(f1r2)')
print("  -> Expects SINGLE Path object\n")

print("Workflow invocation:")
print("  GATK4_LEARNREADORIENTATIONMODEL(GATK4_MUTECT2.out.f1r2)")
print("  -> Passes single file directly\n")

print("Test construction:")
print('  input[0] = [ [ id:"test" ], [file(...)] ]')
print("  -> MANUALLY WRAPS in brackets to create a List\n")

print("=== ISSUE ===")
print("In workflow: f1r2 is a Path object")
print("In test: f1r2 is a List with one Path object")
print("\nPath.collect() -> NoSuchMethodError (Path lacks collect method)")
print("List.collect() -> Works fine\n")
print("Conclusion: Code is BROKEN in workflow, only works in test due to wrapping")
EOF

Repository: fulcrumgenomics/twistcgp

Length of output: 800


Fix .collect() call on f1r2 — it receives a single Path object, not a List.

MUTECT2 emits a single file per sample (path("*.f1r2.tar.gz")), which binds as a Path object in this process. Calling .collect() directly on a Path will fail with NoSuchMethodError at runtime. The test passes only because it manually wraps the input in brackets, unlike the actual workflow.

Proposed fix
-    def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
+    def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])
+    def input_list = f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf` at line 23,
input_list is built by calling .collect() on f1r2 which is a single Path (not a
List) and will throw NoSuchMethodError at runtime; change the construction of
input_list in the learnreadorientationmodel process to handle both single Path
and list cases (e.g., detect if f1r2 is a List and join via .collect(...).join('
'), otherwise format a single "--input ${f1r2}" string) so it works when MUTECT2
binds a single file as a Path.
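Here is the normalize-then-collect pattern from the proposed fix, sketched in Python for illustration. input_flags is a hypothetical helper, and isinstance(..., list) plays the role of Groovy's instanceof List.

```python
from pathlib import Path


def input_flags(f1r2):
    """Build GATK --input flags from either one path or a list of paths.

    Mirrors the Groovy fix: (f1r2 instanceof List ? f1r2 : [f1r2]).
    """
    files = f1r2 if isinstance(f1r2, list) else [f1r2]
    return " ".join(f"--input {f}" for f in files)
```

Only lists are treated as collections; any other value, such as a single Path, is wrapped first. Either way, the flag string is built by the same code.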

Comment on lines +31 to +43
artifactprior:
- - meta:
type: file
description: file containing artifact-priors to be used by filtermutectcalls
pattern: "*.tar.gz"
ontologies:
- edam: http://edamontology.org/format_3989 # GZIP format
- "*.tar.gz":
type: file
description: file containing artifact-priors to be used by filtermutectcalls
pattern: "*.tar.gz"
ontologies:
- edam: http://edamontology.org/format_3989 # GZIP format

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and examine full context
find . -name "meta.yml" -path "*/gatk4/learnreadorientationmodel/*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 126


🏁 Script executed:

# Search for other similar tuple output patterns in the codebase to understand conventions
rg -A 10 "^\s+- -\s*meta:" --type yaml | head -100

Repository: fulcrumgenomics/twistcgp

Length of output: 6704


🌐 Web query:

nf-core meta.yml tuple output conventions file metadata

💡 Result:

In nf-core modules, the meta.yml file documents the module's inputs and outputs, with specific conventions for the tuple-based channels commonly used in Nextflow processes. Tuple channels in main.nf, such as tuple val(meta), path(reads), are represented in meta.yml as YAML lists. Each list item describes one tuple element:
- First item: meta (type: map), a Groovy map with sample information (e.g., id, single_end).
- Subsequent items: file paths (type: file), with description, pattern (e.g., ".fastq"), and optional EDAM ontology.

Example from the bwa/mem module (updated structure, 2025):

    input:
      - - meta:
            type: map
            description: Groovy Map containing sample information
        - reads:
            type: file
            description: List of input FastQ files...
            pattern: ".{fastq.gz}"

Single-element channels (e.g., val sort_bam or path "versions.yml") are not lists. Outputs follow the same pattern, grouped by emit name (e.g., bam: [...tuple elements...]). This structure groups by channel, distinguishes tuples (lists) from singles, validates against the nf-core/modules meta-schema.json, and aids discoverability on the nf-core site. Use nf-core modules lint --fix to update old flat structures. Schema at https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json (referenced in lint tests).


Fix meta entry in artifactprior output block.

The meta tuple slot must be type: map, not type: file. File attributes (pattern, ontologies) belong on the file entry only, not on meta. Use type: map with a generic description for the metadata element and move all file-specific attributes to the file entry below it, per nf-core conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml` around lines 31 -
43, The artifactprior output block is incorrect: change the nested meta tuple
entry from "type: file" to "type: map" and give it a generic metadata
description, and remove file-specific keys (pattern, ontologies) from that meta
map; ensure the file entry for "*.tar.gz" (the second tuple element) retains the
file-specific attributes (type: file, pattern: "*.tar.gz", description: ...,
ontologies: [...]) so all file attributes live only on the "*.tar.gz" file entry
while meta becomes a simple map describing the metadata.
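A rough Python check encoding the two rules cited above might look like the sketch below. The rules are that meta is a map, and that pattern and ontologies belong only on the file entry. The check runs on an already-parsed meta.yml represented as plain dicts and lists, so no YAML library is assumed. check_outputs is a hypothetical helper, not part of nf-core/tools.

```python
# Keys that describe files and should not appear on the meta map.
FILE_ONLY_KEYS = {"pattern", "ontologies"}


def check_outputs(outputs):
    """Return (channel, problem) pairs for tuple outputs with a malformed meta entry.

    `outputs` mirrors parsed YAML: {channel: [[{"meta": {...}}, {"<glob>": {...}}]]}.
    """
    problems = []
    for channel, tuples in outputs.items():
        for elements in tuples:
            for element in elements:
                spec = element.get("meta")
                if spec is None:
                    continue  # a file entry, not the meta slot
                if spec.get("type") != "map":
                    problems.append((channel, f"meta type is {spec.get('type')!r}, expected 'map'"))
                extra = sorted(FILE_ONLY_KEYS.intersection(spec))
                if extra:
                    problems.append((channel, f"meta carries file-only keys {extra}"))
    return problems


# The artifactprior block as currently written (abridged).
current = {
    "artifactprior": [[
        {"meta": {"type": "file", "pattern": "*.tar.gz", "ontologies": []}},
        {"*.tar.gz": {"type": "file", "pattern": "*.tar.gz"}},
    ]],
}

# The same block after the suggested fix.
fixed = {
    "artifactprior": [[
        {"meta": {"type": "map", "description": "Groovy Map containing sample information"}},
        {"*.tar.gz": {"type": "file", "pattern": "*.tar.gz"}},
    ]],
}
```

Running check_outputs on current reports both problems. Running it on fixed returns an empty list. The same check applies unchanged to the contamination and segmentation blocks in calculatecontamination/meta.yml.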

Comment on lines +28 to +31
{ assert snapshot(
path(process.out.artifactprior[0][1]).linesGzip[3..7],
process.out.findAll { key, val -> key.startsWith("versions") }
).match() }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "main.nf.test" -path "*learnreadorientationmodel*"

Repository: fulcrumgenomics/twistcgp

Length of output: 136


🏁 Script executed:

cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test

Repository: fulcrumgenomics/twistcgp

Length of output: 2012


🏁 Script executed:

find . -name "main.nf" -path "*learnreadorientationmodel*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 125


🏁 Script executed:

cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1956


🏁 Script executed:

web_search
nf-test linesGzip tarball tar.gz

Repository: fulcrumgenomics/twistcgp

Length of output: 164


🌐 Web query:

nf-test linesGzip documentation tarball

💡 Result:

linesGzip is a property in nf-test (the Nextflow testing framework) that extends the Path object to read lines from gzip-compressed files without decompressing them to disk. It returns an array of lines for assertions in tests. Key features from the official nf-test documentation:
- Read all lines: path(process.out.out_ch.get(0)).linesGzip
- Check size: assert path(...).linesGzip.size == 5
- Check contains: assert path(...).linesGzip.contains("Line Content")
- Slice/filter: def lines = path(...).linesGzip[0..5]; assert lines.size == 6
- Grep specific lines efficiently (for large files): path(...).grepLinesGzip(0,5) or path(...).grepLineGzip(0)
- Snapshot support: assert snapshot(path(...).linesGzip[0])

Note: for gzip files, the md5 property computes the checksum after gunzipping, unlike non-gz files. No specific "tarball" handling is mentioned; tests typically use individual .gz files from repositories like nf-core/test-datasets. Tarballs may appear in Nextflow workflows (e.g., for bundled data), but linesGzip handles single .gz files.


Don't snapshot a tarball with linesGzip.

The process outputs a .tar.gz archive, but linesGzip is designed for gzip-compressed text files. Using it on a tarball reads binary tar stream bytes as "lines," which is meaningless and brittle. Snapshot the path itself instead, or assert on extracted contents if validation of archive structure is needed.

Suggested fix
-                { assert snapshot(
-					path(process.out.artifactprior[0][1]).linesGzip[3..7],
-                    process.out.findAll { key, val -> key.startsWith("versions") }
-                ).match() }
+                { assert snapshot(
+                    path(process.out.artifactprior[0][1]),
+                    process.out.findAll { key, val -> key.startsWith("versions") }
+                ).match() }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around
lines 28 - 31, The test is incorrectly using linesGzip on a .tar.gz tarball;
change the assertion to snapshot the archive path itself or validate extracted
contents instead: replace the snapshot call that references
path(process.out.artifactprior[0][1]).linesGzip[3..7] with either
snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or
extract the tarball in the test and assert on the extracted files' contents
(e.g., list or specific file contents) when calling snapshot; update the test
that constructs the snapshot (the assertion block containing
process.out.artifactprior and process.out.findAll) accordingly.

Comment thread workflows/twistcgp.nf Outdated
Comment on lines +171 to +174
ch_mutect2_samples = GATK4_MUTECT2.out.vcf
.join(GATK4_MUTECT2.out.tbi)
.join(GATK4_MUTECT2.out.stats)
.map { meta, vcf, tbi, stats ->
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and read the relevant file
find . -name "twistcgp.nf" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 91


🏁 Script executed:

# Read the file around lines 171-174
if [ -f "workflows/twistcgp.nf" ]; then
  sed -n '165,180p' "workflows/twistcgp.nf"
fi

Repository: fulcrumgenomics/twistcgp

Length of output: 713


🏁 Script executed:

# Search for GATK4_LEARNREADORIENTATIONMODEL output definitions
rg "GATK4_LEARNREADORIENTATIONMODEL" -A 5 -B 5 --max-count=20

Repository: fulcrumgenomics/twistcgp

Length of output: 8446


🏁 Script executed:

# Check if there are other join patterns in the file with remainder or optional handling
rg "\.join\(" "workflows/twistcgp.nf" -A 2 -B 2

Repository: fulcrumgenomics/twistcgp

Length of output: 1708


🏁 Script executed:

# Check the GATK4_LEARNREADORIENTATIONMODEL module definition
find . -path "*gatk4/learnreadorientationmodel/main.nf" -exec cat {} \;

Repository: fulcrumgenomics/twistcgp

Length of output: 1634


🏁 Script executed:

# Check if GATK4_MUTECT2 has conditional f1r2 output handling
find . -path "*gatk4/mutect2/main.nf" -exec head -100 {} \;

Repository: fulcrumgenomics/twistcgp

Length of output: 2695


🏁 Script executed:

# Search for any conditional logic around GATK4_LEARNREADORIENTATIONMODEL execution
rg "GATK4_LEARNREADORIENTATIONMODEL" -B 10 -A 5 "workflows/twistcgp.nf"

Repository: fulcrumgenomics/twistcgp

Length of output: 2605


🏁 Script executed:

# Check Nextflow join documentation and usage with remainder in the codebase
rg "remainder\s*:" -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 50


🏁 Script executed:

# Check if there are any examples of handling optional channels with join in the workflow
rg "join.*optional" -B 3 -A 3

Repository: fulcrumgenomics/twistcgp

Length of output: 50


Fix inner join on optional artifact priors.

Mutect2 only produces f1r2 (marked optional: true) when enabled, so GATK4_LEARNREADORIENTATIONMODEL won't run for all samples. The default inner join drops unmatched items, causing samples without artifactprior to disappear before GATK4_FILTERMUTECTCALLS instead of falling back to empty orientation-bias input.

Suggested fix
     ch_mutect2_samples = GATK4_MUTECT2.out.vcf
         .join(GATK4_MUTECT2.out.tbi)
         .join(GATK4_MUTECT2.out.stats)
-        .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)
+        .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, remainder: true)
+        .map { meta, vcf, tbi, stats, artifactprior -> tuple(meta, vcf, tbi, stats, artifactprior ?: []) }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/twistcgp.nf` around lines 171 - 174, The current chain uses an
inner join that drops samples missing artifact priors. Replace the final
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) with
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, remainder: true) so that
ch_mutect2_samples retains all GATK4_MUTECT2 entries and unmatched samples carry
a null artifactprior. Then coalesce that null to [] in a following .map so
GATK4_FILTERMUTECTCALLS receives an empty orientation-bias input when the
artifact prior is missing.
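The difference between the inner join and remainder: true can be modeled in a few lines of Python. This is a simplified stand-in for Nextflow's join: it ignores emission order and right-only remainders. Sample names are hypothetical, and prior or [] plays the role of Groovy's artifactprior ?: [].

```python
def join(left, right, remainder=False):
    """Join two lists of tuples on their first element (the meta key).

    Simplified model of Nextflow's join: matched keys are concatenated; with
    remainder=True, unmatched left items are kept with a single None appended.
    """
    right_by_key = {item[0]: item[1:] for item in right}
    out = []
    for key, *values in left:
        if key in right_by_key:
            out.append((key, *values, *right_by_key[key]))
        elif remainder:
            out.append((key, *values, None))
    return out


mutect2 = [
    ("s1", "s1.vcf.gz", "s1.vcf.gz.tbi", "s1.stats"),
    ("s2", "s2.vcf.gz", "s2.vcf.gz.tbi", "s2.stats"),
]
priors = [("s1", "s1.artifactprior.tar.gz")]  # s2 produced no f1r2 output

inner = join(mutect2, priors)  # s2 is silently dropped
kept = [(*rest, prior or []) for *rest, prior in join(mutect2, priors, remainder=True)]
```

With the inner join, only s1 reaches filtering. With the remainder join plus coalescing, s2 still arrives, carrying an empty list in the artifact-prior slot.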

Comment thread workflows/twistcgp.nf
Comment on lines +176 to +179
ch_filtermutect_in = params.population_germline_vcf
? ch_mutect2_samples
.join(GATK4_CALCULATECONTAMINATION.out.segmentation)
.join(GATK4_CALCULATECONTAMINATION.out.contamination)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the workflow file
cd workflows
ls -la | head -20
wc -l twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 262


🏁 Script executed:

# Read the specific lines in context (lines 170-185 to get surrounding context)
sed -n '170,185p' twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 122


🏁 Script executed:

# Search for GATK4_CALCULATECONTAMINATION definition to understand its outputs
rg "GATK4_CALCULATECONTAMINATION" twistcgp.nf -B 2 -A 5 | head -50

Repository: fulcrumgenomics/twistcgp

Length of output: 119


🏁 Script executed:

# Look for the output definitions to understand if segmentation is optional
rg "segmentation|contamination" twistcgp.nf -B 1 -A 1 | head -100

Repository: fulcrumgenomics/twistcgp

Length of output: 119


🏁 Script executed:

# Read the specific lines with full path from repo root
sed -n '170,185p' workflows/twistcgp.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 842


🏁 Script executed:

# Search for GATK4_CALCULATECONTAMINATION and its outputs
rg "GATK4_CALCULATECONTAMINATION" workflows/ -B 2 -A 5

Repository: fulcrumgenomics/twistcgp

Length of output: 2159


🏁 Script executed:

# Look for the module definition or output structure
rg "out\.(segmentation|contamination)" workflows/ -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 557


🏁 Script executed:

# Find the GATK4_CALCULATECONTAMINATION module definition
find . -name "*calculatecontamination*" -type f

Repository: fulcrumgenomics/twistcgp

Length of output: 50


🏁 Script executed:

# Check the modules directory structure
find modules -name "*calculatecontamination*" -o -name "*CALCULATECONTAMINATION*" | head -20

Repository: fulcrumgenomics/twistcgp

Length of output: 112


🏁 Script executed:

# Search for output definitions in the module
rg "segmentation" modules/ -B 2 -A 2

Repository: fulcrumgenomics/twistcgp

Length of output: 10333


🏁 Script executed:

# Verify the exact main.nf file for GATK4_CALCULATECONTAMINATION
cat modules/nf-core/gatk4/calculatecontamination/main.nf

Repository: fulcrumgenomics/twistcgp

Length of output: 1820


Use remainder: true and coalesce missing segmentation to prevent silent sample loss.

The segmentation output is marked optional in the module, but the joins here are inner joins by default. When a sample emits no segmentation table, that sample is silently dropped and never reaches GATK4_FILTERMUTECTCALLS. Add remainder: true to the first join and coalesce the missing segmentation to [].
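A minimal sketch of that fix, packaged as a standalone sub-workflow so it can be tried outside the pipeline. The name BUILD_FILTERMUTECT_INPUT and its input names are hypothetical. It assumes the sample tuples have the [meta, vcf, tbi, stats, artifactprior] shape built for ch_mutect2_samples, and that a contamination table is emitted for every sample, so the second join can stay an inner join.

```nextflow
// Hypothetical sub-workflow; input shapes are assumptions based on the review comment:
//   samples:       [meta, vcf, tbi, stats, artifactprior]
//   segmentation:  [meta, segmentation_table]   (optional output, may be absent for a sample)
//   contamination: [meta, contamination_table]  (assumed present for every sample)
workflow BUILD_FILTERMUTECT_INPUT {
    take:
    samples
    segmentation
    contamination

    main:
    // join(..., remainder: true) acts as a left join here: a sample with no
    // segmentation table is still emitted, with null in the segmentation slot.
    // The map coalesces that null to [] so the downstream path input is simply empty.
    // Contamination is assumed to exist for every sample, so a plain inner join follows.
    joined = samples
        .join(segmentation, remainder: true)
        .map { meta, vcf, tbi, stats, prior, seg -> [meta, vcf, tbi, stats, prior, seg ?: []] }
        .join(contamination)

    emit:
    joined
}
```

In workflows/twistcgp.nf the three inputs would correspond to ch_mutect2_samples, GATK4_CALCULATECONTAMINATION.out.segmentation and GATK4_CALCULATECONTAMINATION.out.contamination. Note that remainder: true also emits right-hand entries with no matching key; that should not occur here if every segmentation table comes from a sample already in ch_mutect2_samples.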

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflows/twistcgp.nf` around lines 176 - 179, The current inner joins drop
samples when GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted;
change the first join to be a remainder join (use remainder: true on the join
between ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation)
and ensure missing segmentation values are coalesced to an empty list before the
subsequent join to contamination/consumers (e.g., when unpacking the joined
tuple for ch_filtermutect_in replace a null/undefined segmentation with []), so
no samples are silently lost before GATK4_FILTERMUTECTCALLS.


2 participants