feat: LEARNREADORIENTATIONMODEL and CALCULATE_CONTAMINATION #88
emmcauley wants to merge 3 commits into
98ba159 to 879ea26
@coderabbitai review
✅ Actions performed: review triggered.
📝 Walkthrough
This pull request introduces three new GATK4 modules to the pipeline.
🚥 Pre-merge checks: ✅ 3 passed
@coderabbitai review
✅ Actions performed: review triggered.
Actionable comments posted: 13
🧹 Nitpick comments (2)
workflows/twistcgp.nf (1)
133-165: Wire the new GATK steps into ch_versions.
These three process calls never feed their versions_gatk4 channels into ch_versions, so twistcgp_software_mqc_versions.yml will omit them.
Possible fix
+ ch_versions = ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first())
+ ch_versions = ch_versions.mix(GATK4_GETPILEUPSUMMARIES.out.versions_gatk4.first())
+ ch_versions = ch_versions.mix(GATK4_CALCULATECONTAMINATION.out.versions_gatk4.first())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workflows/twistcgp.nf` around lines 133 - 165, The GATK steps GATK4_LEARNREADORIENTATIONMODEL, GATK4_GETPILEUPSUMMARIES, and GATK4_CALCULATECONTAMINATION are not contributing their versions_gatk4 outputs into ch_versions, so their versions are omitted from twistcgp_software_mqc_versions.yml; update the workflow to mix each process' versions_gatk4 channel into ch_versions (for example, after calling GATK4_LEARNREADORIENTATIONMODEL(...), add ch_versions = ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first()), and do the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator collects them.
modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test (1)
19-20: Use the real upstream f1r2 shape in the test.
The workflow passes GATK4_MUTECT2.out.f1r2 directly here, and a path() output without arity emits a single file when one match is produced. These cases wrap that file in a one-element list, so the test is exercising a different binding shape than production. (nextflow.io)
Also applies to: 43-44
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around lines 19 - 20, The test is wrapping the f1r2 artifact in an extra one-element list, which differs from the real upstream binding shape (GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second element is the file path itself instead of a list (i.e., change the [file(params.modules_testdata_base_path + ...)] entry to file(params.modules_testdata_base_path + ...) in the input[0] assignment), and make the same change at the other occurrence (lines 43-44).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@conf/modules.config`:
- Around line 125-132: The publishDir for withName: GATK4_GETPILEUPSUMMARIES is
using a wrong pattern and a misleading ext.prefix; change ext.prefix from
"mutect2.artifactprior" to something like "mutect2.pileups" or
"mutect2.pileup_summaries" and update publishDir.pattern to match the actual
outputs (e.g. use "*.pileups.table*" or a stricter glob like
"*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and
optional gzipped variants) are picked up and published.
In `@modules/nf-core/gatk4/calculatecontamination/main.nf`:
- Around line 43-48: The stub currently always creates both
"${prefix}.contamination.table" and "${prefix}.segmentation.table" which
mismatches the process' optional segmentation output; update the stub block so
it always touches "${prefix}.contamination.table" but only touches
"${prefix}.segmentation.table" when the tumor segmentation option was requested
(e.g. check an extension flag such as task.ext.tumor_segmentation or
task.ext.tumorSegmentation that corresponds to the --tumor-segmentation flag),
leaving the rest of the stub and the prefix assignment unchanged.
In `@modules/nf-core/gatk4/calculatecontamination/meta.yml`:
- Around line 41-63: Update the outputs' meta entries so they are Groovy maps
instead of files and fix the segmentation meta text/pattern: change the meta
block under both contamination and segmentation from "type: file" to "type: map"
and adjust the segmentation meta description and pattern to reference the
segmentation output (e.g., describe "segmentation of tumor minor allele
fractions" and use the "*.segmentation.table" pattern) so the "meta" map for
contamination refers to contamination metadata and the "meta" map for
segmentation refers to segmentation metadata.
In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml`:
- Line 7: The workflow metadata contains a typo: replace the incorrect keyword
string "getpileupsumaries" with the correct "getpileupsummaries" in meta.yml so
the tool name matches expected naming; update any occurrences of
getpileupsumaries (e.g., the entry currently present in meta.yml) to
getpileupsummaries to ensure consistency.
- Around line 72-83: The inputs `variants` and `variants_tbi` in meta.yml are
defined as single-item entries but should use the nested list/grouped format
used elsewhere (the "- - " pattern) to match nf-core tooling expectations;
update both `variants` and `variants_tbi` to be arrays of arrays (convert their
current single mapping into a nested list entry), preserving their keys (`type`,
`description`, `pattern`, `ontologies`) and values so the structure matches
other grouped inputs and avoids parsing errors.
- Line 16: The documentation URL value in meta.yml appears malformed (the
documentation:
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the
documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the
correct URL (likely remove the trailing 's' to
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace
with the verified, intended documentation link), and run a quick curl check to
confirm the corrected URL returns a 200 before committing.
In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf`:
- Line 14: The tuple declaration tuple val(meta), path("*.tar.gz"), emit:
artifactprior is too broad and catches staged files like *.f1r2.tar.gz; narrow
the glob to only the intended artifact filenames (for example replace "*.tar.gz"
with a more specific pattern such as "*.prior.tar.gz" or the exact artifact
basename used downstream) so that only the correct prior artifact files are
emitted as artifactprior.
- Line 15: The pipeline uses the Nextflow "topic:" syntax in the tuple emission
(the tuple line with topic: versions / emit: versions_gatk4) which requires
Nextflow 25.04+ or the preview flag; update the project constraint
(nextflowVersion) to allow >=25.04.0, or alternatively enable the preview
feature by adding the config key nextflow.preview.topic = true in the pipeline
configuration so the tuple with topic: versions works on older Nextflow
releases; locate the tuple emission (topic: versions) in
learnreadorientationmodel/main.nf and adjust either the nextflowVersion
constraint or the config to satisfy the requirement.
- Line 23: input_list is built by calling .collect() on f1r2 which is a single
Path (not a List) and will throw NoSuchMethodError at runtime; change the
construction of input_list in the learnreadorientationmodel process to handle
both single Path and list cases (e.g., detect if f1r2 is a List and join via
.collect(...).join(' '), otherwise format a single "--input ${f1r2}" string) so
it works when MUTECT2 binds a single file as a Path.
In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml`:
- Around line 31-43: The artifactprior output block is incorrect: change the
nested meta tuple entry from "type: file" to "type: map" and give it a generic
metadata description, and remove file-specific keys (pattern, ontologies) from
that meta map; ensure the file entry for "*.tar.gz" (the second tuple element)
retains the file-specific attributes (type: file, pattern: "*.tar.gz",
description: ..., ontologies: [...]) so all file attributes live only on the
"*.tar.gz" file entry while meta becomes a simple map describing the metadata.
In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 28-31: The test is incorrectly using linesGzip on a .tar.gz
tarball; change the assertion to snapshot the archive path itself or validate
extracted contents instead: replace the snapshot call that references
path(process.out.artifactprior[0][1]).linesGzip[3..7] with either
snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or
extract the tarball in the test and assert on the extracted files' contents
(e.g., list or specific file contents) when calling snapshot; update the test
that constructs the snapshot (the assertion block containing
process.out.artifactprior and process.out.findAll) accordingly.
In `@workflows/twistcgp.nf`:
- Around line 176-179: The current inner joins drop samples when
GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted; change the first
join to be a remainder join (use remainder: true on the join between
ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation) and ensure
missing segmentation values are coalesced to an empty list before the subsequent
join to contamination/consumers (e.g., when unpacking the joined tuple for
ch_filtermutect_in replace a null/undefined segmentation with []), so no samples
are silently lost before GATK4_FILTERMUTECTCALLS.
- Around line 171-174: The current chain uses an inner join that drops samples
missing artifact priors: replace the final
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) with a join that keeps
unmatched items, so ch_mutect2_samples retains all GATK4_MUTECT2 entries and
unmatched artifactprior values are passed as empty/null; specifically change the
join operation used to combine GATK4_MUTECT2.out.vcf/.tbi/.stats with
GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior (e.g., use join with
remainder: true, since Nextflow does not provide a leftJoin operator) and ensure
GATK4_FILTERMUTECTCALLS consumes a fallback empty orientation-bias input when
the artifactprior is missing.
---
Nitpick comments:
In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test`:
- Around line 19-20: The test is wrapping the f1r2 artifact in an extra
one-element list, which differs from the real upstream binding shape
(GATK4_MUTECT2.out.f1r2) that emits a single file; update the test so the second
element is the file path itself instead of a list (i.e., change the
[file(params.modules_testdata_base_path + ...)] entry to
file(params.modules_testdata_base_path + ...) in the input[0] assignment), and
make the same change at the other occurrence (lines noted as also applies to
43-44).
In `@workflows/twistcgp.nf`:
- Around line 133-165: The GATK steps GATK4_LEARNREADORIENTATIONMODEL,
GATK4_GETPILEUPSUMMARIES, and GATK4_CALCULATECONTAMINATION are not contributing
their versions_gatk4 outputs into ch_versions, so their versions are omitted
from twistcgp_software_mqc_versions.yml; update the workflow to mix each
process' versions_gatk4 channel into ch_versions (for example, after calling
GATK4_LEARNREADORIENTATIONMODEL(...), add ch_versions =
ch_versions.mix(GATK4_LEARNREADORIENTATIONMODEL.out.versions_gatk4.first()), and
do the same for GATK4_GETPILEUPSUMMARIES.out.versions_gatk4 and
GATK4_CALCULATECONTAMINATION.out.versions_gatk4) so the versions aggregator
collects them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c66e0a79-57e1-460b-b099-58641d4811bb
⛔ Files ignored due to path filters (3)
- modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test.snap is excluded by !**/*.snap
- modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test.snap is excluded by !**/*.snap
- modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test.snap is excluded by !**/*.snap
📒 Files selected for processing (17)
- conf/modules.config
- modules.json
- modules/nf-core/gatk4/calculatecontamination/environment.yml
- modules/nf-core/gatk4/calculatecontamination/main.nf
- modules/nf-core/gatk4/calculatecontamination/meta.yml
- modules/nf-core/gatk4/calculatecontamination/tests/main.nf.test
- modules/nf-core/gatk4/calculatecontamination/tests/nextflow.config
- modules/nf-core/gatk4/getpileupsummaries/environment.yml
- modules/nf-core/gatk4/getpileupsummaries/main.nf
- modules/nf-core/gatk4/getpileupsummaries/meta.yml
- modules/nf-core/gatk4/getpileupsummaries/tests/main.nf.test
- modules/nf-core/gatk4/learnreadorientationmodel/environment.yml
- modules/nf-core/gatk4/learnreadorientationmodel/main.nf
- modules/nf-core/gatk4/learnreadorientationmodel/meta.yml
- modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test
- modules/nf-core/gatk4/learnreadorientationmodel/tests/nextflow.config
- workflows/twistcgp.nf
withName: GATK4_GETPILEUPSUMMARIES {
    ext.prefix = { "${meta.id}.mutect2.artifactprior" }
    publishDir = [
        path: { "${params.outdir}/${meta.id}" },
        mode: params.publish_dir_mode,
        pattern: "*{gz}",
    ]
}
PublishDir pattern won't match output files.
GATK4_GETPILEUPSUMMARIES outputs *.pileups.table (per context snippet 3), but pattern *{gz} won't match. Nothing will be published.
Also, the prefix mutect2.artifactprior seems misnamed — this module generates pileup summaries, not artifact priors.
Proposed fix
  withName: GATK4_GETPILEUPSUMMARIES {
-     ext.prefix = { "${meta.id}.mutect2.artifactprior" }
+     ext.prefix = { "${meta.id}.mutect2" }
      publishDir = [
          path: { "${params.outdir}/${meta.id}" },
          mode: params.publish_dir_mode,
-         pattern: "*{gz}",
+         pattern: "*{table}",
      ]
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@conf/modules.config` around lines 125 - 132, The publishDir for withName:
GATK4_GETPILEUPSUMMARIES is using a wrong pattern and a misleading ext.prefix;
change ext.prefix from "mutect2.artifactprior" to something like
"mutect2.pileups" or "mutect2.pileup_summaries" and update publishDir.pattern to
match the actual outputs (e.g. use "*.pileups.table*" or a stricter glob like
"*.{pileups.table,pileups.table.gz}") so the generated *.pileups.table (and
optional gzipped variants) are picked up and published.
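The pattern mismatch can be checked outside the pipeline. The sketch below is plain Groovy and assumes that publishDir patterns follow Java NIO glob semantics; `globMatches` is a hypothetical helper and `sample1.mutect2.pileups.table` is an illustrative file name, neither taken from the PR.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Paths

// Hypothetical helper: test a bare file name against a Java NIO glob pattern.
boolean globMatches(String glob, String fileName) {
    def matcher = FileSystems.getDefault().getPathMatcher('glob:' + glob)
    return matcher.matches(Paths.get(fileName))
}

// Illustrative output name, assuming the module writes <prefix>.pileups.table.
def pileups = 'sample1.mutect2.pileups.table'

println "*{gz}            -> ${globMatches('*{gz}', pileups)}"
println "*{table}         -> ${globMatches('*{table}', pileups)}"
println "*.pileups.table* -> ${globMatches('*.pileups.table*', pileups)}"
```

Under that assumption, `*{gz}` only matches names ending in `gz`, so the pileup table is skipped, while both suggested replacements match it.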
stub:
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.contamination.table
    touch ${prefix}.segmentation.table
    """
🧩 Analysis chain
🏁 Script executed:
cat -n modules/nf-core/gatk4/calculatecontamination/main.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 2163
Align stub output with optional segmentation behavior.
The real process declares segmentation as optional: true (line 15) because GATK only creates it with --tumor-segmentation. The stub unconditionally touches both files, which doesn't match and can mask missing-output bugs in stub tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/calculatecontamination/main.nf` around lines 43 - 48,
The stub currently always creates both "${prefix}.contamination.table" and
"${prefix}.segmentation.table" which mismatches the process' optional
segmentation output; update the stub block so it always touches
"${prefix}.contamination.table" but only touches "${prefix}.segmentation.table"
when the tumor segmentation option was requested (e.g. check an extension flag
such as task.ext.tumor_segmentation or task.ext.tumorSegmentation that
corresponds to the --tumor-segmentation flag), leaving the rest of the stub and
the prefix assignment unchanged.
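One way to picture the conditional stub is the plain-Groovy sketch below. It is not the module's code: `stubScript` is a hypothetical helper, and it assumes the segmentation table is requested by passing `--tumor-segmentation` through the process arguments, which may not be how this module wires the flag.

```groovy
// Hypothetical helper that builds the stub's shell commands.
// Assumption: segmentation is requested via a --tumor-segmentation argument.
String stubScript(String prefix, String args) {
    def lines = ["touch ${prefix}.contamination.table"]
    if (args.contains('--tumor-segmentation')) {
        // Only create the optional output when it was actually requested.
        lines << "touch ${prefix}.segmentation.table"
    }
    return lines.join('\n')
}

println stubScript('sample1', '')
println '---'
println stubScript('sample1', '--tumor-segmentation sample1.segmentation.table')
```

The contamination table is always touched, and the segmentation table only appears when the flag is present, which mirrors the `optional: true` declaration on the real output.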
contamination:
  - - meta:
        type: file
        description: File containing the contamination table.
        pattern: "*.contamination.table"
        ontologies: []
    - "*.contamination.table":
        type: file
        description: File containing the contamination table.
        pattern: "*.contamination.table"
        ontologies: []
segmentation:
  - - meta:
        type: file
        description: File containing the contamination table.
        pattern: "*.contamination.table"
        ontologies: []
    - "*.segmentation.table":
        type: file
        description: output table containing segmentation of tumor minor allele fractions
          (optional)
        pattern: "*.segmentation.table"
        ontologies: []
❓ Verification inconclusive
Script executed:
# First, find the file and read the specified lines
fd -type f -name "meta.yml" | grep gatk4/calculatecontamination
Repository: fulcrumgenomics/twistcgp
Exit code: 1
stderr:
error: 'ype' isn't a valid value for '--type <filetype>'
[possible values: file, directory, symlink, executable, empty, socket, pipe]
For more information try '--help'
Script executed:
# Read the specific file content at lines 41-63
cat -n modules/nf-core/gatk4/calculatecontamination/meta.yml | sed -n '41,63p'
Repository: fulcrumgenomics/twistcgp
Exit code: 0
stdout:
41 contamination:
42 - - meta:
43 type: file
44 description: File containing the contamination table.
45 pattern: "*.contamination.table"
46 ontologies: []
47 - "*.contamination.table":
48 type: file
49 description: File containing the contamination table.
50 pattern: "*.contamination.table"
51 ontologies: []
52 segmentation:
53 - - meta:
54 type: file
55 description: File containing the contamination table.
56 pattern: "*.contamination.table"
57 ontologies: []
58 - "*.segmentation.table":
59 type: file
60 description: output table containing segmentation of tumor minor allele fractions
61 (optional)
62 pattern: "*.segmentation.table"
63 ontologies: []
nf-core meta.yml output metadata structure conventions tuple format
The nf-core meta.yml file in modules documents module metadata including tool name, description, keywords, input/output channels, and authors. It follows a JSON schema defined in nf-core/modules repository at modules/meta-schema.json (linted by nf-core/tools). Updated structure (as of 2025) groups inputs/outputs by channel, distinguishing tuples (lists) from single elements. Tuple format: Lists represent tuple channels matching Nextflow input/output like tuple val(meta), path(files). Single channels are direct mappings. Example from bwa/mem (2025 structure): input: - # Tuple channel 1 - meta: type: map description: Groovy Map containing sample information - reads: type: file description: List of input FastQ files... - # Tuple channel 2 - meta2: type: map - index: type: file description: BWA genome index files pattern: ".{amb,ann,bwt,pac,sa}" - sort_bam: # Single val channel type: boolean description: use samtools sort (true) or samtools view (false) pattern: "true or false" output: - bam: # Tuple output - - meta: type: file # Note: likely typo, should be map - ".bam": type: file - versions: # Single - versions.yml: type: file description: File containing software versions pattern: "versions.yml" Channels use type (map/file/boolean etc.), description, pattern, optional ontologies. Linting expects consistency with main.nf (e.g., tuple val(meta) path(bar) -> list with meta map + bar file). Use nf-core modules lint --fix for updates. For pipelines, modules.json tracks versions.
Citations:
- 1: https://nf-co.re/docs/nf-core-tools/api_reference/dev/module_lint_tests/meta_yml
- 2: https://nf-co.re/blog/2025/modules-meta-yml
- 3: https://github.com/nf-core/modules/
Fix output metadata structure.
Both meta entries are incorrectly marked as type: file (should be type: map—meta is a Groovy Map). Additionally, the segmentation output's meta block still references the contamination table description and pattern; it should describe the segmentation metadata instead.
Example of issues at lines 42-46 and 53-57
contamination:
- - meta:
type: file # Should be type: map
description: File containing the contamination table.
pattern: "*.contamination.table"
ontologies: []
segmentation:
- - meta:
type: file # Should be type: map
description: File containing the contamination table. # Wrong: still says contamination
pattern: "*.contamination.table" # Wrong: still says contamination pattern
ontologies: []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/calculatecontamination/meta.yml` around lines 41 - 63,
Update the outputs' meta entries so they are Groovy maps instead of files and
fix the segmentation meta text/pattern: change the meta block under both
contamination and segmentation from "type: file" to "type: map" and adjust the
segmentation meta description and pattern to reference the segmentation output
(e.g., describe "segmentation of tumor minor allele fractions" and use the
"*.segmentation.table" pattern) so the "meta" map for contamination refers to
contamination metadata and the "meta" map for segmentation refers to
segmentation metadata.
keywords:
  - gatk4
  - germlinevariantsites
  - getpileupsumaries
Typo in keyword.
getpileupsumaries → getpileupsummaries
Proposed fix
-   - getpileupsumaries
+   - getpileupsummaries
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 7, The workflow
metadata contains a typo: replace the incorrect keyword string
"getpileupsumaries" with the correct "getpileupsummaries" in meta.yml so the
tool name matches expected naming; update any occurrences of getpileupsumaries
(e.g., the entry currently present in meta.yml) to getpileupsummaries to ensure
consistency.
  with a primary focus on variant discovery and genotyping. Its powerful processing engine
  and high-performance computing features make it capable of taking on projects of any size.
homepage: https://gatk.broadinstitute.org/hc/en-us
documentation: https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s
Potentially malformed documentation URL.
URL ends with 360002369672s — trailing 's' looks like a typo.
#!/bin/bash
# Verify if the documentation URL is valid
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"
echo ""
curl -sL -o /dev/null -w "%{http_code}" "https://gatk.broadinstitute.org/hc/en-us/categories/360002369672"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/getpileupsummaries/meta.yml` at line 16, The
documentation URL value in meta.yml appears malformed (the documentation:
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672s"); update the
documentation field in modules/nf-core/gatk4/getpileupsummaries/meta.yml to the
correct URL (likely remove the trailing 's' to
"https://gatk.broadinstitute.org/hc/en-us/categories/360002369672" or replace
with the verified, intended documentation link), and run a quick curl check to
confirm the corrected URL returns a 200 before committing.
script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate producer/consumer contracts for f1r2 payload =="
rg -nP 'f1r2|LEARNREADORIENTATIONMODEL|GATK4_MUTECT2' --type-add 'nf:*.nf' -t nf -C3
Repository: fulcrumgenomics/twistcgp
Length of output: 6053
🏁 Script executed:
# Check if there are tests for this module
fd -type f -name '*test*' -o -name '*spec*' | grep -i learnreadorientationmodel
# Look for actual workflow test execution
find . -type f -name '*.nf' -path '*/test*' | head -20
# Check if there's nextflow config showing how f1r2 is passed
fd -type f -name 'nextflow.config' -o -name 'conf' | xargs grep -l f1r2 2>/dev/null || echo "No config found"
Repository: fulcrumgenomics/twistcgp
Length of output: 442
🏁 Script executed:
# Check the full LEARNREADORIENTATIONMODEL module to see entire script context
cat -n modules/nf-core/gatk4/learnreadorientationmodel/main.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 1956
🏁 Script executed:
# Verify Groovy Path behavior - check if there's any wrapper or extension
rg -n 'metaClass|extensions|Path\.' modules/nf-core/gatk4/learnreadorientationmodel/ -A2 -B2
Repository: fulcrumgenomics/twistcgp
Length of output: 50
🏁 Script executed:
# Check for CI configuration
find . -type f \( -name '*.yml' -o -name '*.yaml' \) -path '*/.github/*' | head -10
Repository: fulcrumgenomics/twistcgp
Length of output: 174
🏁 Script executed:
# Look for test execution or sample invocation
find . -type f -name '*.nf' | xargs grep -l 'GATK4_LEARNREADORIENTATIONMODEL' | head -5
Repository: fulcrumgenomics/twistcgp
Length of output: 149
🏁 Script executed:
# Check if there's documentation about input handling in Nextflow for this pattern
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '1,30p'
Repository: fulcrumgenomics/twistcgp
Length of output: 1378
🏁 Script executed:
# Check the CI workflow to see if it tests the module
cat -n ./.github/workflows/twistgp_ci.yml | head -50
Repository: fulcrumgenomics/twistcgp
Length of output: 1971
🏁 Script executed:
# Check if there are any Nextflow docs or comments about path() binding behavior
rg -n 'path\(' modules/nf-core/gatk4/learnreadorientationmodel/ -B2 -A2
Repository: fulcrumgenomics/twistcgp
Length of output: 1376
🏁 Script executed:
# Search for similar patterns in the codebase - other modules using path() with .collect()
rg -n 'path\([^)]*\).*\.collect' --type nf -A1 -B1
Repository: fulcrumgenomics/twistcgp
Length of output: 94
🏁 Script executed:
# Look at the test file to understand what input is being passed
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test
Repository: fulcrumgenomics/twistcgp
Length of output: 2012
🏁 Script executed:
# Check what the test data looks like
find modules/nf-core/gatk4/learnreadorientationmodel/tests -type f | head -20
Repository: fulcrumgenomics/twistcgp
Length of output: 276
🏁 Script executed:
# Look for Nextflow documentation or similar modules that handle optional multi-file inputs
rg -n 'collect\s*{' modules/nf-core/gatk4/ --type nf | head -20
Repository: fulcrumgenomics/twistcgp
Length of output: 153
🏁 Script executed:
# Look for .collect patterns in nf files
rg -n '\.collect' modules/nf-core/gatk4/ -A1 -B1
Repository: fulcrumgenomics/twistcgp
Length of output: 4504
🏁 Script executed:
# Check how the test is actually structured - look at lines 19-20 more carefully
# The test passes a LIST with a single file element wrapped in square brackets
python3 << 'EOF'
test_input = "[ [ id:'test' ], [file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/gatk/paired_mutect2_calls/test_test2_paired_mutect2_calls.f1r2.tar.gz', checkIfExists: true)] ]"
print("Test Input Structure:")
print("Outer tuple: [meta_map, f1r2_binding]")
print("f1r2_binding: [file(...)] <- This is a LIST with one element")
print("\nSo f1r2 receives a LIST, not a single Path object")
print("Therefore .collect() on a List WILL work in Groovy")
EOF
Repository: fulcrumgenomics/twistcgp
Length of output: 291
🏁 Script executed:
# Check the actual workflow invocation and what MUTECT2 emits
cat -n workflows/twistcgp.nf | sed -n '130,140p'
Repository: fulcrumgenomics/twistcgp
Length of output: 694
🏁 Script executed:
# Verify MUTECT2 output definition
cat -n modules/nf-core/gatk4/mutect2/main.nf | sed -n '20,30p'
Repository: fulcrumgenomics/twistcgp
Length of output: 468
🏁 Script executed:
# Look at Nextflow semantics: does path() binding with a list of files vs single file differ?
# Search for other modules that handle this pattern
rg -n 'tuple val\(meta\), path\(' modules/nf-core/gatk4/ -A3 | grep -A3 'learnreadorientationmodel'
Repository: fulcrumgenomics/twistcgp
Length of output: 938
🏁 Script executed:
# Key question: in the test, why is the f1r2 wrapped in brackets [file(...)]?
# Let's check if this is the actual expected input structure or a test artifact
cat -n modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test | sed -n '14,25p'
Repository: fulcrumgenomics/twistcgp
Length of output: 550
🏁 Script executed:
# Understand Nextflow's path() binding behavior by checking Nextflow documentation patterns
# Look at how filtermutectcalls handles multiple file inputs
cat -n modules/nf-core/gatk4/filtermutectcalls/main.nf | sed -n '1,50p'
Repository: fulcrumgenomics/twistcgp
Length of output: 2571
🏁 Script executed:
# Check if there's a difference between how the test constructs input vs actual workflow
# The test wraps in brackets [file(...)] but MUTECT2 emits single file
# Verify what MUTECT2.out.f1r2 actually is at runtime
rg -n 'f1r2' workflows/twistcgp.nf -B2 -A2
Repository: fulcrumgenomics/twistcgp
Length of output: 339
🏁 Script executed:
# Critical check: Verify Nextflow path() binding behavior with single vs multiple files
# Look at the actual MUTECT2 output pattern and how it's bound
rg -n 'path\(".*\*' modules/nf-core/gatk4/mutect2/main.nf | grep -i f1r2
Repository: fulcrumgenomics/twistcgp
Length of output: 141
🏁 Script executed:
# Check if there are runtime tests that actually execute the workflow
cat -n ./.github/workflows/twistgp_ci.yml | sed -n '40,80p'
Repository: fulcrumgenomics/twistcgp
Length of output: 1719
🏁 Script executed:
# Verify the semantic difference: test wraps in brackets but workflow passes bare channel
# Look for similar patterns in filtermutectcalls test
cat -n modules/nf-core/gatk4/filtermutectcalls/tests/main.nf.test | head -60
Repository: fulcrumgenomics/twistcgp
Length of output: 2982
🏁 Script executed:
# Critical finding: compare how the test constructs input vs workflow reality
# In filtermutectcalls, optional files that could be lists are passed as [] or [file]
# Let's check the actual Nextflow semantics documentation pattern
# Look at how LEARNREADORIENTATIONMODEL is actually invoked vs how test constructs it
python3 << 'EOF'
print("=== CRITICAL MISMATCH ===\n")
print("MUTECT2 Output:")
print(' tuple val(meta), path("*.f1r2.tar.gz"), optional: true, emit: f1r2')
print(" -> Emits SINGLE file per sample\n")
print("LEARNREADORIENTATIONMODEL Input:")
print(' tuple val(meta), path(f1r2)')
print(" -> Expects SINGLE Path object\n")
print("Workflow invocation:")
print(" GATK4_LEARNREADORIENTATIONMODEL(GATK4_MUTECT2.out.f1r2)")
print(" -> Passes single file directly\n")
print("Test construction:")
print(' input[0] = [ [ id:"test" ], [file(...)] ]')
print(" -> MANUALLY WRAPS in brackets to create a List\n")
print("=== ISSUE ===")
print("In workflow: f1r2 is a Path object")
print("In test: f1r2 is a List with one Path object")
print("\nPath.collect() -> NoSuchMethodError (Path lacks collect method)")
print("List.collect() -> Works fine\n")
print("Conclusion: Code is BROKEN in workflow, only works in test due to wrapping")
EOF
Repository: fulcrumgenomics/twistcgp
Length of output: 800
Fix .collect() call on f1r2 — it receives a single Path object, not a List.
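MUTECT2 emits a single file per sample (path("*.f1r2.tar.gz")), which binds as a single Path object in this process rather than a List. Calling .collect() directly on that Path does not iterate over a list of input files: a Path is itself an Iterable of its name elements, so Groovy is more likely to iterate those elements than to raise an error (and a genuinely missing method would surface as MissingMethodException, not NoSuchMethodError). The test does not exercise this case because it manually wraps the input in brackets, unlike the actual workflow.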
Proposed fix
- def input_list = f1r2.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
+ def f1r2_files = (f1r2 instanceof List ? f1r2 : [f1r2])
+ def input_list = f1r2_files.collect { f1r2_ -> "--input ${f1r2_}" }.join(' ')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/learnreadorientationmodel/main.nf` at line 23,
input_list is built by calling .collect() on f1r2, which is a single Path (not a
List) when MUTECT2 emits one file per sample, so the closure is not applied to a
list of files as intended; change the construction of input_list in the
learnreadorientationmodel process to handle both single Path and list cases
(e.g., detect if f1r2 is a List and build the string via .collect(...).join(' '),
otherwise format a single "--input ${f1r2}" string) so it works when MUTECT2
binds a single file as a Path.
  artifactprior:
    - - meta:
          type: file
          description: file containing artifact-priors to be used by filtermutectcalls
          pattern: "*.tar.gz"
          ontologies:
            - edam: http://edamontology.org/format_3989 # GZIP format
      - "*.tar.gz":
          type: file
          description: file containing artifact-priors to be used by filtermutectcalls
          pattern: "*.tar.gz"
          ontologies:
            - edam: http://edamontology.org/format_3989 # GZIP format
🧩 Analysis chain
🏁 Script executed:
# First, locate the file and examine full context
find . -name "meta.yml" -path "*/gatk4/learnreadorientationmodel/*" -type f
Repository: fulcrumgenomics/twistcgp
Length of output: 126
🏁 Script executed:
# Search for other similar tuple output patterns in the codebase to understand conventions
rg -A 10 "^\s+- -\s*meta:" --type yaml | head -100
Repository: fulcrumgenomics/twistcgp
Length of output: 6704
🌐 Web query:
nf-core meta.yml tuple output conventions file metadata
💡 Result:
In nf-core modules, the meta.yml file documents the module's inputs and outputs, with specific conventions for the tuple-based channels commonly used in Nextflow processes. Tuple channels in main.nf, like tuple val(meta), path(reads), are represented in meta.yml as YAML lists, where each list item describes a tuple element:
- First item: meta (type: map), a Groovy map with sample info (e.g., id, single_end).
- Subsequent items: file paths (type: file), with description, pattern (e.g., ".fastq"), and optional EDAM ontology.
Example from the bwa/mem module (updated structure, 2025):
input:
  - - meta:
        type: map
        description: Groovy Map containing sample information
    - reads:
        type: file
        description: List of input FastQ files...
        pattern: ".{fastq.gz}"
Single-element channels (e.g., val sort_bam or path "versions.yml") are not lists. Outputs follow the same pattern, grouped by emit name (e.g., bam: [...tuple elements...]). This structure groups by channel, distinguishes tuples (lists) from singles, validates against nf-core/modules meta-schema.json, and aids discoverability on the nf-core site. Use nf-core modules lint --fix to update old flat structures. The schema is at https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json (referenced in lint tests).
Citations:
- 1: Fix meta.yml structure for input and output tuples nf-core/modules#4983
- 2: https://nf-co.re/blog/2025/modules-meta-yml
- 3: https://nf-co.re/docs/nf-core-tools/api_reference/dev/module_lint_tests/meta_yml
- 4: https://nf-co.re/docs/tutorials/nf-core_training/writing-nf-core-modules/chapter-5-writing-your-module
- 5: http://nf-co.re/docs/tutorials/nf-core_training/writing-nf-core-modules/chapter-5-writing-your-module
- 6: https://github.com/nf-core/modules/
Fix meta entry in artifactprior output block.
The meta tuple slot must be type: map, not type: file. File attributes (pattern, ontologies) belong on the file entry only, not on meta. Use type: map with a generic description for the metadata element and move all file-specific attributes to the file entry below it, per nf-core conventions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/learnreadorientationmodel/meta.yml` around lines 31 -
43, The artifactprior output block is incorrect: change the nested meta tuple
entry from "type: file" to "type: map" and give it a generic metadata
description, and remove file-specific keys (pattern, ontologies) from that meta
map; ensure the file entry for "*.tar.gz" (the second tuple element) retains the
file-specific attributes (type: file, pattern: "*.tar.gz", description: ...,
ontologies: [...]) so all file attributes live only on the "*.tar.gz" file entry
while meta becomes a simple map describing the metadata.
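A minimal sketch of how the corrected artifactprior block could look, assuming the enclosing output: key and indentation follow the current nf-core meta.yml layout. The wording of the meta description is illustrative and not taken from this module; the file entry is copied from the reviewed snippet.

```yaml
output:
  artifactprior:
    - - meta:
          type: map
          # Description text below is an assumption, modeled on common nf-core wording.
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test' ]
      - "*.tar.gz":
          type: file
          description: file containing artifact-priors to be used by filtermutectcalls
          pattern: "*.tar.gz"
          ontologies:
            - edam: http://edamontology.org/format_3989 # GZIP format
```

Running nf-core modules lint on the module afterwards should confirm whether the structure validates against the meta schema.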
{ assert snapshot(
    path(process.out.artifactprior[0][1]).linesGzip[3..7],
    process.out.findAll { key, val -> key.startsWith("versions") }
).match() }
🧩 Analysis chain
🏁 Script executed:
find . -name "main.nf.test" -path "*learnreadorientationmodel*"
Repository: fulcrumgenomics/twistcgp
Length of output: 136
🏁 Script executed:
cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test
Repository: fulcrumgenomics/twistcgp
Length of output: 2012
🏁 Script executed:
find . -name "main.nf" -path "*learnreadorientationmodel*" -type f
Repository: fulcrumgenomics/twistcgp
Length of output: 125
🏁 Script executed:
cat -n ./modules/nf-core/gatk4/learnreadorientationmodel/main.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 1956
🏁 Script executed:
web_search
nf-test linesGzip tarball tar.gz
Repository: fulcrumgenomics/twistcgp
Length of output: 164
🌐 Web query:
nf-test linesGzip documentation tarball
💡 Result:
linesGzip is a property in nf-test (Nextflow testing framework) that extends the Path object to read lines from gzip-compressed files without decompressing them to disk. It returns an array of lines for assertions in tests. Key features from the official nf-test documentation:
- Read all lines: path(process.out.out_ch.get(0)).linesGzip
- Check size: assert path(...).linesGzip.size == 5
- Check contains: assert path(...).linesGzip.contains("Line Content")
- Slice/filter: def lines = path(...).linesGzip[0..5]; assert lines.size == 6
- Grep specific lines efficiently (for large files): path(...).grepLinesGzip(0,5) or path(...).grepLineGzip(0)
- Snapshot support: assert snapshot(path(...).linesGzip[0])
Note: For gzip files, the md5 property computes the checksum after gunzipping, unlike non-gz files. No specific "tarball" is mentioned; tests typically use individual .gz files from repositories like nf-core/test-datasets. Tarballs may appear in Nextflow workflows (e.g., for bundled data), but linesGzip handles single .gz files.
Citations:
- 1: https://www.nf-test.com/docs/assertions/files/
- 2: https://nf-co.re/docs/contributing/nf-test/assertions
Don't snapshot a tarball with linesGzip.
The process outputs a .tar.gz archive, but linesGzip is designed for gzip-compressed text files. Using it on a tarball reads binary tar stream bytes as "lines," which is meaningless and brittle. Snapshot the path itself instead, or assert on extracted contents if validation of archive structure is needed.
Suggested fix
- { assert snapshot(
- path(process.out.artifactprior[0][1]).linesGzip[3..7],
- process.out.findAll { key, val -> key.startsWith("versions") }
- ).match() }
+ { assert snapshot(
+ path(process.out.artifactprior[0][1]),
+ process.out.findAll { key, val -> key.startsWith("versions") }
+ ).match() }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/nf-core/gatk4/learnreadorientationmodel/tests/main.nf.test` around
lines 28 - 31, The test is incorrectly using linesGzip on a .tar.gz tarball;
change the assertion to snapshot the archive path itself or validate extracted
contents instead: replace the snapshot call that references
path(process.out.artifactprior[0][1]).linesGzip[3..7] with either
snapshot(path(process.out.artifactprior[0][1])) to record the tarball, or
extract the tarball in the test and assert on the extracted files' contents
(e.g., list or specific file contents) when calling snapshot; update the test
that constructs the snapshot (the assertion block containing
process.out.artifactprior and process.out.findAll) accordingly.
ch_mutect2_samples = GATK4_MUTECT2.out.vcf
    .join(GATK4_MUTECT2.out.tbi)
    .join(GATK4_MUTECT2.out.stats)
    .map { meta, vcf, tbi, stats ->
    .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)
🧩 Analysis chain
🏁 Script executed:
# First, find and read the relevant file
find . -name "twistcgp.nf" -type f
Repository: fulcrumgenomics/twistcgp
Length of output: 91
🏁 Script executed:
# Read the file around lines 171-174
if [ -f "workflows/twistcgp.nf" ]; then
sed -n '165,180p' "workflows/twistcgp.nf"
fi
Repository: fulcrumgenomics/twistcgp
Length of output: 713
🏁 Script executed:
# Search for GATK4_LEARNREADORIENTATIONMODEL output definitions
rg "GATK4_LEARNREADORIENTATIONMODEL" -A 5 -B 5 --max-count=20
Repository: fulcrumgenomics/twistcgp
Length of output: 8446
🏁 Script executed:
# Check if there are other join patterns in the file with remainder or optional handling
rg "\.join\(" "workflows/twistcgp.nf" -A 2 -B 2
Repository: fulcrumgenomics/twistcgp
Length of output: 1708
🏁 Script executed:
# Check the GATK4_LEARNREADORIENTATIONMODEL module definition
find . -path "*gatk4/learnreadorientationmodel/main.nf" -exec cat {} \;
Repository: fulcrumgenomics/twistcgp
Length of output: 1634
🏁 Script executed:
# Check if GATK4_MUTECT2 has conditional f1r2 output handling
find . -path "*gatk4/mutect2/main.nf" -exec head -100 {} \;
Repository: fulcrumgenomics/twistcgp
Length of output: 2695
🏁 Script executed:
# Search for any conditional logic around GATK4_LEARNREADORIENTATIONMODEL execution
rg "GATK4_LEARNREADORIENTATIONMODEL" -B 10 -A 5 "workflows/twistcgp.nf"
Repository: fulcrumgenomics/twistcgp
Length of output: 2605
🏁 Script executed:
# Check Nextflow join documentation and usage with remainder in the codebase
rg "remainder\s*:" -B 2 -A 2
Repository: fulcrumgenomics/twistcgp
Length of output: 50
🏁 Script executed:
# Check if there are any examples of handling optional channels with join in the workflow
rg "join.*optional" -B 3 -A 3
Repository: fulcrumgenomics/twistcgp
Length of output: 50
Fix inner join on optional artifact priors.
Mutect2 only produces f1r2 (marked optional: true) when enabled, so GATK4_LEARNREADORIENTATIONMODEL won't run for all samples. The default inner join drops unmatched items, causing samples without artifactprior to disappear before GATK4_FILTERMUTECTCALLS instead of falling back to empty orientation-bias input.
Suggested fix
ch_mutect2_samples = GATK4_MUTECT2.out.vcf
.join(GATK4_MUTECT2.out.tbi)
.join(GATK4_MUTECT2.out.stats)
- .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior)
+ .join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, remainder: true)
+ .map { meta, vcf, tbi, stats, artifactprior -> tuple(meta, vcf, tbi, stats, artifactprior ?: []) }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@workflows/twistcgp.nf` around lines 171 - 174, The current chain uses an
inner join that drops samples missing artifact priors: change the final
.join(GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior) so that
ch_mutect2_samples retains all GATK4_MUTECT2 entries. Nextflow's join operator
has no left-join variant; pass remainder: true to the join that combines
GATK4_MUTECT2.out.vcf/.tbi/.stats with
GATK4_LEARNREADORIENTATIONMODEL.out.artifactprior, replace the null emitted for
unmatched artifactprior values with an empty list, and ensure
GATK4_FILTERMUTECTCALLS consumes that fallback empty orientation-bias input when
the artifactprior is missing.
ch_filtermutect_in = params.population_germline_vcf
    ? ch_mutect2_samples
        .join(GATK4_CALCULATECONTAMINATION.out.segmentation)
        .join(GATK4_CALCULATECONTAMINATION.out.contamination)
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the workflow file
cd workflows
ls -la | head -20
wc -l twistcgp.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 262
🏁 Script executed:
# Read the specific lines in context (lines 170-185 to get surrounding context)
sed -n '170,185p' twistcgp.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 122
🏁 Script executed:
# Search for GATK4_CALCULATECONTAMINATION definition to understand its outputs
rg "GATK4_CALCULATECONTAMINATION" twistcgp.nf -B 2 -A 5 | head -50
Repository: fulcrumgenomics/twistcgp
Length of output: 119
🏁 Script executed:
# Look for the output definitions to understand if segmentation is optional
rg "segmentation|contamination" twistcgp.nf -B 1 -A 1 | head -100
Repository: fulcrumgenomics/twistcgp
Length of output: 119
🏁 Script executed:
# Read the specific lines with full path from repo root
sed -n '170,185p' workflows/twistcgp.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 842
🏁 Script executed:
# Search for GATK4_CALCULATECONTAMINATION and its outputs
rg "GATK4_CALCULATECONTAMINATION" workflows/ -B 2 -A 5
Repository: fulcrumgenomics/twistcgp
Length of output: 2159
🏁 Script executed:
# Look for the module definition or output structure
rg "out\.(segmentation|contamination)" workflows/ -B 2 -A 2
Repository: fulcrumgenomics/twistcgp
Length of output: 557
🏁 Script executed:
# Find the GATK4_CALCULATECONTAMINATION module definition
find . -name "*calculatecontamination*" -type f
Repository: fulcrumgenomics/twistcgp
Length of output: 50
🏁 Script executed:
# Check the modules directory structure
find modules -name "*calculatecontamination*" -o -name "*CALCULATECONTAMINATION*" | head -20
Repository: fulcrumgenomics/twistcgp
Length of output: 112
🏁 Script executed:
# Search for output definitions in the module
rg "segmentation" modules/ -B 2 -A 2
Repository: fulcrumgenomics/twistcgp
Length of output: 10333
🏁 Script executed:
# Verify the exact main.nf file for GATK4_CALCULATECONTAMINATION
cat modules/nf-core/gatk4/calculatecontamination/main.nf
Repository: fulcrumgenomics/twistcgp
Length of output: 1820
Use remainder: true and coalesce missing segmentation to prevent silent sample loss.
The segmentation output is optional (marked in the module), but the join operations here are inner joins by default. When segmentation isn't emitted for a sample, that sample has no match in the join, is silently dropped, and never reaches GATK4_FILTERMUTECTCALLS. Apply remainder: true to the first join and coalesce missing segmentation to [].
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@workflows/twistcgp.nf` around lines 176 - 179, The current inner joins drop
samples when GATK4_CALCULATECONTAMINATION.out.segmentation is not emitted;
change the first join to be a remainder join (use remainder: true on the join
between ch_mutect2_samples and GATK4_CALCULATECONTAMINATION.out.segmentation)
and ensure missing segmentation values are coalesced to an empty list before the
subsequent join to contamination/consumers (e.g., when unpacking the joined
tuple for ch_filtermutect_in replace a null/undefined segmentation with []), so
no samples are silently lost before GATK4_FILTERMUTECTCALLS.
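The remainder join and coalescing step described above can be sketched with toy channels. This is a self-contained illustration under stated assumptions, not the pipeline's code: the sample ids and file names are hypothetical, and plain strings stand in for the real meta maps and staged paths.

```groovy
// Toy Nextflow script; all ids and file names are hypothetical placeholders.
workflow {
    // Stand-in for ch_mutect2_samples: two samples with variant calls.
    ch_calls = Channel.of(
        ['sample_a', 'sample_a.vcf.gz'],
        ['sample_b', 'sample_b.vcf.gz']
    )

    // Stand-in for the optional segmentation output: only sample_a has one.
    ch_segmentation = Channel.of(
        ['sample_a', 'sample_a.segmentation.table']
    )

    ch_calls
        // remainder: true keeps sample_b, with null in the segmentation slot.
        .join(ch_segmentation, remainder: true)
        // Coalesce the missing segmentation to an empty list.
        .map { id, vcf, segmentation -> tuple(id, vcf, segmentation ?: []) }
        .view()
}
```

Note that remainder: true behaves as an outer join, so keys that appear only on the segmentation side would also be emitted, with null in place of the missing left-hand value. In the real workflow it may be worth filtering those out before the tuple reaches GATK4_FILTERMUTECTCALLS.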
This PR adds GATK4_CALCULATECONTAMINATION, GATK4_GETPILEUPSUMMARIES, and GATK4_LEARNREADORIENTATIONMODEL. These modules are in line with best practices, especially if the submitted samples are FFPE (which are particularly sensitive to artifacts).