Supplementary Methods (Additional file 2)

Reproducible command-line workflow to quantify organelle-derived reads

Overview

This workflow estimates the fraction of reads that map to either the chloroplast or the mitochondrial genome (union). It also classifies the organelle-mapping reads into the following sets; the first three are disjoint and together make up the union:
- cpDNA only: reads mapping to the chloroplast reference but not to the mitochondrial reference
- mtDNA only: reads mapping to the mitochondrial reference but not to the chloroplast reference
- cpDNA and mtDNA: reads mapping to both references (intersection)
- cpDNA or mtDNA: reads mapping to either reference (union)

Requirements
- SRA Toolkit (prefetch, fasterq-dump), or enaDataGet/Aspera (ascp) as an alternative
- BWA (bwa mem)
- SAMtools
- awk and GNU coreutils (sort, comm, wc)
- Python 3 (step 6)
- GNU parallel (optional)

Inputs
- List of run accessions (one per line), e.g., runs.txt
- Reference FASTA files:
  - cp.fa (Silene latifolia chloroplast genome; GenBank NC_016730)
  - mt.fa (Silene latifolia mitochondrial genome; GenBank NC_014487)

1) Prepare references

bwa index cp.fa
bwa index mt.fa

# Create the output directories used below, and fix the collation order so that sort and comm agree:
mkdir -p fastq bam ids
export LC_ALL=C

2) Download reads (example for one run)

# Using SRA Toolkit (paired-end runs will produce *_1.fastq and *_2.fastq):
prefetch SRR2050365
fasterq-dump --split-files --threads 8 SRR2050365 -O fastq/

3) Map reads to organelle references

# Single-end:
bwa mem -t 8 cp.fa fastq/SRR2050365.fastq | samtools sort -@ 4 -o bam/SRR2050365.cp.bam
bwa mem -t 8 mt.fa fastq/SRR2050365.fastq | samtools sort -@ 4 -o bam/SRR2050365.mt.bam

# Paired-end:
bwa mem -t 8 cp.fa fastq/SRR2050365_1.fastq fastq/SRR2050365_2.fastq | samtools sort -@ 4 -o bam/SRR2050365.cp.bam
bwa mem -t 8 mt.fa fastq/SRR2050365_1.fastq fastq/SRR2050365_2.fastq | samtools sort -@ 4 -o bam/SRR2050365.mt.bam

# Index the BAM files (optional: step 5 reads each BAM in full and does not require an index):
samtools index bam/SRR2050365.cp.bam
samtools index bam/SRR2050365.mt.bam

4) Count total reads (N_total)

# From the FASTQ files (one record = 4 lines):
# Single-end:
N_total=$(awk 'END{print NR/4}' fastq/SRR2050365.fastq)
# Paired-end (count read ends, consistent with SRA "reads"):
N_total=$(( $(awk 'END{print NR/4}' fastq/SRR2050365_1.fastq) + $(awk 'END{print NR/4}' fastq/SRR2050365_2.fastq) ))

5) Extract mapped read IDs and compute set operations

# A read counts as mapped when FLAG bit 0x4 (unmapped) is not set (samtools view -F 4).
# Each ID is the read name plus the mate number from FLAG bit 0x80 (1 for single-end reads and first mates, 2 for second mates),
# so that paired-end reads are counted per read end, like N_total in step 4; sort -u collapses secondary/supplementary records of the same read.
# Mapped to chloroplast:
samtools view -F 4 bam/SRR2050365.cp.bam | awk '{print $1 "/" (int($2/128)%2 ? 2 : 1)}' | sort -u > ids/SRR2050365.cp.ids
# Mapped to mitochondrion:
samtools view -F 4 bam/SRR2050365.mt.bam | awk '{print $1 "/" (int($2/128)%2 ? 2 : 1)}' | sort -u > ids/SRR2050365.mt.ids
# Intersection (cpDNA and mtDNA):
comm -12 ids/SRR2050365.cp.ids ids/SRR2050365.mt.ids > ids/SRR2050365.both.ids
# cpDNA only:
comm -23 ids/SRR2050365.cp.ids ids/SRR2050365.mt.ids > ids/SRR2050365.cp_only.ids
# mtDNA only:
comm -13 ids/SRR2050365.cp.ids ids/SRR2050365.mt.ids > ids/SRR2050365.mt_only.ids
# Union (cpDNA or mtDNA):
cat ids/SRR2050365.cp.ids ids/SRR2050365.mt.ids | sort -u > ids/SRR2050365.cp_or_mt.ids

N_cp_only=$(wc -l < ids/SRR2050365.cp_only.ids)
N_mt_only=$(wc -l < ids/SRR2050365.mt_only.ids)
N_both=$(wc -l < ids/SRR2050365.both.ids)
N_cp_or_mt=$(wc -l < ids/SRR2050365.cp_or_mt.ids)

6) Compute organelle-derived fraction

Organelle_percent=$(python3 - <
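The read count in step 4 (`awk 'END{print NR/4}'`) assumes uncompressed FASTQ with exactly four lines per record. If reads are obtained as `.fastq.gz` instead (for example from ENA), a small helper can handle both cases and warn when the line count is not a multiple of four. This is a minimal sketch; `count_fastq` is a hypothetical helper name and the toy records are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: number of 4-line FASTQ records in a plain or .gz file
count_fastq() {
  local f=$1 lines
  case $f in
    *.gz) lines=$(gzip -cd -- "$f" | wc -l) ;;
    *)    lines=$(wc -l < "$f") ;;
  esac
  # A remainder suggests a truncated download or line-wrapped sequences
  if [ $((lines % 4)) -ne 0 ]; then
    echo "warning: $f has $lines lines, not a multiple of 4" >&2
  fi
  echo $((lines / 4))
}

# Toy paired-end input: two records per mate file, the second file gzip-compressed
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$tmp/toy_1.fastq"
printf '@r1\nTTGA\n+\nIIII\n@r2\nCCAA\n+\nIIII\n' | gzip -c > "$tmp/toy_2.fastq.gz"

# Read ends summed over both mate files, as in the paired-end N_total of step 4
N_total=$(( $(count_fastq "$tmp/toy_1.fastq") + $(count_fastq "$tmp/toy_2.fastq.gz") ))
echo "N_total=$N_total"
rm -r "$tmp"
```

Decompressing through a pipe avoids writing an uncompressed copy to disk; the integer division silently drops a partial record, which is why the warning is printed.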
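For paired-end runs, bwa mem writes both mates of a pair under the same QNAME, so an ID taken from column 1 alone counts each pair (template) once, whereas the paired-end N_total in step 4 counts each mate. The sketch below contrasts the two keys on a few hand-written SAM-like records (names and flags are made up, not real output). It mimics `samtools view -F 4` with awk arithmetic on FLAG and uses integer division rather than gawk's `and()` so that it runs under any POSIX awk.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C

# Toy records: QNAME, FLAG, RNAME, POS (tab-separated)
#  99 = paired, proper, mate reverse, first mate    147 = paired, proper, reverse, second mate
#  73 = paired, mate unmapped, first mate           133 = paired, unmapped, second mate
sam=$'pairA\t99\tcp\t100\npairA\t147\tcp\t300\npairB\t73\tcp\t500\npairB\t133\t*\t500'

# Keep mapped records only (FLAG bit 0x4 clear), emulating `samtools view -F 4`
mapped=$(printf '%s\n' "$sam" | awk -F'\t' 'int($2/4)%2 == 0')

# QNAME only: both mates collapse into one ID, so this counts templates
n_templates=$(printf '%s\n' "$mapped" | awk -F'\t' '{print $1}' | sort -u | wc -l)

# QNAME plus mate number from FLAG bit 0x80, so this counts read ends
n_read_ends=$(printf '%s\n' "$mapped" \
  | awk -F'\t' '{print $1 "/" (int($2/128)%2 ? 2 : 1)}' | sort -u | wc -l)

echo "templates=$n_templates read_ends=$n_read_ends"
```

Whichever key is used, the numerator (mapped IDs) and the denominator (N_total) should count the same unit; mixing templates with read ends roughly halves the paired-end percentage.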
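The four counts from step 5 are linked by two identities that make a cheap sanity check: the cpDNA-only, mtDNA-only and intersection sets partition the union, and |cp| + |mt| - |both| = |union|. A minimal sketch on toy read IDs (r1 to r6 are made up, not real reads), using the same `comm` column options as step 5:

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # comm expects input sorted in the same collation order it uses

# Toy ID lists standing in for ids/<run>.cp.ids and ids/<run>.mt.ids
tmp=$(mktemp -d)
printf '%s\n' r1 r2 r3 r5 | sort -u > "$tmp/cp.ids"
printf '%s\n' r3 r4 r5 r6 | sort -u > "$tmp/mt.ids"

N_cp=$(wc -l < "$tmp/cp.ids")
N_mt=$(wc -l < "$tmp/mt.ids")
N_cp_only=$(comm -23 "$tmp/cp.ids" "$tmp/mt.ids" | wc -l)   # only in file 1
N_mt_only=$(comm -13 "$tmp/cp.ids" "$tmp/mt.ids" | wc -l)   # only in file 2
N_both=$(comm -12 "$tmp/cp.ids" "$tmp/mt.ids" | wc -l)      # in both files
N_cp_or_mt=$(sort -u "$tmp/cp.ids" "$tmp/mt.ids" | wc -l)   # union

# The three disjoint classes must add up to the union ...
if [ $((N_cp_only + N_mt_only + N_both)) -ne $((N_cp_or_mt)) ]; then
  echo "partition check failed" >&2; exit 1
fi
# ... and the union must satisfy inclusion-exclusion
if [ $((N_cp + N_mt - N_both)) -ne $((N_cp_or_mt)) ]; then
  echo "inclusion-exclusion check failed" >&2; exit 1
fi
echo "cp_only=$N_cp_only mt_only=$N_mt_only both=$N_both union=$N_cp_or_mt"
rm -r "$tmp"
```

If either check fails on real data, the usual cause is ID files sorted under a different locale than the one `comm` runs in.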
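Step 6 uses python3 for the division; the same arithmetic can also be done in awk without a Python dependency. A minimal sketch, assuming the organelle percentage is 100 * N_cp_or_mt / N_total (the union, as in the Overview); `pct` is a hypothetical helper and the counts are invented for illustration, not results from this study.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep "." as the decimal separator in printf output

# Hypothetical counts standing in for the outputs of steps 4 and 5
N_total=200000
N_cp_only=9000
N_mt_only=600
N_both=400
N_cp_or_mt=10000

# Hypothetical helper: 100 * numerator / denominator with 4 decimals, NA if denominator is 0
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { if (d > 0) printf "%.4f\n", 100 * n / d; else print "NA" }'
}

Organelle_percent=$(pct "$N_cp_or_mt" "$N_total")
cp_only_percent=$(pct "$N_cp_only" "$N_total")
mt_only_percent=$(pct "$N_mt_only" "$N_total")
both_percent=$(pct "$N_both" "$N_total")
printf 'organelle=%s cp_only=%s mt_only=%s both=%s\n' \
  "$Organelle_percent" "$cp_only_percent" "$mt_only_percent" "$both_percent"
```

awk is used because bash arithmetic is integer-only; the NA branch guards against an empty or missing FASTQ file.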
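The commands above are written for a single run, while the Inputs list runs.txt. One way to drive steps 2 and 3 over the whole list is sketched below. It assumes every run is paired-end, that cp.fa and mt.fa are already indexed, and that accessions contain no spaces; `run`, `process_run`, `process_all` and the `DRY_RUN` switch are hypothetical names introduced for this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    eval "$*"
  fi
}

# Hypothetical helper: download one paired-end run and map it to both references
process_run() {
  local acc=$1 ref
  run "mkdir -p fastq bam ids"
  run "prefetch $acc"
  run "fasterq-dump --split-files --threads 8 $acc -O fastq/"
  for ref in cp mt; do
    run "bwa mem -t 8 $ref.fa fastq/${acc}_1.fastq fastq/${acc}_2.fastq | samtools sort -@ 4 -o bam/$acc.$ref.bam"
  done
}

# Hypothetical helper: loop over an accession list, skipping blank lines and stripping CR
process_all() {
  local acc
  while IFS= read -r acc || [ -n "$acc" ]; do
    acc=${acc%$'\r'}
    if [ -n "$acc" ]; then
      # stdin is detached so that no tool can consume the rest of the list
      process_run "$acc" < /dev/null
    fi
  done < "$1"
}
```

Running `DRY_RUN=1 process_all runs.txt` prints the commands for review; `process_all runs.txt` executes them. With GNU parallel, exporting the functions (`export -f run process_run`) and calling `parallel -j 2 process_run {} :::: runs.txt` should run accessions concurrently; the per-job thread counts (`-t 8`, `-@ 4`) then multiply, so they may need lowering.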