One Health Viral Metagenomics for Pandemic Preparedness: Validated mNGS Workflows for Viral Detection and Genome Recovery from Swab and Tissue Specimens

Tristan Russell 1,2

Elisa Formiconi 1,2

Alison Murphy 3

Jimmy Hortion 1,2

Máire McElroy 4

Mícheál Casey 5

Laura Garza Cuartero 4

John F Mee 6

Hanne Jahns 7

Christine Kelly 1,8

Joanne Byrne 1,9

Eoin R Feeney 1,9

Patrick WG Mallon 1,2,9

Virginie W Gautier 1,2,3 ✉ virginie.gautier@ucd.ie

1 UCD Centre for Experimental Pathogen Host Research (CEPHR) University College Dublin D04 E1W1 Dublin Ireland

2 UCD School of Medicine University College Dublin D04 C7X2 Dublin Ireland

3 UCD Conway Institute University College Dublin D04 E1W1 Dublin Ireland

4 Department of Agriculture, Food, and the Marine Laboratories W23 VW2C Backweston, Celbridge, Kildare Ireland

5 Regional Veterinary Laboratories (RVL) Division, Department of Agriculture, Food and the Marine Agriculture House T12 XD51 Backweston, Dublin Ireland

6 Moorepark Research Centre, Animal and Bioscience Research Department P61 P302 Teagasc, Fermoy Ireland

7 UCD School of Veterinary Medicine University College Dublin Dublin 4 D04 W6F6 Ireland

8 Department of Infectious Diseases Mater Misericordiae University Hospital D07 AX57 Dublin Ireland

9 Department of Infectious Diseases St Vincent’s University Hospital D04 T6F4 Dublin Ireland

Corresponding Author: Virginie Gautier, virginie.gautier@ucd.ie

Abstract

Background

Metagenomic next-generation sequencing (mNGS) is an untargeted approach that enables detection of pathogens directly from samples without prior knowledge of their genetic sequences. In the context of pandemic preparedness and One Health surveillance, there is a pressing need for validated viral mNGS workflows that perform reliably across diverse hosts, sample types and pre-analytical conditions.

Results

The study designed and evaluated two mNGS workflows, one for swabs and one for complex tissue matrices, using a reference repository of clinical and post-mortem samples. The panel comprised swabs and tissue samples positive for 19 DNA and RNA viruses (including 12 species) from nine host species and nine anatomical sites, encompassing a range of transport media, storage temperatures and processing timelines. Quality control metrics were embedded throughout nucleic acid extraction, library preparation and sequencing to monitor performance and support interpretation. Overall, 89.5% of 19 known DNA and RNA viruses were detected, including from samples with low nucleic acid concentrations (< 1 ng/µl) and variable integrity and purity. The workflows identified viral co-infections that had not been detected by prior targeted testing, as well as Phocid herpesvirus 7 (PHV7) for which no complete reference genome was initially available.

Conclusions

These results demonstrate that the validated swab and tissue mNGS workflows are sufficiently robust and sensitive for deployment in investigations of suspected viral disease of unknown aetiology and for early detection of emerging viral threats at the animal–human interface.

Keywords:

Metagenomic NGS

Virus Detection

Pandemic Preparedness

Introduction

Pandemic preparedness is a strategic priority for the European Union, driven by the substantial morbidity and mortality associated with recent viral emergence events such as SARS-CoV-2 in humans and avian influenza virus H5N1 in birds and cattle [1]. Zoonotic spillover between host species has underpinned the major pandemics of the last 50 years, including those caused by SARS-CoV-2, HIV and influenza H1N1 [2].

In response, the EU and World Health Organisation have integrated One Health strategies into pandemic preparedness and response frameworks [1, 3]. Surveillance initiatives, such as One Health – ALL Ireland for European Surveillance (OH-ALLIES), are being established to build capacity for detection of high-risk viral families circulating in animal populations while also enhancing the ability to discover previously unknown viruses in Ireland [4]. Within this context, “Pathogen X” and “Pathogen Y” refer to hypothetical, unknown human and veterinary agents, respectively, with pandemic potential and the capacity to cause significant health and socio-economic disruption [4–6]. Viruses are considered the most plausible Pathogen X or Pathogen Y agents due to their rapid evolution, host switching capacity and ability to alter their virulence before recognition [7–10].

Viral metagenomic next-generation sequencing (mNGS) enables unbiased detection of both known and novel viruses directly from clinical samples, addressing gaps in targeted molecular diagnostics, which rely on prior knowledge of pathogen sequences and perform suboptimally when viral genomes are highly divergent or when co-infections and unexpected pathogens are present [11, 12].

mNGS has already made important contributions to the detection and characterisation of emerging viral pathogens, exemplified by the identification of Schmallenberg virus as the aetiological agent of large bovine and ovine abortion storms in Europe in 2011 [13], and the rapid characterisation of SARS-CoV-2 from the first COVID-19 cases in China [7–10], where early sequences facilitated the design of PCR assays [14, 15] and vaccines [16–18], phylogenetic analyses to track viral spread [19, 20], and inference of phenotype from genotype [9]. Co-infections can also be identified using mNGS [21]. These strengths distinguish mNGS from targeted methods, yet there are several challenges to mNGS-based pathogen discovery and surveillance. Practical concerns, especially in resource-limited settings, include its cost and requirements for specialist equipment, computing and personnel [22–24]. Further challenges are the reduced analytical sensitivity relative to PCR, caused by the low viral fraction in many samples, and the need for careful interpretation to ensure specificity. Recently, technical advances in mNGS have made it more accessible [22–24] and various approaches, including methods of host depletion [25–27] and viral enrichment [28, 29], have been developed to improve the sensitivity of viral mNGS.

There is a particular need for rigorously validated, practical mNGS workflows that can operate reliably on the heterogeneous specimens encountered at the animal–human interface spanning both simple swab matrices and complex tissues. This study develops and validates two optimised mNGS workflows, one for swab samples and the other for complex tissue matrices (Fig. 1), each integrating comprehensive quality control (QC) measures, from nucleic acid extraction through library preparation and sequencing to bioinformatic analysis, to support robust virus discovery and characterisation. The methods detailed here cover the quality metrics, performance for detection of known and unknown viruses, and application of these workflows for consensus genome assembly, providing a framework for deployment in One Health surveillance and outbreak investigation.

Fig. 1

Workflows for mNGS of clinical samples. The approaches for swab and tissue samples are shown [26]. The swab workflow has been designed to detect whole virus particles, so DNA and RNA were analysed by mNGS. A transcriptomics approach has been applied for tissue samples because actively replicating virus that generates transcripts would be expected. The schematic indicates the critical QC metrics during the workflow, including knowledge of freeze-thaw cycles and storage temperatures, and the points at which nucleic acid concentration, purity and integrity are assessed using the Qubit, NanoDrop and Bioanalyzer, respectively. Approximate time to run each step and overall costs per sample are provided. Created with BioRender.

Methods

Reference Biospecimen Repository

A biospecimen repository of clinical samples positive for known viruses was assembled from five independent sources: the Department of Agriculture, Food and the Marine (DAFM), the Agriculture and Food Development Authority (Teagasc), UCD School of Veterinary Medicine, St Vincent’s University Hospital (SVUH) and Mater Misericordiae University Hospital (MMUH). Eight swab and six tissue samples representing nine anatomical sites, 12 viral species and nine host species were included for method development and evaluation (Table 1). All samples from non-human animals were obtained from naturally deceased animals during routine postmortem examinations without the use of anaesthesia or euthanasia.

Swab specimens were collected using flocked swabs under a range of transport and storage conditions. DAFM collected swabs into universal transport medium (COPAN, 330C), Teagasc collected swabs (COPAN, 552C) into PrimeStore molecular transport media containing guanidine thiocyanate lysis buffer (Thermo Fisher Scientific, R13905), while SVUH and MMUH collected dry swabs (COPAN, 552C) without transport medium. SVUH and MMUH samples were extracted within 2 days of collection to avoid freeze-thaw cycles. DAFM and Teagasc samples were stored at -80°C and -20°C, respectively, until extraction (Table 1).

Tissue biopsies were collected during routine postmortem examinations. Grey seal (Halichoerus grypus) carcasses processed by UCD School of Veterinary Medicine were stored at -20°C prior to necropsies after which tissue biopsies were stored at -80°C before processing. Sika deer (Cervus nippon) and Asian elephant (Elephas maximus) tissues collected from fresh animals by UCD Veterinary Medicine were similarly stored at -80°C before processing. DAFM tissue samples were stored at -80°C and had undergone at least one freeze-thaw cycle before extraction (Table 1).

Ethical and regulatory oversight was secured through institutional review processes.

The UCD Animal Research Ethics Committee granted exemptions for the use of samples collected as part of routine diagnostics or postmortem from DAFM (AREC-E-24-39-Gautier), Teagasc (AREC-E-25-25-Gautier) and UCD School of Veterinary Medicine (AREC-E-22-04-Jahns), and the Human Research Ethics Committee provided approval to work with human clinical samples provided by hospitals (306-LS-CSD-25-Mallon).

Table 1

Swab and tissue biospecimens used for viral mNGS workflow validation. Abbreviations: EEHV1A, Elephant endotheliotropic herpesvirus; IBV, Infectious bronchitis virus; MDV, Marek’s Disease virus; OHV2, Ovine herpesvirus 2; MTM, Molecular Transport Media; PHV1/7, Phocine herpesvirus 1/7; PMV1, Pigeon paramyxovirus 1; SBV, Schmallenberg virus; and UTM, Universal Transport Media. ¹Fetal abomasal fluid comprises swallowed amniotic fluid and gastric secretions – the abomasum is the fourth stomach of a ruminant. ² Sample storage prior to extraction, and in some cases, storage of carcasses before postmortem. ³ Stored at -20°C when collected and transferred to -80°C on receipt. ⁴ Detected by end-point PCR and confirmed by Sanger sequencing.
Sample	Matrix	Host	Anatomical Site	Source	Date of Sampling	Available Details on Animal Condition & Pathology	Storage (°C)²	Freeze-thaw cycles	Known Virus (Genome)	Detection
1	Swab (MTM)	Bos taurus	Abomasal fluid¹	Teagasc	02/03/2025	Autolysed Aborted Foetus	-20 & -80³	1	SBV (RNA)	qPCR (CT = 32)
2	Swab (UTM)	Sus scrofa	Large Intestine	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus A (RNA)	qPCR (CT = 27)
2	Swab (UTM)	Sus scrofa	Large Intestine	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus B (RNA)	qPCR (CT = 31)
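3	Swab (UTM)	Sus scrofa	Small Intestine	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus A (RNA)	qPCR (CT = 7)
3	Swab (UTM)	Sus scrofa	Small Intestine	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus B (RNA)	qPCR (CT = 17)
3	Swab (UTM)	Sus scrofa	Small Intestine	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus C (RNA)	qPCR (CT = 15)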
4	Swab (UTM)	Sus scrofa	Faecal	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus A (RNA)	qPCR (CT = 23)
4	Swab (UTM)	Sus scrofa	Faecal	DAFM	2015	Dead, Diarrhoea	-80	≥ 1	Rotavirus C (RNA)	qPCR (CT = 22)
5	Swab (Dry)	Homo sapiens	Skin	SVUH	03/2025	Live, Skin Poxes	Processed on arrival	0	Mpox IIb (DNA)	qPCR (CT = 27)
6	Swab (Dry)	Homo sapiens	Skin	MMUH	07/02/2025	Live, Skin Poxes	Processed on arrival	0	Mpox Ia (DNA)	qPCR (CT = 24)
7	Swab (Dry)	Homo sapiens	Skin	SVUH	09/2022	Live, Skin Poxes	Processed on arrival	0	Mpox IIb (DNA)	qPCR (CT=)
8	Swab (UTM)	Gallus gallus	Trachea	DAFM	22/04/2024	Dead	-80	≥ 1	MDV (DNA)	qPCR (CT = 33)
9	Tissue	Elephas maximus	Heart	UCD Vet	24/07/2024	Dead, systemic haemorrhages, Intestinal ulcers	-20 & -80³	1	EEHV1A (DNA)	PCR + ⁴
10	Tissue	Cervus nippon	Liver	UCD Vet	19/09/2024	Dead, systemic vasculitis	-20 & -80³	1	OHV2 (DNA)	PCR + ⁴
11	Tissue	Columba livia	Brain	DAFM	01/07/2020	Dead	-80	≥ 1	PMV1 (RNA)	qPCR (CT = 30)
12	Tissue	Gallus gallus	Intestine	DAFM	25/01/2024	Dead	-80	≥ 1	IBV (RNA)	qPCR (CT = 26)
13	Tissue	Halichoerus grypus	Brain	UCD Vet	08/11/2022	Dead, stranded, mouth ulcers, septicaemia, umbilical abscess	-20 & -80³	2	PHV1 (DNA)	PCR + ⁴
14	Tissue	Halichoerus grypus	Gingiva	UCD Vet	17/03/2024	Dead, Stranded, Pneumonia, Septicaemia	-20 & -80³	2	PHV7 (DNA)	PCR + ⁴

Biosafety and Risk Assessment

All work with infectious material was conducted under appropriate biocontainment, with risk assessment determining assignment to Biosafety Level 2 or 3 facilities according to national guidelines. Mpox-positive clinical swabs were handled and processed within a dedicated Biosafety Level 3 laboratory, while all remaining animal and human diagnostic or postmortem specimens were manipulated under Biosafety Level 2 conditions using standard operating procedures and engineering controls to minimise exposure risk and prevent environmental release.

Total Nucleic Acid (TNA) Extraction from Swabs

Dry Mpox-positive swabs were processed by adding 850 µl Buffer AVL, incubating for 10 minutes, and using 700 µl lysate for TNA extraction with the QIAamp Viral RNA Mini Kit (Qiagen, 52906) following the manufacturer’s instructions, with the sole modification of replacing 6 µl carrier RNA with 6 µl linear acrylamide (5 mg/ml, Thermo Fisher Scientific; AM9520). Linear acrylamide was used instead of carrier RNA so that carrier RNA would not be sequenced. All other swabs were vortexed for 5 seconds, after which 300 µl transport medium was used for TNA extraction with the Liferiver Viral RNA Isolation Kit (P20211009) on the Liferiver automated extractor platform, following the manufacturer’s instructions except that 6 µl linear acrylamide (5 mg/ml) was substituted for the carrier RNA.

RNA Extraction from Tissues

RNA was extracted from tissue samples using the RNeasy Mini Kit (Qiagen, 74106). Tissue was cut on dry ice into 10–20 mg pieces and transferred into 600 µl Buffer RLT supplemented with 10% β-mercaptoethanol. Samples were sonicated on the high setting of the Bioruptor NextGen sonicator (Diagenode) for three cycles of 30 seconds ON and 30 seconds OFF at 4°C, followed by column-based homogenisation (BioTech, HCR003) at 14,000 x g for 120 seconds. RNA was purified using the Qiagen RNeasy Mini Kit and residual DNA was digested with the DNA-free DNA Removal Kit (Thermo Fisher Scientific, AM1906) according to the manufacturer’s instructions.

Quality Control for Nucleic Acid Extracts

Nucleic acid extract purity and quantity were assessed using the ND-1000 NanoDrop spectrophotometer (Labtech International). RNA concentrations were determined using the Qubit RNA High Sensitivity Kit (Thermo Fisher Scientific, Q32852) and DNA concentrations using the Qubit DNA High Sensitivity Kit (Thermo Fisher Scientific, Q33230). RNA integrity of tissue extracts was assessed using the Bioanalyzer RNA Nano chip (Agilent, 5067-1512) or the TapeStation RNA ScreenTape (Agilent, 5067-5579). Swab extracts with concentrations below the limit of detection of these platforms were not subjected to integrity assessment.

Double-stranded cDNA Synthesis

First-strand cDNA synthesis was performed using SuperScript IV (Thermo Fisher Scientific, 18090050) by combining 11 µl TNA extract with 1 µl 10 mM deoxynucleotides and 1 µl 50 ng/µl random hexamers (Thermo Fisher Scientific, 51709), then incubating at 65°C for 5 minutes. Then, 4 µl 5X SSIV Buffer, 1 µl 100 mM DTT, 1 µl RNase OUT (Thermo Fisher Scientific, 100000840) and 1 µl SuperScript IV enzyme were added to reaction mixes, which were incubated at 23°C for 10 minutes, 50°C for 30 minutes and 80°C for 10 minutes.

Second-strand cDNA synthesis was carried out using the Klenow Fragment (Thermo Fisher Scientific, EP0051) by addition of 0.5 µl Klenow Fragment (10 U/µl), 1 µl 10 mM deoxynucleotides, 5 µl Klenow Fragment Buffer and 33.5 µl nuclease-free water to first-strand cDNA synthesis reaction mixes. The reaction was incubated at 37°C for 30 minutes, then heat inactivated at 80°C for 5 minutes. The final product contained genomic DNA and ds-cDNA.

cDNA Library Preparation

cDNA libraries were generated from 500 pg to 100 ng of RNA using the SMART-Seq Total RNA Pico Input with ZapR (Mammalian) rRNA Depletion Kit (Takara Biosciences, 634357) following the manufacturer’s instructions. Fragmentation times were adapted to RNA integrity metrics: 4 minutes for RIN > 7, 3 minutes for RIN 5–7, 2 minutes for RIN 4–5 with DV200 > 50%, and no fragmentation for RIN < 4 with DV200 30–50%. RNA extracts below the limit of detection of the Bioanalyzer and TapeStation were fragmented for 2 minutes. Following first-strand cDNA synthesis, five PCR cycles were used in the first amplification to incorporate adapters and unique dual indexes (Takara Biosciences, 634756), and all purification steps employed NucleoMag NGS Cleanup and Size Select beads (Macherey-Nagel, 744970.50) at a bead:sample ratio of 0.8. Ribosomal RNA was depleted using the ZapR probes provided with the SMART-Seq Kit. A second-round PCR amplification was run for 14 cycles, followed by a final library purification at a bead:sample ratio of 1.0.
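The integrity-guided fragmentation rule above can be written as a small lookup. The following is a minimal sketch rather than the authors' implementation: the function name is hypothetical, boundary values (for example RIN exactly 5) are assumed to fall in the higher-integrity bin, and RIN/DV200 combinations that the text does not cover return None so that they are flagged for a manual decision.

```python
from typing import Optional


def fragmentation_minutes(rin: Optional[float],
                          dv200: Optional[float] = None) -> Optional[int]:
    """Fragmentation time (minutes) for the RIN/DV200 bins given in the text.

    rin=None represents an extract below the limit of detection of the
    Bioanalyzer/TapeStation. Returns None for combinations the text does
    not describe (assumption: these need a case-by-case decision).
    """
    if rin is None:
        return 2              # below limit of detection: 2 minutes
    if rin > 7:
        return 4              # RIN > 7
    if rin >= 5:
        return 3              # RIN 5-7 (boundary handling is an assumption)
    if rin >= 4:
        # RIN 4-5 is only specified together with DV200 > 50%
        return 2 if dv200 is not None and dv200 > 50 else None
    # RIN < 4 is only specified together with DV200 30-50%
    if dv200 is not None and 30 <= dv200 <= 50:
        return 0              # no fragmentation
    return None


if __name__ == "__main__":
    for rin, dv200 in [(8.2, None), (6.0, None), (4.5, 62.0), (3.1, 40.0), (None, None)]:
        print(rin, dv200, fragmentation_minutes(rin, dv200))
```

Returning None for unspecified combinations keeps the sketch from silently inventing a setting that the protocol does not state.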

DNA Library Preparation

DNA libraries were generated from 50 pg to 50 ng of DNA (genomic DNA plus ds-cDNA) using the ThruPLEX DNA-Seq Kit (Takara Biosciences, R400674) according to the manufacturer’s instructions. Sequencing adapters and unique dual indexes were incorporated using PCR with tailed primers. Cycle numbers were adjusted based on DNA input: 8 cycles for 30–50 ng, 9 cycles for 10–30 ng, 10 cycles for 3–10 ng, 11 cycles for 1.5–3 ng, 12 cycles for 0.5–1.5 ng, 14 cycles for 0.2–0.5 ng and 16 cycles for 0.05–0.2 ng. Final DNA libraries were purified using a bead:sample ratio of 1.0.
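The input-dependent cycle numbers listed above amount to a threshold table. The sketch below encodes them as such; the names are hypothetical, and because adjacent bins share their boundary values in the text, an input exactly on a boundary is assumed here to take the lower cycle number.

```python
# (lower bound of bin in ng, PCR cycles), ordered from highest to lowest input.
CYCLE_BINS = [
    (30.0, 8),
    (10.0, 9),
    (3.0, 10),
    (1.5, 11),
    (0.5, 12),
    (0.2, 14),
    (0.05, 16),
]

MIN_INPUT_NG = 0.05   # 50 pg, lower limit stated in the text
MAX_INPUT_NG = 50.0   # 50 ng, upper limit stated in the text


def pcr_cycles(dna_ng: float) -> int:
    """Return the amplification cycle number for a given DNA input in ng."""
    if not MIN_INPUT_NG <= dna_ng <= MAX_INPUT_NG:
        raise ValueError(f"input of {dna_ng} ng is outside the 0.05-50 ng range")
    for lower_bound, cycles in CYCLE_BINS:
        if dna_ng >= lower_bound:
            return cycles
    raise AssertionError("unreachable: range check guarantees a matching bin")


if __name__ == "__main__":
    for ng in (40, 12, 5, 2, 1, 0.3, 0.1):
        print(f"{ng} ng -> {pcr_cycles(ng)} cycles")
```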

Libraries Quality Control

Library fragment size distribution was assessed using either the Bioanalyzer DNA High Sensitivity Chip (Agilent, 5067-4626) or the DNA D1000 TapeStation (Agilent, 5067-5584). Libraries were quantified with the Qubit DNA Broad Range Kit (Thermo Fisher Scientific, Q33260) or the Qubit DNA High Sensitivity Kit (Thermo Fisher Scientific, Q33230).

Illumina Next-Generation Sequencing

Next-Generation Sequencing was performed on the Illumina NextSeq 2000 platform using P2 Cartridges with SBS-XLEAP chemistry and 2X 150 bp paired-end reads (Illumina, 20100985). Libraries were loaded at 600 pM and spiked with 8% PhiX reference genome (Illumina; FC-110-3002). Resulting FASTQ files were deposited on the Sequence Read Archive (Supplementary Table S1) under BioProject PRJNA1371775, except for datasets derived from human clinical samples, which were excluded in accordance with ethical approval conditions.

Taxonomic Identification with CZID

Metagenomic reads were processed using the Chan Zuckerberg ID (CZID) web-based pipeline (version 8.3) for quality control (QC) and taxonomic assignment [30]. Initial QC steps included trimming low-quality base calls (< Q20) and removing reads with poor quality or low sequence complexity using Trimmomatic [31] and Price [30]. Where appropriate host genomes were available, host-derived reads were filtered by mapping against the host genome using Bowtie2 [32] and samtools [33]. Minimap2 [34] and Diamond [35] were used to perform nucleotide and protein searches against NCBI NT and NR databases (databases from 06/02/2024), respectively, followed by de novo assembly of classified reads into contiguous sequences (contigs) with SPAdes [36]. Bowtie2 [32] was then used to map reads back to contigs to determine depth. Contigs were then searched against the NCBI NT and NR databases using BLASTN and BLASTX [37], respectively, to refine taxonomic assignments. The resulting output was a table containing a list of taxonomic matches with associated read and contig counts for the NT and NR databases.

Genome Assembly

For viruses with suitable reference genomes, assemblies were generated by mapping reads to the closest reference sequence using Minimap2 [34]. Read depth was determined using mosdepth [38]. Consensus genomes were generated using samtools and bcftools [33]. Consensus genomes meeting predefined quality thresholds (≥ 60% genome coverage and mean read depth ≥ 10X [39]) were annotated using VAPiD [40] and deposited on GenBank (Supplementary Table S2). For viruses lacking a close reference genome, an alternative workflow was applied: overlapping reads were first assembled de novo into contigs using SPAdes [36] with the “--meta” flag. Diamond [35] was then used to translate contigs in all six reading frames and perform BLASTX [37] searches against the NCBI NR database. A reference sequence with a full-genome assembly that was consistently among the best matches from the BLASTX search was selected for a reference-based TBLASTX [37] search to identify the coordinates along the genome where contigs mapped. Contig nucleotide sequences were then mapped back to these coordinates to generate a consensus sequence. “N” was assigned to positions with no coverage.
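The consensus quality thresholds stated above (≥ 60% genome coverage and mean read depth ≥ 10X) can be checked from a per-base depth vector such as one derived from mosdepth output. This is an illustrative sketch only: the function name is hypothetical, and treating a position as covered when its depth is at least 1 is an assumption, since the text does not define the minimum depth used to count coverage.

```python
from statistics import fmean
from typing import Sequence


def consensus_passes_qc(depths: Sequence[int],
                        min_coverage: float = 0.60,
                        min_mean_depth: float = 10.0,
                        covered_at: int = 1) -> bool:
    """Apply the coverage and mean-depth thresholds to per-base read depths.

    depths     : read depth at every reference position (one value per base)
    covered_at : depth at which a position counts as covered (assumption)
    """
    if not depths:
        raise ValueError("depth vector is empty")
    coverage = sum(d >= covered_at for d in depths) / len(depths)
    mean_depth = fmean(depths)
    return coverage >= min_coverage and mean_depth >= min_mean_depth


if __name__ == "__main__":
    # Toy 100-base genome: 70 positions at 20X, 30 positions uncovered.
    toy_depths = [20] * 70 + [0] * 30
    print(consensus_passes_qc(toy_depths))
```

In the toy case, coverage is 70% and mean depth is 14X, so both thresholds are met.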

Results

Quality Control Overview

mNGS workflows deployable for passive or active surveillance of DNA and RNA viruses derived from a range of sample matrices, host species and anatomical sites were designed and tested. The mNGS workflows developed within OH-ALLIES were systematically quality-controlled from sample acquisition through nucleic acid extraction, library preparation and sequencing to ensure robust viral detection across diverse relevant specimens (Fig. 1).

At each step of the mNGS workflow, quantitative and qualitative QC metrics were recorded to monitor process performance, support troubleshooting and document assay robustness for pathogen discovery applications.

Sample Acquisition

To reflect realistic conditions of an outbreak scenario, where the source of a novel pathogen is uncertain and sample quality, handling and storage are often suboptimal, mNGS workflows were tested with 14 clinical specimens from the OH-ALLIES Reference Biospecimen Repository including eight swabs and six tissue biopsies collected from nine host species and nine anatomical sites (Table 1). Across this panel, targeted molecular testing had identified 12 viral species belonging to four RNA virus families (Coronaviridae, Paramyxoviridae, Peribunyaviridae and Sedoreoviridae) and two DNA virus families (Orthoherpesviridae and Poxviridae), providing a diverse benchmark for mNGS performance (Table 1).

Sample handling and storage conditions were deliberately heterogeneous, reflecting limited control over logistics during an emerging infectious event (Table 1). Samples had been stored at -20°C and -80°C, in some cases for up to 10 years, and experienced between zero and at least two pre-extraction freeze-thaw events (Supplementary Table S3). Despite these constraints, known viruses remained detectable by mNGS from Samples 13 and 14 following repeated freeze-thaw cycles and prolonged periods of storage at -20°C, indicating that the workflows retained sensitivity under suboptimal pre-analytical conditions (Supplementary Table S3).

Nucleic Acid Extraction

Extracts from swabs and tissues encompassed a wide range of DNA and RNA concentrations (below the limit of detection to high nanogram per microlitre levels), purity ratios (260/280 and 260/230) and RNA integrity metrics (RIN and DV200) (Supplementary Table S4). These measurements were used to adjust library preparation parameters, with nucleic acid concentrations informing the number of PCR cycles and RIN/DV200 guiding fragmentation times. Known viral targets, including SBV (Sample 1), Mpox IIb (Sample 5) and MDV (Sample 8), were consistently detected in extracts with low nucleic acid concentration, suboptimal purity and reduced RNA integrity (Supplementary Table S4). Overall, there was no apparent association between nucleic acid extract concentration or purity and qualitative viral detection by mNGS, demonstrating the robustness of workflows and their tolerance to degraded or impure input material (Fig. 2A).

Fig. 2

Detection of known viruses by mNGS relative to extract and library measurements. A) Detection of known viruses relative to extract concentration (y-axis) and purity (x-axis). The 260/280 range of pure nucleic acid (1.8–2.2) is indicated by the dashed vertical lines. B) Detection of known viruses relative to library concentration (y-axis) and fragment length (x-axis). The recommended fragment length ranges for cDNA and DNA libraries are indicated with dashed vertical lines.

Library Preparation

Library QC focused on concentration and fragment length distribution, as both parameters impact data quality and cluster generation on Illumina NGS instruments. All libraries fell within the expected fragment length range (150–500 bp for cDNA libraries and 300–600 bp for DNA libraries) (Fig. 2B), indicating that integrity-guided fragmentation settings were appropriate for cDNA libraries across the tested extract qualities.

Libraries meeting or exceeding the minimum loading concentration (≥ 600 pM) were generated even from extracts with DNA and RNA concentrations below the limits of detection of high sensitivity fluorometric assays, validating the selected library preparation technologies for low-input applications. In some instances, library concentrations remained below the limit of detection, yet the corresponding datasets still yielded qualitative detection of the expected viruses.
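Comparing a library against the 600 pM loading concentration requires converting a mass concentration and mean fragment length into molarity. The text does not state how this conversion was performed; the sketch below uses the widely used approximation of 660 g/mol per base pair of double-stranded DNA, and the function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0   # approximate mass of one dsDNA base pair
MIN_LOADING_PM = 600.0          # loading concentration stated in the text


def library_molarity_pm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert library concentration (ng/µl) and mean length (bp) to pM."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    # ng/µl -> nM: conc / (660 * length) * 1e6, then nM -> pM: * 1e3
    nanomolar = conc_ng_per_ul / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp) * 1e6
    return nanomolar * 1e3


def meets_loading_minimum(conc_ng_per_ul: float, mean_fragment_bp: float) -> bool:
    """True if the library can be diluted to the 600 pM loading concentration."""
    return library_molarity_pm(conc_ng_per_ul, mean_fragment_bp) >= MIN_LOADING_PM


if __name__ == "__main__":
    # Hypothetical library: 1 ng/µl with a 400 bp mean fragment length.
    print(f"{library_molarity_pm(1.0, 400):.1f} pM")
    print(meets_loading_minimum(1.0, 400))
```

Under this approximation, a hypothetical 400 bp library at 1 ng/µl is roughly 3.8 nM, well above the loading concentration, whereas the same library at 0.1 ng/µl would fall below it.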

NGS Run Performance

Sequencing was carried out on an Illumina NextSeq 2000 instrument using the SBS-XLEAP chemistry, selected for their combined capacity to deliver high throughput and high-quality short read data suited for pathogen discovery applications. Across runs, core instrument metrics such as yield, cluster occupancy and percentage of bases above Q30 exceeded manufacturer specifications (Supplementary Table S5), consistent with high-quality library preparation and optimal run set up.

A target depth of 50 million reads per sample was applied to accommodate the typically low proportion of viral nucleic acid within total sequence output. For swab samples, this target was distributed equally between the corresponding DNA and cDNA libraries, with six of eight DNA libraries and five of six cDNA libraries exceeding these targets (Table 2). For tissue samples, 50 million reads per cDNA library was the target and this was achieved for five of six libraries (Table 2). The proportion of reads passing filters (post-run QC and host filtering) varied widely between libraries, reflecting differences in sample composition and host background (Table 2). Both samples in which the known viruses were not detected by mNGS exceeded 50 million reads with relatively high passing-filter proportions, indicating that these metrics alone did not explain occasional false negatives.
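The read-depth targets and the passing-filter proportion described above reduce to simple arithmetic. The sketch below is illustrative, with hypothetical function names; it assumes that splitting the swab target equally means 25 million reads for each of the DNA and cDNA libraries.

```python
SAMPLE_TARGET_READS = 50_000_000   # target depth per sample, from the text


def per_library_target(matrix: str) -> int:
    """Read target for one library of a sample of the given matrix."""
    if matrix == "swab":
        # Target shared equally between the DNA and cDNA libraries (assumption:
        # an equal split means half of the sample target per library).
        return SAMPLE_TARGET_READS // 2
    if matrix == "tissue":
        # One cDNA library carries the full sample target.
        return SAMPLE_TARGET_READS
    raise ValueError(f"unknown matrix: {matrix!r}")


def percent_passing(reads_passing: int, total_reads: int) -> float:
    """Percentage of reads retained after QC and host filtering."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_passing / total_reads


if __name__ == "__main__":
    # Sample 9 (tissue, cDNA library) values as listed in Table 2.
    total, passing = 150_000_000, 9_968_190
    print(total >= per_library_target("tissue"))
    print(f"{percent_passing(passing, total):.2f}%")
```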

Table 2

Library-specific metrics for mNGS analysis obtained using CZID and qualitative detection by mNGS. Reads passing filters are those retained following QC and host filtering.
Sample	Run	Library Type	Total Reads	Reads Passing Filters	Detection by mNGS
9	1	cDNA	150,000,000	9,968,190 (6.65%)	Yes
10	1	cDNA	31,411,964	11,233,174 (35.76%)	Yes
11	1	cDNA	141,458,898	4,896,870 (3.46%)	No
12	1	cDNA	111,705,698	2,475,474 (2.22%)	Yes
13	1	cDNA	88,911,054	4,357,062 (4.9%)	Yes
14	1	cDNA	150,000,000	3,007,886 (2.01%)	Yes
6	2	cDNA	139,326,058	2,290,160 (1.64%)	Yes
6	2	DNA	150,000,000	3,881,426 (2.59%)	Yes
7	2	DNA	60,006,358	111,800 (0.19%)	Yes
8	2	DNA	150,000,000	2,367,062 (1.58%)	Yes
1	2	DNA	4,969,268	3,987,534 (80.24%)	No
1	3	cDNA	8,153,070	3,877,596 (47.56%)	Yes
2	3	cDNA	88,164,752	15,326,808 (17.38%)	1 of 2
2	3	DNA	46,143,002	2,231,588 (4.84%)	No*
3	3	cDNA	150,000,000	6,399,876 (4.27%)	Yes
3	3	DNA	41,962,422	2,252,322 (5.37%)	1 of 3
4	3	cDNA	110,879,828	10,615,516 (9.57%)	1 of 2
4	3	DNA	71,164,868	3,439,184 (4.83%)	1 of 2
5	3	cDNA	1,861,622	1,246,670 (66.97%)	Yes
5	3	DNA	142,868,726	9,592,780 (6.71%)	Yes

Overall Performance

Comprehensive metadata records captured sample matrix, host, anatomical site, pre-analytical handling, extract characteristics, library QC steps and run performance, confirming that the workflows were stress-tested across a broad spectrum of realistic conditions. Across this range, qualitative detection of known viruses was achieved from low- and high-quality extracts and libraries, providing evidence that the mNGS workflows are robust for viral pathogen identification in heterogeneous outbreak scenarios.

Identification of Known Viruses

To develop an NGS-based tool capable of detecting viral causes of infectious disease with unknown aetiology, swab and tissue-specific workflows were evaluated using clinical specimens known to contain diverse viruses. Potential targets include re-emerging viruses with available reference sequences that may escape targeted molecular assays due to phenotypic shifts or sequence variation in regions targeted by PCR, and truly emerging viruses for which reference sequences might not be available. The latter is covered in the “Identification of Unknown Viruses” section and includes PHV7, for which no complete reference genome was initially available.

For viruses with reference sequences, untargeted shotgun NGS data were generated and analysed using the CZID metagenomics pipeline, which aligns non-host reads against the GenBank nucleotide (NT) database for taxonomic identification (Table 3). Only reads specific to the virus taxon were considered, and a minimum of two unique reads mapping to distinct genomic loci was required to call a detected virus a true hit, consistent with pathogen detection practices [41]. Virus identification was subsequently verified by mapping reads to appropriate references to assemble consensus genomes as outlined in the “Consensus Genome Assembly” section, providing sequence-level confirmation and enabling downstream characterisation.
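The calling rule above (at least two unique, taxon-specific reads mapping to distinct genomic loci) can be expressed as a check on read alignment coordinates. This is a sketch under stated assumptions: the function name is hypothetical, and "distinct loci" is interpreted here as two alignments that do not overlap on the reference, which the text does not define explicitly.

```python
from typing import List, Tuple

Alignment = Tuple[int, int]   # (start, end) on the reference, inclusive


def is_true_hit(alignments: List[Alignment]) -> bool:
    """Return True if at least two unique reads map to non-overlapping loci.

    alignments holds one (start, end) pair per unique, taxon-specific read.
    Two alignments are non-overlapping when one starts after the other ends,
    so a non-overlapping pair exists exactly when the latest start lies
    beyond the earliest end.
    """
    if len(alignments) < 2:
        return False
    earliest_end = min(end for _, end in alignments)
    latest_start = max(start for start, _ in alignments)
    return latest_start > earliest_end


if __name__ == "__main__":
    print(is_true_hit([(100, 249), (5000, 5149)]))   # two separate loci
    print(is_true_hit([(100, 249), (120, 269)]))     # overlapping reads
    print(is_true_hit([(100, 249)]))                 # single read
```

Requiring separate loci guards against calling a virus from reads that pile up on a single low-complexity or conserved region.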

Table 3

Detection of known viruses. Output from the CZID Metagenomics workflow including number of reads matching known virus, proportion of reads matching known virus (reads per million) and de novo assembled contigs. ¹ The PHV7 reference sequence was unavailable in the GenBank NT database (06/02/2024).
Sample	Known Virus	Library	Unique Reads	Reads Per Million	Contigs
1	SBV	cDNA	36	4.4	0
1	SBV	DNA	0	0	0
2	Rotavirus A	cDNA	2	0.1	0
	Rotavirus A	DNA	0	0	0
	Rotavirus B	cDNA	0	0	0
	Rotavirus B	DNA	0	0	0
3	Rotavirus A	cDNA	1,735,102	249,386.5	76
	Rotavirus A	DNA	18	7.2	0
	Rotavirus B	cDNA	78	11.2	1
	Rotavirus B	DNA	0	0	0
	Rotavirus C	cDNA	44,690	6423.3	17
	Rotavirus C	DNA	0	0	0
4	Rotavirus A	cDNA	0	0	0
	Rotavirus A	DNA	42	11.2	0
	Rotavirus C	cDNA	2	0.2	0
	Rotavirus C	DNA	0	0	0
5	Mpox IIb	cDNA	1331	715.0	1
5	Mpox IIb	DNA	333,126	2,332	107
6	Mpox Ia	cDNA	21	4.5	0
6	Mpox Ia	DNA	201	11.2	6
7	Mpox IIb	DNA	35,295	588.3	23
8	MDV	DNA	2	0.1	0
9	EEHV1A	cDNA	11,394	415.8	150
10	OHV2	cDNA	26	1.7	0
11	PMV1	cDNA	0	0	0
12	IBV	cDNA	10	3	0
13	PHV1	cDNA	2	0.4	0
14	PHV7¹	cDNA	0	0	0
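The reads-per-million column in Table 3 normalises read counts by library size. A minimal sketch of that normalisation follows; the function name is hypothetical, and which read total CZID uses as the denominator is treated here as an assumption rather than a statement about the pipeline.

```python
def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Normalise a taxon read count to reads per million library reads.

    Assumption: `total_reads` is the total number of reads in the library
    used as the denominator; the exact definition used by CZID is not
    restated in the text above.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return taxon_reads / total_reads * 1_000_000
```

For example, 36 matching reads in a hypothetical library of 8,000,000 reads would give 4.5 reads per million, which shows why low absolute read counts can still be compared across libraries of different depth.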

Known Virus Detection from Swab Specimens

For swab specimens, the strategy targeted total nucleic acid to capture both DNA and RNA viruses present as whole particles, and therefore both DNA and cDNA libraries were analysed. Across the eight swab samples tested, which contained 12 known viruses, 11 were detected by mNGS, with increased qualitative and quantitative detection of RNA and DNA viruses from cDNA and DNA libraries, respectively. Six RNA viruses were detected in cDNA libraries compared to two in DNA libraries and, of the two DNA viruses for which both DNA and cDNA libraries were analysed, there were 333,327 matching reads in DNA libraries compared to 1352 reads in cDNA libraries. One RNA virus (Rotavirus A in Sample 4) was only detected by mNGS analysis of the DNA library (Table 3). Overall, the increased detection of RNA and DNA viruses from cDNA and DNA libraries, respectively, supports the strategy of dual DNA/RNA analysis for swab-based viral surveillance.

Known Virus Detection from Tissue Specimens

For tissue samples, a transcriptomics approach was prioritised on the premise that actively replicating viruses would generate viral transcripts in affected organs. Reference sequences for five known tissue-associated viruses were present in the GenBank NT database and four were detected using mNGS, including three herpesviruses and one coronavirus, demonstrating the detection of viral transcripts from DNA and RNA viruses using this workflow (Table 3). Successful detection of EEHV1A in heart tissue from an elephant with systemic haemorrhages illustrates how pathological examination can guide optimal sample selection by targeting organs with macroscopically evident lesions for mNGS analysis (Supplementary Figure S1).

Overall, the 14 specimens used in this evaluation contained 17 known viruses, of which 15 (88.2%) were identified, including 11 of 12 in swab samples and 4 of 5 in tissue samples (Table 3). The detection rates support the distinct strategies of analysing DNA and RNA for detection of whole virus particles from swab samples and the transcriptomics approach for detecting actively replicating viruses from tissues.

Identification of Unknown Viruses

mNGS workflows must be capable of detecting emerging and re-emerging viruses, where there is limited or no prior knowledge of the infectious agent. This spectrum includes “unknown knowns”, where reference sequences exist but the virus has not been identified in a sample; “known unknowns”, where a virus is known to be present but lacks a complete reference genome; and “unknown unknowns”, for which neither prior detection nor reference sequences are available. In this study, secondary infections represented unknown knowns, PHV7 served as an example of a known unknown, and the workflows were not explicitly challenged with unknown unknowns.

Identification of Unknown Knowns (Co-Infections)

Co-infections are representative of unknown knowns because, although genetic sequences of these viruses are available, their presence in samples was previously unknown. Twelve secondary infections were identified across five samples (Table 4), applying the criterion of at least two unique reads mapping to genetic references. Seven enteric viral pathogens were identified in Sample 3 with high read counts matching their respective sequences. Three of these enteric viruses were also observed in Sample 4, although Sapelovirus A was deemed a false positive because all reads mapped to a single genomic locus. Molluscum contagiosum virus, Epstein-Barr virus (EBV) and Herpes simplex virus 2 (HSV2) were identified as potential co-infections in Samples 5, 6 and 7, respectively, with Molluscum contagiosum virus supported by 1676 reads compared with 44 and 8 reads for HSV2 and EBV, respectively.

Table 4

Viral co-infection. Metagenomic workflow details were obtained through analysis with the CZID metagenomics pipeline.
Sample	Virus	Library	Metagenomics Workflow
Sample	Virus	Library	Mapped Reads	Reads Per Million	Contigs
3	Porcine astrovirus 4	cDNA	352,724	50,697.1	36
	Porcine astrovirus 4	DNA	469	188.6	6
	Sapporo virus	cDNA	188,694	27,121.0	6
	Sapporo virus	DNA	820	329.8	3
	Sapelovirus A	cDNA	31,774	4566.9	21
	Sapelovirus A	DNA	69	27.8	3
	Aichivirus C	cDNA	25,729	3698.0	2
	Aichivirus C	DNA	877	352.7	9
	Porcine torovirus	cDNA	18,412	2646.4	6
	Porcine torovirus	DNA	76	30.6	3
	Enterovirus G	cDNA	1424	204.7	18
	Enterovirus G	DNA	2	0.8	0
	Teschovirus A	cDNA	1158	166.4	11
	Teschovirus A	DNA	6	2.4	0
4	Aichivirus C	DNA	26	6.9	1
4	Sapporo virus	DNA	2	0.5	0
5	Molluscum contagiosum virus	DNA	1676	11.7	2
6	EBV	DNA	8	0.4	0
7	HSV2	DNA	44	0.7	1

Identification of Known Unknowns (PHV7)

Known unknowns had previously been identified in samples, but complete genome reference sequences were unavailable in the NT database when the analysis was carried out. PHV7 exemplified this category because panherpesvirus PCR and Sanger sequencing had confirmed the presence of PHV7, yet its complete genome sequence was unavailable. Related viruses tend to diverge more at the nucleotide level than at the amino acid level. As such, an amino acid search was carried out to identify PHV7 by translating reads in all six reading frames and searching the translated sequences against the NR database. Hits to members of the Gammaherpesvirus genus in the NR database that had no match in the NT database were interpreted as specific evidence for PHV7.
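The six-frame translation step mentioned above can be sketched in a few lines of Python. Translated-search tools such as BLASTX perform this step internally, so the code below is only an illustration of the idea, assuming the standard genetic code; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons containing ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read: str) -> dict:
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six peptide strings would then be compared against a protein database, which is what allows divergent viruses to be recognised when nucleotide-level matches are absent.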

PHV7 was not identified when mNGS reads were searched against the NT database, although 18 reads mapped to other species belonging to the Gammaherpesvirus genus. In contrast, BLASTX searches against the NR database identified 2,625 reads matching proteins of the Gammaherpesvirus genus. A partial consensus genome sequence, covering approximately 4.17% of the PHV7 genome, was assembled from the metatranscriptomic data (Supplementary Figure S2).

Conclusion

These mNGS workflows are intended for deployment in scenarios where prior knowledge of circulating viruses and reference sequences may be incomplete or absent. Detection of 12 secondary infections demonstrates the capacity to identify unknown knowns, while recovery of Gammaherpesvirus genus sequences corresponding to PHV7 illustrates that the mNGS workflows can also detect and partially characterise known unknowns.

Consensus Genome Assembly

Compared to targeted approaches such as PCR and ELISA, mNGS generates rich sequence data, which can be used to assemble partial or complete consensus genomes. These assemblies support the design of targeted diagnostics and mRNA vaccines, enable phylogenetic analysis to track transmission and evolution, and allow inference of phenotypic features from genotype. Consensus genomes were also used to verify viruses identified by mNGS by assessing read distribution across reference genomes. Consensus genomes were assembled by mapping reads to reference sequences with Minimap2 [34], generating consensus sequences with samtools and bcftools [33] and measuring read depth with mosdepth [38] (Table 5). Typically, more reads mapped to viral sequences during this targeted re-alignment step (Table 5) than were counted by the CZID metagenomics pipeline (Tables 3 and 4), reflecting the inclusion of non-taxonomically informative reads and more efficient read alignment when the reference genome is provided.
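The mapping and consensus steps named above can be sketched as a sequence of shell commands, assembled here in Python. This is an illustrative outline only: the exact parameters used in the study are not stated in this section, so the short-read preset (`-ax sr`), the paired-end inputs, the variant-calling options and all file names are assumptions.

```python
import shlex
from typing import List


def consensus_commands(reference: str, reads_1: str, reads_2: str, prefix: str) -> List[str]:
    """Build shell commands for a reference-guided consensus (illustrative sketch).

    Assumptions: paired-end short reads, default variant-calling settings,
    and output files named after `prefix`. None of these are taken from the
    study's methods.
    """
    ref, r1, r2, p = (shlex.quote(x) for x in (reference, reads_1, reads_2, prefix))
    return [
        # Map reads to the reference and coordinate-sort the alignments.
        f"minimap2 -ax sr {ref} {r1} {r2} | samtools sort -o {p}.bam -",
        f"samtools index {p}.bam",
        # Pile up reads and call variants relative to the reference.
        f"bcftools mpileup -f {ref} {p}.bam | bcftools call -mv -Oz -o {p}.vcf.gz",
        f"bcftools index {p}.vcf.gz",
        # Apply the called variants to the reference to produce a consensus.
        f"bcftools consensus -f {ref} -o {p}.consensus.fa {p}.vcf.gz",
        # Compute depth of coverage for the sorted alignments.
        f"mosdepth {p} {p}.bam",
    ]


if __name__ == "__main__":
    for command in consensus_commands("ref.fa", "reads_R1.fastq.gz", "reads_R2.fastq.gz", "sample3"):
        print(command)
```

One design point worth noting: `bcftools consensus` leaves the reference base at positions without called variants, so regions with no read coverage would need to be masked separately (for example with its `--mask` option) before a partial genome is interpreted or deposited.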

Genome Assembly for Verification of Virus Hits

Genome-wide coverage profiles were used to distinguish true viral detections from false positives. For most non-segmented viruses, coverage plots showed reads mapped to multiple regions of the genome, consistent with genuine infections (Fig. 3). In contrast, all Sapelovirus A reads from Sample 4 mapped to a single location along the reference genome (Fig. 3), failing the criterion of at least two unique reads mapping to distinct positions, and this signal was therefore classified as a false positive.

For the segmented genomes – SBV and Rotavirus – genomes were assembled using segment-specific reference sequences (Supplementary Table S6) and coverage plots demonstrated mapping of reads across multiple segments for each Rotavirus (Fig. 4), supporting their classification as true positives. Although reads only matched to the L segment of SBV, reads mapped to multiple loci of this segment (Fig. 4).

Table 5

1X consensus genomes. Viruses detected with the metagenomic workflow were mapped against reference sequences using Minimap2. The total genome metrics are shown for segmented viruses with a detailed breakdown between segments in Supplementary Table S5.
Sample	Known Virus	Library	Consensus Genome
Sample	Known Virus	Library	Ref. Accession	No. of Reads	1X Coverage (%)	Mean Depth
1	SBV	cDNA	Table S5	36	1.7	0.44
2	Rotavirus A	cDNA	Table S5	18	3.4	0.09
3	Rotavirus A	cDNA	Table S5	24,916,386	85.3	104,903.8
	Rotavirus A	DNA	Table S5	133	14.59	0.65
	Rotavirus B	cDNA	Table S5	362	4.1	4.99
	Rotavirus C	cDNA	Table S5	22,446,004	80.5	95,253.5
	Rotavirus C	DNA	Table S5	341	16.7	2.19
	Rotavirus C	cDNA	Table S5	5	1.5	0.02
	Porcine astrovirus 4	cDNA	KX060808.1	1,463,725	90.2	30,307.1
	Sapporo virus	cDNA	MK962340.1	818,991	93.2	15,239.8
	Sapelovirus A	cDNA	MN836683.1	230,101	92.4	4138.12
	Aichivirus C	cDNA	LC210609.1	342,572	100	5861.0
	Porcine torovirus	cDNA	LT900503.1	429,422	98.2	2161.81
	Enterovirus G	cDNA	MF782664.1	9514	83.4	163.1
	Teschovirus A	cDNA	JQ429405.1	6715	62.9	127.08
4	Rotavirus A	DNA	Table S5	341	16.7	2.19
	Rotavirus C	cDNA	Table S5	5	1.5	0.02
	Aichivirus C	DNA	LC210609.1	204	58.5	3.3
	Sapelovirus A	DNA	MN836683.1	6	3.8	0.11
	Sapporo virus	DNA	MK962340.1	22	12.6	0.38
5	Mpox IIb	cDNA	LC852831.1	2438	0.94	1.21
	Mpox IIb	DNA	LC852831.1	371,820	93.4	261.5
	Molluscum contagiosum virus	DNA	MH320554.1	366,593	4.44	60.8
6	Mpox Ia	cDNA	LC852831.1	1036	18.19	0.66
	Mpox Ia	DNA	OZ254457.1	4234	52.9	1.8
	EBV	DNA	NC_007605.1	243	1.54	0.066
7	Mpox IIb	DNA	LC852831.1	39,631	99.9	23.2
7	HSV2	DNA	KY922721.1	3225	2.91	0.67
8	MDV	DNA	NC_075702.1	348,573	1.2	190.5
9	EEHV1A	cDNA	KC618527.1	6,666,011	77.5	312.3
10	OHV2	cDNA	PV231823.1	869	6.8	0.46
12	IBV	cDNA	ON350837.1	27,549	21.8	31.1
13	PHV1	cDNA	OK032545.1	103	4.4	0.095

Fig. 3

Coverage plots of non-segmented viruses detected by mNGS. Reads were aligned to references using Minimap2 and depth of coverage was determined using mosdepth.

Fig. 4

Coverage plots of segmented viruses detected by mNGS. Reads were aligned to references of each segment using Minimap2 and depth of coverage was determined using mosdepth.

Genome Assembly for Annotation

High-quality genomes suitable for downstream analyses were defined as those with at least 75% genome coverage at 1X and a mean read depth of at least 10 (Table 5) [39]. For the nine non-segmented viruses meeting these criteria, coverage plots showed uniformly high depth across the genome length, except for Enterovirus G from Sample 3, which had several gaps and areas of low coverage (Fig. 5). All other consensus genomes, which had at least 60% coverage at ≥ 10X read depth (the coverage threshold was reduced from 75% to 60% to account for the stricter read depth requirement), were annotated with VAPiD [40] and uploaded to GenBank (Supplementary Table 2).
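The quality criteria above reduce to two summary statistics computed from per-base depth. The following Python sketch shows one way to derive them and apply the 75% 1X coverage and mean depth ≥ 10 thresholds; the function names are hypothetical and the per-base depth list stands in for the output of a depth tool such as mosdepth.

```python
from typing import Sequence, Tuple


def coverage_metrics(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (percentage of positions with depth >= min_depth, mean depth)."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths), sum(depths) / len(depths)


def is_high_quality(
    coverage_1x_pct: float,
    mean_depth: float,
    min_coverage_pct: float = 75.0,
    min_mean_depth: float = 10.0,
) -> bool:
    """Apply the thresholds stated in the text: >= 75% 1X coverage and mean depth >= 10."""
    return coverage_1x_pct >= min_coverage_pct and mean_depth >= min_mean_depth
```

Applied to values from Table 5, Enterovirus G in Sample 3 (83.4% coverage, mean depth 163.1) passes, whereas Teschovirus A (62.9%, 127.08) and MDV (1.2%, 190.5) do not, illustrating that high mean depth alone is insufficient when reads are concentrated in a small part of the genome. Calling `coverage_metrics` with `min_depth=10` gives the breadth at ≥ 10X used for the submission threshold.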

For segmented viruses, coverage plots revealed uneven depth and coverage among segments (Fig. 6). Consensus genomes were annotated with VAPiD [40] and submitted to GenBank for segments with at least 60% coverage at ≥ 10X read depth (Table 6).

Fig. 5

Coverage plots of non-segmented viruses detected by mNGS and with ≥ 75% 1X genome coverage at a mean read depth ≥ 10.

Fig. 6

Coverage plots of segmented viruses detected by mNGS and with ≥ 75% 1X genome coverage at a mean read depth ≥ 10.

Conclusion

Assembly of consensus genomes from mNGS data enabled verification of viral hits from the metagenomics pipeline based on the mapping of reads to multiple loci along reference genomes. High-quality consensus genomes were also assembled and deposited in GenBank.

Discussion

In preparation for future viral outbreaks of unknown aetiology, mNGS workflows were designed and evaluated for swab and tissue samples positive for a range of known DNA and RNA viruses, collected from multiple anatomical sites and host species and subjected to diverse handling and storage timelines. QC analyses confirmed that these clinical specimens provided nucleic acid extracts spanning a wide range of concentrations, purity and integrity, yet the workflow remained sufficiently robust to detect viruses from poor-quality extracts. Applying the threshold of at least 2 unique matching reads for nucleotide-level searches (NT) and at least 10 unique matching reads for amino acid-level searches (NR), 89.5% of the 19 known viruses were detected, demonstrating that the selected extraction, library preparation and sequencing protocols are suitable for viral mNGS.

Detection of previously unrecognised or poorly characterised viruses is a key requirement of any mNGS-based pathogen surveillance system. The workflows established here generated data of sufficient quality to identify additional viruses in samples that had not previously tested positive for these agents, illustrating the capacity to detect previously unrecognised co-infections (“unknown knowns”). In addition, members of the Gammaherpesvirus genus were identified in Sample 14 using an amino acid search of the NR database, consistent with the presence of PHV7, a virus lacking a complete reference genome in the NT database. The logical next step is to apply these mNGS workflows to cases of suspected infectious disease of unknown aetiology, to assess performance in identifying truly novel or unanticipated viral infections, including unknown unknowns [42].

Assembly of viral consensus genomes was used to verify metagenomic hits by examining how reads were distributed along reference genomes, enabling the identification of potential false positives. This finding highlights the importance of complementing automated metagenomic classification with an independent verification step or other orthogonal methods.

High-quality consensus genomes were generated for several of the primary and secondary viral infections identified. An important advantage of mNGS over targeted virus identification methods, such as PCR and ELISA, is the breadth of genetic information obtained, which can be leveraged for downstream applications. Previous studies have shown how consensus genomes assembled from mNGS data can support further analyses by both the originating investigators and the wider research community once deposited in public open access repositories [12]. The workflows presented here can generate data of sufficient quality for the assembly of genomes with at least 60% coverage at 10X depth or greater, providing a foundation for applications such as phylogenetic analysis, molecular epidemiology, and exploration of genotype-phenotype relationships [43].

Previous studies have described mNGS workflows for viral pathogen identification across multiple sample matrices and body sites [26]. The present study extends this work by demonstrating, for the first time, an mNGS workflow tested on clinical samples from multiple host species, while explicitly documenting its robustness across nucleic acid extracts of varying concentrations, purity and integrity. Collectively, the tissue and swab mNGS workflows developed here have proven effective at detecting known viruses, previously unrecognised infections (unknown knowns), and partially characterised agents (known unknowns) from a spectrum of clinically relevant samples. These workflows are now ready for integration into passive and syndromic surveillance networks to evaluate their performance in real-world investigations of suspected infectious disease cases with unknown aetiology.

Declarations

Ethics approval and consent to participate

Ethical and regulatory oversight was secured through institutional review processes. The UCD Animal Research Ethics Committee granted exemptions for the use of samples collected as part of routine diagnostics or postmortem from DAFM (AREC-E-24-39-Gautier), Teagasc (AREC-E-25-25-Gautier) and UCD School of Veterinary Medicine (AREC-E-22-04-Jahns), and the Human Research Ethics Committee provided approval to work with human clinical samples provided by hospitals (306-LS-CSD-25-Mallon). The use of human samples complied with the Declaration of Helsinki. The participants provided their written informed consent for the collection of samples and clinical data for further research and publication.

Consent for publication

Not applicable

Data Availability

The datasets generated during the current study are available in the Sequence Read Archive repository under accession PRJNA1371775 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1371775).

Competing interests

The authors declare that they have no competing interests.

Funding

Co-funded by the European Union EU4Health Programme 2021–2027 (grant agreement No. 101132970, EU4H-2022-DGA-MS-IBA3), supporting the "One Health – ALL Ireland for European Surveillance (OH-ALLIES)" project. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Health and Digital Executive Agency, the granting authority. Neither the European Union nor the granting authority can be held responsible for them.

Author Contribution

TR designed experiments, performed laboratory experiments for data acquisition, analysed and interpreted data and wrote the manuscript. EF performed laboratory experiments for data acquisition and reviewed manuscript drafts. AM designed experiments. JH performed laboratory experiments for data acquisition. MM conceptualised the project and obtained porcine gastrointestinal samples for data acquisition. MC conceptualised the project. LGC obtained avian samples for data acquisition. JFM obtained aborted foetus samples for data acquisition. HJ obtained elephant, seal and deer samples for data acquisition. CK, JB and ERF obtained human samples for data acquisition. PWGM conceptualised the project. VWG conceptualised the project, designed the project, interpreted the data and wrote and reviewed the drafts.

Acknowledgement

The authors wish to thank all study participants and their families for their participation and support in the conduct of the All Ireland Infectious Diseases Cohort Study.

Electronic Supplementary Material

Below is the link to the electronic supplementary material

Supplementary Material 1

References

1.

Samarasekera U. New EU health programme comes into force. Lancet. 2021;397:1252–3. https://doi.org/10.1016/S0140-6736(21)00772-8.

2.

Shanmugaraj B, Kothalam R, Tharik MS, Azeeze A. A brief overview on the threat of zoonotic viruses. Microbes Infect Dis. 2024;0:0–0. https://doi.org/10.21608/MID.2024.294905.1975.

3.

Finch A, Vora NM, Hassan L, Walzer C, Plowright RK, Alders R, et al. The promise and compromise of the WHO Pandemic Agreement for spillover prevention and One Health. Lancet. 2025;0. https://doi.org/10.1016/S0140-6736(25)00632-4.

4.

Berezowski J, De Balogh K, Dórea FC, Ruegg S, Broglia A, Zancanaro G, et al. Coordinated surveillance system under the One Health approach for cross-border pathogens that threaten the Union – options for sustainable surveillance strategies for priority pathogens. EFSA J. 2023;21:e07882. https://doi.org/10.2903/J.EFSA.2023.7882.

5.

WHO to identify pathogens that could cause future outbreaks and pandemics. https://www.who.int/news/item/21-11-2022-who-to-identify-pathogens-that-could-cause-future-outbreaks-and-pandemics. Accessed 5 Nov 2025.

6.

Chatterjee P, Nair P, Chersich M, Terefe Y, Chauhan AS, Quesada F, et al. One Health, Disease X & the challenge of Unknown Unknowns. Indian J Med Res. 2021;153:264. https://doi.org/10.4103/IJMR.IJMR_601_21.

7.

Chan JF-W, Kok K-H, Zhu Z, Chu H, To KK-W, Yuan S, et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect. 2020;9:221. https://doi.org/10.1080/22221751.2020.1719902.

8.

Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270. https://doi.org/10.1038/S41586-020-2012-7.

9.

Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579(7798). https://doi.org/10.1038/s41586-020-2008-3.

10.

Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497. https://doi.org/10.1016/S0140-6736(20)30183-5.

11.

Torres Montaguth OE, Buddle S, Morfopoulou S, Breuer J. Clinical metagenomics for diagnosis and surveillance of viral pathogens. Nat Rev Microbiol. 2025;1–15. https://doi.org/10.1038/s41579-025-01223-5.

12.

Russell T, Formiconi E, Casey M, McElroy M, Mallon PWG, Gautier VW. Viral Metagenomic Next-Generation Sequencing for One Health Discovery and Surveillance of (Re)Emerging Viruses: A Deep Review. Int J Mol Sci. 2025;26:9831. https://doi.org/10.3390/IJMS26199831.

13.

Hoffmann B, Scheuch M, Höper D, Jungblut R, Holsteg M, Schirrmeier H, et al. Novel Orthobunyavirus in Cattle, Europe, 2011. Emerg Infect Dis. 2012;18:469. https://doi.org/10.3201/EID1803.111905.

14.

Corman VM, Landt O, Kaiser M, Molenkamp R, Meijer A, Chu DKW, et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance. 2020;25:2000045. https://doi.org/10.2807/1560-7917.ES.2020.25.3.2000045.

15.

Eurosurveillance editorial team. Erratum for Euro Surveill. 2020;25(3). Eurosurveillance. 2021;26:210204e. https://doi.org/10.2807/1560-7917.ES.2021.26.5.210204E.

16.

Sharma O, Sultan AA, Ding H, Triggle CR. A Review of the Progress and Challenges of Developing a Vaccine for COVID-19. Front Immunol. 2020;11:585354. https://doi.org/10.3389/FIMMU.2020.585354.

17.

Mahase E. Covid-19: Moderna applies for US and EU approval as vaccine trial reports 94.1% efficacy. BMJ. 2020;371. https://doi.org/10.1136/BMJ.M4709.

18.

Mahase E. Covid-19: Vaccine candidate may be more than 90% effective, interim results indicate. BMJ. 2020;371:m4347. https://doi.org/10.1136/BMJ.M4347.

19.

Bogner P, Capua I, Cox NJ, Lipman DJ. A global initiative on sharing avian flu data. Nature. 2006;442(7106). https://doi.org/10.1038/442981a.

20.

Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. NextStrain: Real-time tracking of pathogen evolution. Bioinformatics. 2018;34:4121–3. https://doi.org/10.1093/BIOINFORMATICS/BTY407.

21.

Sardi SI, Somasekar S, Naccache SN, Bandeira AC, Tauro LB, Campos GS, et al. Coinfections of Zika and chikungunya viruses in Bahia, Brazil, identified by metagenomic next-generation sequencing. J Clin Microbiol. 2016;54:2348–53. https://doi.org/10.1128/JCM.00877-16.

22.

Pronyk PM, de Alwis R, Rockett R, Basile K, Boucher YF, Pang V, et al. Advancing pathogen genomics in resource-limited settings. Cell Genomics. 2023;3:100443. https://doi.org/10.1016/J.XGEN.2023.100443.

23.

Hong NTT, Anh NT, Mai NTH, Nghia HDT, Nhu LNT, Thanh TT, et al. Performance of Metagenomic Next-Generation Sequencing for the Diagnosis of Viral Meningoencephalitis in a Resource-Limited Setting. Open Forum Infect Dis. 2020;7. https://doi.org/10.1093/OFID/OFAA046.

24.

Yek C, Pacheco AR, Vanaerschot M, Bohl JA, Fahsbender E, Aranda-Díaz A, et al. Metagenomic pathogen sequencing in resource-scarce settings: Lessons learned and the road ahead. Front Epidemiol. 2022;2:926695. https://doi.org/10.3389/FEPID.2022.926695.

25.

Greninger AL, Chen EC, Sittler T, Scheinerman A, Roubinian N, Yu G, et al. A Metagenomic Analysis of Pandemic Influenza A (2009 H1N1) Infection in Patients from North America. PLoS ONE. 2010;5:e13381. https://doi.org/10.1371/JOURNAL.PONE.0013381.

26.

Fourgeaud J, Regnault B, Ok V, Da Rocha N, Sitterlé É, Mekouar M, et al. Performance of clinical metagenomics in France: a prospective observational study. Lancet Microbe. 2024;5:e52–61. https://doi.org/10.1016/S2666-5247(23)00244-6.

27.

Ogunbayo AE, Sabiu S, Nyaga MM. Evaluation of extraction and enrichment methods for recovery of respiratory RNA viruses in a metagenomics approach. J Virol Methods. 2023;314:114677. https://doi.org/10.1016/J.JVIROMET.2023.114677.

28.

Mao W, Wang J, Li T, Wu J, Wang J, Wen S, et al. Hybrid Capture-Based Sequencing Enables Highly Sensitive Zoonotic Virus Detection Within the One Health Framework. Pathogens. 2025;14:264. https://doi.org/10.3390/PATHOGENS14030264.

29.

Mourik K, Sidorov I, Carbo EC, van der Meer D, Boot A, Kroes ACM, et al. Comparison of the performance of two targeted metagenomic virus capture probe-based methods using reference control materials and clinical samples. J Clin Microbiol. 2024;62. https://doi.org/10.1128/JCM.00345-24.

30.

Kalantar KL, Carvalho T, De Bourcy CFA, Dimitrov B, Dingle G, Egger R, et al. IDseq—An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. Gigascience. 2020;9:1–14. https://doi.org/10.1093/GIGASCIENCE/GIAA111.

31.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114. https://doi.org/10.1093/BIOINFORMATICS/BTU170.

32.

Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357. https://doi.org/10.1038/NMETH.1923.

33.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10:1–4. https://doi.org/10.1093/GIGASCIENCE/GIAB008.

34.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094. https://doi.org/10.1093/BIOINFORMATICS/BTY191.

35.

Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4). https://doi.org/10.1038/s41592-021-01101-x.

36.

Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinf. 2020;70:e102. https://doi.org/10.1002/CPBI.102.

37.

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. https://doi.org/10.1186/1471-2105-10-421.

38.

Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018;34:867–8. https://doi.org/10.1093/BIOINFORMATICS/BTX699.

39.

ECDC. Sequencing of SARS-CoV-2: first update. 2021.

40.

Shean RC, Makhsous N, Stoddard GD, Lin MJ, Greninger AL. VAPiD: A lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. BMC Bioinformatics. 2019;20:48. https://doi.org/10.1186/S12859-019-2606-Y.

41.

Liu B, Shao N, Wang J, Zhou SY, Su HX, Dong J, et al. An Optimized Metagenomic Approach for Virome Detection of Clinical Pharyngeal Samples With Respiratory Infection. Front Microbiol. 2020;11:1552. https://doi.org/10.3389/FMICB.2020.01552.

42.

Ashraf S, Jerome H, Bugembe DL, Ssemwanga D, Byaruhanga T, Kayiwa JT, et al. Uncovering the viral aetiology of undiagnosed acute febrile illness in Uganda using metagenomic sequencing. Nat Commun. 2025;16(1). https://doi.org/10.1038/s41467-025-57696-8.

43.

Morfopoulou S, Buddle S, Torres Montaguth OE, Atkinson L, Guerra-Assunção JA, Moradi Marjaneh M, et al. Genomic investigations of unexplained acute hepatitis in children. Nature. 2023;617(7961). https://doi.org/10.1038/s41586-023-06003-w.
