Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies

Johan Rollin, Rachelle Bester, Yves Brosteaux, Kadriye Caglayan, Kris De Jonghe, A. Eichmeier, Yoika Foucart, Annelies Haegeman, Igor Koloniuk, Petr Kominek, Hano Maree, Serkan Onder, Susana Posada Cespedes, Vahid Roumi, Dana Safarova, Olivier Schumpp, Cigdem Ulubas Serce, Merike Somera, Lucie Tamisier, Eeva Vainio, René van der Vlugt, Sebastien Massart

Research output: Contribution to journal, A1: Web of Science article, peer-reviewed

Abstract

Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research on viral pathogens, especially for virus discovery and monitoring. Indeed, proper monitoring of the viral population requires information on the different isolates circulating in the studied area. For this purpose, HTS technologies have greatly facilitated the sequencing of new genomes of the detected viruses and their comparison. However, the bioinformatics analyses used to reconstruct genome sequences and to detect Single Nucleotide Polymorphisms (SNPs) can introduce bias, an issue that has not been widely addressed so far.
Therefore, more knowledge is required on the limitations and possibilities of predicting SNPs based on HTS-generated sequence samples. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three artificially designed samples through large-scale Performance Testing (PT). The bioinformatics analyses were divided into three key steps: read pre-processing (quality trimming, merging), virus identification (assembly, alignment, mapping) and variant calling. Each step was evaluated independently through an original, step-by-step PT design with iteration between participants.
Overall, this work underlines key parameters in SNP detection and proposes recommendations for reliable variant calling for plant viruses. The identification of the closest reference, the mapping parameters and the manual validation of the predictions were recognized as the analysis steps with the greatest impact on the success or failure of the predictions. Strategies to improve SNP prediction are also discussed.
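A performance test of this kind comes down to comparing each laboratory's predicted SNPs with the variants known to be present in the designed samples. The following is a minimal sketch of such a comparison, not the scoring procedure used in the study: the `Snp` record, the `score_calls` function and the example positions are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """A single-nucleotide variant (hypothetical record layout)."""
    position: int  # 1-based position on the reference genome
    ref: str       # reference base
    alt: str       # alternative base


def score_calls(expected: Iterable[Snp], called: Iterable[Snp]) -> dict:
    """Compare called SNPs with the expected ones and return basic metrics."""
    expected_set, called_set = set(expected), set(called)
    tp = len(expected_set & called_set)  # expected SNPs that were called
    fp = len(called_set - expected_set)  # called SNPs that were not expected
    fn = len(expected_set - called_set)  # expected SNPs that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up positions for illustration only; these are not data from the study.
expected = [Snp(100, "A", "G"), Snp(250, "C", "T"), Snp(400, "G", "A")]
called = [Snp(100, "A", "G"), Snp(250, "C", "T"), Snp(999, "T", "C")]
example = score_calls(expected, called)
print(example)
```

Matching on exact position and bases assumes every pipeline reports coordinates against the same reference sequence. When laboratories select different closest references, coordinates would first need to be converted to a common system.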
Original language: English
Article number: 15816
Journal: PeerJ
Volume: 11
Pages (from-to): 1-25
Number of pages: 25
ISSN: 2167-8359
Publication status: Published - 16 Aug 2023
