TY - CHAP
T1 - NGS and virus diagnostics: do sequence analysis strategies really matter? Results of an international proficiency test on siRNA
AU - Massart, Sébastien
AU - Adams, Ian
AU - De Jonghe, Kris
AU - COST-Divas working group 1: Sébastien Massart, Ian Adams, Kris De Jonghe, Annalisa Giampetruzzi, Igor Koloniuk, Petr Kominek, Jan Kreuze, Denis Kutnjak, Leonidas Lotos, Hans J. Maree, Thibaut Olivier, Mikhail Pooggin, Ana B. Ruiz-García, Dana Safarova, P.
N1 - The contributors of this communication are listed in alphabetical order, except S.M., Y.B. and T.C. They are members of the Core Group of the COST Action, and their addresses are listed on the COST website: http://www.cost.eu/COST_Actions/fa/Actions/FA1407?management
PY - 2017/1
Y1 - 2017/1
N2 - Recent developments in high-throughput sequencing (also called Next Generation Sequencing, NGS) technologies and bioinformatics have drastically changed research on viral pathogens and are now raising growing interest in virus diagnostics. However, any diagnostic technique has to be included in standardized protocols. A wide diversity of bioinformatics protocols for virus discovery has been reported in the scientific literature, but, to date, their reliability for diagnostic purposes has not been addressed. The objective of this work was therefore to compare the performance of existing bioinformatics pipelines, and of result interpretation, through a double-blind, large-scale proficiency test based on a set of ten FASTQ files and involving 21 laboratories from 16 countries. The FASTQ files, derived from 3 samples, contained 50,000 (3 files), 250,000 (4 files) or 2.5 million (3 files) sequences of 21-24 nt. The false positive rate was only 0.5%, mainly due to the identification of integrated sequences or to misinterpretation of the results. The overall detection sensitivity was 57% and ranged from 35% to 100% across laboratories, with a marked effect of rarefaction (fewer reads per file) for some laboratories. Principal component analysis and correlation studies highlighted the parameters most important for an appropriate diagnosis. Detection repeatability was 73%. This work also highlighted (i) the complexity of discovering new viruses by NGS, (ii) the difficulty of detecting viral pathogens represented by a low number of siRNA reads, and (iii) inconsistencies in databases and their impact on results. Overall, this work provides key insights into the reliability of bioinformatics pipelines and identifies key parameters for achieving reliable virus detection in a diagnostic setting using siRNA sequencing. Acknowledgement: This article is based upon work from COST Action FA1407 – www.cost-divas.eu, supported by COST (European Cooperation in Science and Technology).
M3 - C3: Congress abstract
BT - Abstract book - Rencontre de virologie végétale 2017
T2 - Rencontre de virologie végétale, Aussois (France)
Y2 - 15 January 2017 through 19 January 2017
ER -