Analysis of allelic diversity in MAX genes of Lolium perenne for validation of next generation transcriptome sequencing

    Activity: Other: Types of other (prizes, external and other activities) - (Co)promotor of master's thesis

    Description

    This thesis studies genetic diversity in candidate genes of the MAX pathway, which control plant architecture in Lolium perenne. Prior to the research, a Next Generation Sequencing (NGS) dataset of 14 different genotypes of L. perenne was set up. Within this dataset, Single Nucleotide Polymorphisms (SNPs) and InDels were selected for use as molecular markers in association genetics.
    Orthologs of 8 candidate genes in the MAX pathway (MAX1, MAX2, MAX3, MAX4, TCP1, TCP2, TCP3 and D14) were selected in the L. perenne transcriptome through phylogenetic analysis. These candidate genes formed a training dataset used to adjust the bioinformatics procedures and parameters and to provide a first indication of the genetic diversity in L. perenne. Further, fragments of these candidate genes were PCR-amplified, cloned and Sanger sequenced in the 14 NGS genotypes, and the allelic variation was analyzed for SNPs and InDels. These data were used to validate the in silico reconstructed transcript sequences from the NGS dataset and to evaluate the parameters for automatic SNP identification.
    Analysis of the allelic diversity in the candidate genes shows a high degree of nucleotide diversity. For MAX1 and MAX4, a number of unique alleles were found that diverge from the major alleles and therefore contribute to the SNP density per gene. The SNP densities range from 2.84/100 bp (MAX4) to 8.39/100 bp (MAX1) between all alleles over the entire gene sequence (intron + exon), or from 1.10/100 bp (MAX4) to 4.0/100 bp (MAX3) when measured between major alleles in exon sequences. InDels are found only in introns or untranslated regions and at a lower frequency than SNPs, ranging from 0.06/100 bp (MAX1) to 0.39/100 bp (MAX4) for major alleles.
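    As an illustration of the SNP density measure used here, the sketch below counts polymorphic columns in a set of aligned alleles and expresses them per 100 bp. It is a minimal sketch and not the procedure used in the thesis: the function name, the toy sequences, the choice to skip gap-containing columns (treating them as InDels) and the use of the alignment length as denominator are all assumptions made for the example.

```python
def snp_density(aligned_alleles, per_bp=100):
    """SNPs per `per_bp` aligned bases for equal-length allele sequences.

    Assumptions (not taken from the thesis): gap-containing columns are
    treated as InDels and skipped; the denominator is the alignment length.
    """
    if not aligned_alleles:
        raise ValueError("at least one allele is required")
    length = len(aligned_alleles[0])
    if length == 0 or any(len(a) != length for a in aligned_alleles):
        raise ValueError("alleles must be non-empty and aligned to the same length")

    snps = 0
    for column in zip(*aligned_alleles):
        if "-" in column:
            continue  # InDel column, not counted as a SNP
        if len({base.upper() for base in column}) > 1:
            snps += 1  # more than one base observed at this position
    return per_bp * snps / length


# Hypothetical toy alleles: two SNPs (positions 3 and 10) in 10 aligned bases.
toy_alleles = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGTAT"]
print(snp_density(toy_alleles))
```

    Restricting the input to exon columns or to major alleles only would give the corresponding narrower densities.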
    Using the standard (relatively stringent) read mapping and SNP calling parameters splits alleles apart during CAP3 clustering and yields relatively high percentages of false negative SNPs. This shows that the high degree of allelic diversity in L. perenne genes requires reduced stringency, so that the different alleles can align during CAP3 clustering and reads can be mapped correctly to identify SNPs. At the same time, reduced stringency makes it difficult to distinguish paralogs from alleles, which can lead to false positive SNPs. The high density of SNPs can also interfere with the design or performance of molecular (genotyping) assays, because primer/probe binding sites in SNP assays should be free of neighbouring SNPs.
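    The constraint on primer/probe binding sites can be sketched as a simple filter that keeps only SNPs with no other SNP within a given flank. This is a hypothetical illustration, not a method from the thesis: the function name, the example positions and the 20 bp flank are assumptions, and real assay design depends on the genotyping platform. For scale, at the highest density reported above (8.39 SNPs per 100 bp for MAX1), SNPs lie on average roughly 12 bp apart, so few would pass such a filter.

```python
def assay_candidates(snp_positions, flank=20):
    """Return SNP positions with no neighbouring SNP within `flank` bp.

    `flank` is a hypothetical binding-site length chosen for illustration.
    """
    positions = sorted(set(snp_positions))
    keep = []
    for i, pos in enumerate(positions):
        # Distance to the nearest SNP on each side must exceed the flank.
        left_ok = i == 0 or pos - positions[i - 1] > flank
        right_ok = i == len(positions) - 1 or positions[i + 1] - pos > flank
        if left_ok and right_ok:
            keep.append(pos)
    return keep


# Hypothetical SNP coordinates on one gene fragment.
print(assay_candidates([10, 15, 60, 120, 130], flank=20))
```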
    This thesis illustrates the relation between allelic diversity and the effect of parameters for bioinformatics procedures (de novo assembly, clustering, read mapping, SNP identification) and lays the foundations for automatic SNP identification in the complete L. perenne transcriptome.

    Period: 2010 - 2011
    Held at: Hogeschool Gent, departement Toegepaste Ingenieurs Wetenschappen, Belgium