[International Worm Meeting, 2017]
Almost twenty years after the completion of the C. elegans genome sequence, gene structure annotation is still an ongoing process, with new evidence for gene variants still being regularly uncovered by more in-depth transcriptome studies. Alternative splice forms allow a single gene to encode several protein variants, called isoforms, with altered stability, localization, specificity or activity. Here we generated a compendium of ~1,700 C. elegans RNA-seq datasets to expand the dynamic range of detection of RNA isoforms and to obtain robust measurements of the relative abundance of each splicing event. We detected ~700,000 exon-exon junctions and over 5,000,000 trans-splicing sites, but most of these sites are supported by only a very low number of reads: 98% of splicing reads come from ~60,000 high-confidence splice junctions, and 88% of trans-spliced reads come from ~36,000 robust trans-splice sites. Our analysis indicates that the mechanism of trans-splicing generates more nonspecific species than cis-splicing, but it also highlights that putative erroneous splicing represents only a minor proportion of detectable spliced RNA species. We found that rarely used splice sites within coding genes are significantly less conserved in other nematode genomes than splice sites with a higher usage frequency. We generated updated gene models for the entire C. elegans genome that include previously unreported transcription start and splice sites, together with quantitative exon usage information, so that users can see at a glance the relative expression of each isoform of their gene of interest.
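The abstract reports how concentrated junction support is (98% of splicing reads on ~60,000 junctions) but does not state how the high-confidence set was chosen. As a purely illustrative sketch, assuming per-junction read counts have already been tallied from spliced alignments, one way to express such a figure is to rank junctions by read support and keep the smallest set that accounts for a target fraction of all junction reads. The function name and the ranking rule are hypothetical and are not presented as the authors' method.

```python
from typing import Dict, List


def junctions_covering_fraction(junction_reads: Dict[str, int], fraction: float) -> List[str]:
    """Return the smallest set of junctions, ranked by read support, whose
    reads account for at least `fraction` of all junction-spanning reads.

    Hypothetical helper for illustration only; the abstract does not
    describe how its high-confidence junction set was defined.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(junction_reads.values())
    if total == 0:
        return []
    # Most strongly supported junctions first (ties keep input order).
    ranked = sorted(junction_reads.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    covered = 0
    for junction, reads in ranked:
        if covered / total >= fraction:
            break  # target fraction already reached
        selected.append(junction)
        covered += reads
    return selected


# Toy counts (invented for illustration, not data from the study).
counts = {"j1": 900, "j2": 80, "j3": 15, "j4": 5}
print(junctions_covering_fraction(counts, 0.98))  # ['j1', 'j2']
```

The same function applies unchanged to trans-splice sites by passing per-site SL read counts instead of junction counts.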
[International Worm Meeting, 2019]
A recent meta-analysis of alternative exon usage in Caenorhabditis elegans based on publicly available RNA-seq datasets (Tourasse et al., Genome Research, 2017) refined our understanding of the C. elegans transcriptome, especially the quantitative aspects of alternative splicing in messenger RNAs. However, next-generation sequencing (NGS) technologies such as Illumina are limited in their ability to fully characterize a transcriptome. PCR-based sequencing methods are known to introduce amplification bias that affects the overall distribution of mRNAs detected in an experiment, and short reads are not suited to accurately predicting the frequency of isoforms derived from multiple alternative splicing events. In this study, we exploit the new possibilities offered by Oxford Nanopore Technology (ONT) to overcome these limitations. Nanopore-based sequencing allows nucleic acids to be sequenced directly without any prior amplification step and generates long reads that can cover the full length of the molecule. We therefore aim to further characterize the C. elegans transcriptome by providing a more accurate measure of isoform ratios, a better understanding of exon associations during alternative splicing, and a characterization of differentially trans-spliced mRNAs. To do so, we analyzed two different populations of mRNAs: a library of poly(A) mRNAs representing the whole-animal transcriptome and a library of SL1-enriched mRNAs. These libraries were sequenced on an ONT MinION device and analyzed using a combination of tools recommended for long-read analysis and in-house Python scripts. We assessed the efficiency of three different sequencing kits sold by ONT that are recommended for transcriptomics. Our results suggest that direct-cDNA sequencing is the most suitable for transcriptome analysis in C. elegans in terms of the quantity of data generated, while preserving the quality of the dataset.
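Because a full-length read spans a whole transcript, an isoform ratio can in principle be estimated by counting reads per isoform within each gene. A minimal sketch, assuming each read has already been assigned to exactly one gene and one annotated isoform by some upstream step that the abstract does not specify, and ignoring truncated reads and any coverage-bias correction; `isoform_ratios` and all identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def isoform_ratios(read_assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Per gene, the fraction of full-length reads assigned to each isoform.

    Assumes each read was already assigned to one (gene, isoform) pair
    upstream; no correction for truncated reads or coverage bias.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for gene, isoform in read_assignments:
        counts[gene][isoform] += 1
    ratios: Dict[str, Dict[str, float]] = {}
    for gene, iso_counts in counts.items():
        total = sum(iso_counts.values())
        ratios[gene] = {iso: n / total for iso, n in iso_counts.items()}
    return ratios


# Hypothetical read assignments (invented identifiers).
reads = [("gene1", "iso_a"), ("gene1", "iso_a"), ("gene1", "iso_b"),
         ("gene1", "iso_a"), ("gene2", "iso_x")]
print(isoform_ratios(reads))
# {'gene1': {'iso_a': 0.75, 'iso_b': 0.25}, 'gene2': {'iso_x': 1.0}}
```

Counting whole reads this way is what short reads cannot offer: with short reads, the combination of exons on one molecule has to be inferred rather than observed.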
The two libraries were compared at the level of both genes and isoforms. We report a set of non-SL1 genes that are highly expressed in poly(A) libraries but not detected in SL1-enriched libraries. Additionally, we show that alternative promoters can give rise to populations of isoforms with different trans-splicing status.
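The abstract does not say how genes "highly expressed in poly(A) libraries but not detected in SL1-enriched libraries" were defined. A hedged sketch of one simple screen, assuming gene-level read counts per library: normalize each library to counts per million (CPM), then keep genes above a poly(A) threshold and below an SL1 threshold. Both thresholds and all function names are hypothetical placeholders; a real analysis would also need replicates and a statistical test.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw gene counts so each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def polya_only_genes(polya: Dict[str, int], sl1: Dict[str, int],
                     min_polya_cpm: float = 100.0,
                     max_sl1_cpm: float = 1.0) -> List[str]:
    """Genes abundant in the poly(A) library but (near-)absent from the
    SL1-enriched library. Both thresholds are hypothetical placeholders."""
    polya_cpm = counts_per_million(polya)
    sl1_cpm = counts_per_million(sl1)
    # Genes missing from the SL1 table are treated as undetected (CPM 0).
    return sorted(
        gene for gene, value in polya_cpm.items()
        if value >= min_polya_cpm and sl1_cpm.get(gene, 0.0) <= max_sl1_cpm
    )


# Toy gene-level counts (invented, not from the study).
polya = {"g1": 5000, "g2": 3000, "g3": 2000}
sl1 = {"g1": 0, "g2": 4000, "g4": 6000}
print(polya_only_genes(polya, sl1))  # ['g1', 'g3']
```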