-
[
Genetics,
2024]
The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, C. elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and Application Programming Interfaces (APIs). Here we focus on developments over the last two years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress towards a central persistent database to support curation, the data modeling that underpins harmonization, and progress towards a state-of-the-art literature curation system with integrated Artificial Intelligence and Machine Learning (AI/ML).
-
[
Genetics,
2019]
Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human-readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.
-
[
Genetics,
2022]
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
-
Cho J, Davis P, Chan J, Kishore R, Brown S, Yook K, Lee R, Wright A, Quinton-Tulloch M, Luypaert M, Schindelman G, Harris T, Grigoriadis D, Muller HM, Howe K, Nuin P, Stein L, Cain S, Arnaboldi V, Van Auken K, Raciti D, Becerra A, Sternberg PW, Schedl T, Dyer S, Longden I, Grove CA, Wang Q, Diamantakis S, Chen WJ, Zarowiecki M
[
Genetics,
2024]
WormBase has been the major repository and knowledgebase of information about the genome and genetics of C. elegans and other nematodes of experimental interest for over two decades. We have three goals: to keep current with the fast-paced C. elegans research, to provide better integration with other resources, and to be sustainable. Here we discuss the current state of WormBase as well as progress and plans for moving core WormBase infrastructure to the Alliance of Genome Resources (the Alliance). As an Alliance member, WormBase will continue to interact with the C. elegans community, develop new features as needed, and curate key information from the literature and large-scale projects.
-
[
Worm Breeder's Gazette,
1994]
The C. elegans genome sequencing project: A progress report. The C. elegans Genome Consortium, Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri, USA and Sanger Centre, Hinxton Hall, Cambridge, UK.
-
[
Nucleic Acids Res,
2020]
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g., genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
-
Williams, Gary, Howe, Kevin, Russell, Matthew, Davis, Paul, Schedl, Tim, Paulini, Michael
[
International Worm Meeting,
2019]
WormBase (http://www.wormbase.org) is a central data repository for nematode biologists and scientists enabling experimental research in C. elegans. WormBase is also a founding member of the Alliance of Genome Resources (https://www.alliancegenome.org), an effort to align data representation and curation workflows between the major model organism databases (MODs). We will describe how WormBase acquires Strain and Variant data from a variety of sources, keeps up to date and feeds improvements back to third parties. One particular change has been to preserve all strain data in perpetuity, so as to not lose information about historical Strains. A project that we will be undertaking in the near future is to formally accession Strains to allow for inter-resource stability and reliability. This is particularly important for the representation of cross-species strain and population data in the Alliance of Genome Resources. Two of the primary sources of data are the CGC (Caenorhabditis Genetics Center) and CeNDR (Caenorhabditis elegans Natural Diversity Resource). WormBase uses a variety of techniques to distribute the data from these centers and advertise the availability of Strains from these resources. We will describe the status of C. elegans variation data, summarising the different types and the completeness of information from these data types. Common representation of variation data between the MODs is an important project for the Alliance of Genome Resources, and we will provide an update of progress and plans in that area. Finally, nomenclature is a key area within WormBase and the C. elegans community as a whole. For decades the Worm Community has been leading the way, having formal nomenclature for genes, variations and strains for all labs actively working long term in the field. We will summarise some of the aspects of this activity and how the various processes work.
-
[
Annu Rev Genomics Hum Genet,
2015]
The modENCODE (Model Organism Encyclopedia of DNA Elements) Consortium aimed to map functional elements (including transcripts, chromatin marks, regulatory factor binding sites, and origins of DNA replication) in the model organisms Drosophila melanogaster and Caenorhabditis elegans. During its five-year span, the consortium conducted more than 2,000 genome-wide assays in developmentally staged animals, dissected tissues, and homogeneous cell lines. Analysis of these data sets provided foundational insights into genome, epigenome, and transcriptome structure and the evolutionary turnover of regulatory pathways. These studies facilitated a comparative analysis with similar data types produced by the ENCODE Consortium for human cells. Genome organization differs drastically in these distant species, and yet quantitative relationships among chromatin state, transcription, and cotranscriptional RNA processing are deeply conserved. Of the many biological discoveries of the modENCODE Consortium, we highlight insights that emerged from integrative studies. We focus on operational and scientific lessons that may aid future projects of similar scale or aims in other, emerging model systems.
-
Chitrakar, Rojin, Stevens, Lewis, Baugh, L. Ryan, Moya, Nicolas D., Walhout, Marian, Tanny, Robyn E., Dekker, Job, Na, Huimin, Andersen, Erik C.
[
International Worm Meeting,
2021]
Decades of research have led to the development of comprehensive genome resources that have been essential to study the Caenorhabditis elegans species. In parallel, the emergence of Caenorhabditis briggsae as a model system has been useful to make interspecies comparisons. Despite the importance of C. briggsae as a model, its genome resources have not been developed to the same extent as C. elegans. The current genome of C. briggsae reference strain AF16 contains thousands of unresolved gaps and numerous mis-assemblies. Because of these issues, C. briggsae gene models remain incomplete and have numerous structural errors in protein-coding genes. We sought to exploit the latest sequencing technologies and computational tools to provide the highest quality C. briggsae genome resources to date. First, we generated high-quality genome assemblies for two strains of C. briggsae: QX1410 (a "tropical" strain isolated in Saint Lucia that is closely related to AF16) and VX34 (a divergent strain isolated in China). These genome assemblies incorporate high coverage Oxford Nanopore PromethION long reads and chromosome conformation capture (Hi-C) data. Second, we genotyped 99 recombinant inbred lines generated from reciprocal crosses between QX1410 and VX34. Using these data, we produced a high-quality recombination map that validated the placement of scaffolds after genome assembly. Third, we sequenced the transcriptomes of each strain to high coverage using Pacific Biosciences SMRT and Illumina platforms. We developed a computational pipeline that leverages long and short RNA reads to generate a genome annotation for each strain. These new genome annotations have improved accuracy and completeness relative to the AF16 genome. Fourth, our research group currently maintains over 1,600 C. briggsae wild strains, comprising the largest collection worldwide. We sequenced the genomes of this entire collection to high coverage using the Illumina platform. We mapped the sequences of all wild strains to the QX1410 genome to call single nucleotide variants across the entire population. These high-quality genome resources will facilitate new avenues of research, including quantitative and population genetic studies of C. briggsae, and enable informative comparisons with C. elegans.