-
[
Genetics,
2024]
The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively-studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. The Alliance currently represents budding yeast, C. elegans, Drosophila, zebrafish, frog, laboratory mouse, and laboratory rat, along with the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and Application Programming Interfaces (APIs). Here we focus on developments over the last two years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress towards a central persistent database to support curation, the data modeling that underpins harmonization, and progress towards a state-of-the-art literature curation system with integrated Artificial Intelligence and Machine Learning (AI/ML).
-
[
Genetics,
2019]
Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human-readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [<i>Mus sp.</i> (laboratory mouse), <i>Saccharomyces cerevisiae</i>, <i>Drosophila melanogaster</i>, <i>Caenorhabditis elegans</i>, <i>Danio rerio</i>, and <i>Rattus norvegicus</i>] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.
-
[
Genetics,
2022]
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
-
Cho J, Davis P, Chan J, Kishore R, Brown S, Yook K, Lee R, Wright A, Quinton-Tulloch M, Luypaert M, Schindelman G, Harris T, Grigoriadis D, Muller HM, Howe K, Nuin P, Stein L, Cain S, Arnaboldi V, Van Auken K, Raciti D, Becerra A, Sternberg PW, Schedl T, Dyer S, Longden I, Grove CA, Wang Q, Diamantakis S, Chen WJ, Zarowiecki M
[
Genetics,
2024]
WormBase has been the major repository and knowledgebase of information about the genome and genetics of C. elegans and other nematodes of experimental interest for over two decades. We have three goals: to keep current with the fast-paced C. elegans research, to provide better integration with other resources, and to be sustainable. Here we discuss the current state of WormBase as well as progress and plans for moving core WormBase infrastructure to the Alliance of Genome Resources (the Alliance). As an Alliance member, WormBase will continue to interact with the C. elegans community, develop new features as needed, and curate key information from the literature and large-scale projects.
-
[
Nucleic Acids Res,
2020]
The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g., genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.
-
Nash RS, Engel SR, Dolan ME, Sternberg PW, Van Slyke CE, Arnaboldi V, The Alliance of Genome Resources, Urbano JM, Kishore R, Shimoyama M, Chan J
[
Database (Oxford),
2020]
Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
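The template-and-grouping idea described above can be illustrated with a minimal sketch. It is not the Alliance's implementation: the toy ontology, the `min_group` threshold, the template wording, and all function names are hypothetical, and the information-content scoring and coverage optimization mentioned in the abstract are omitted. It only shows how sibling terms might be collapsed into a shared parent before being slotted into a sentence template.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> parent term (None marks the root).
PARENT = {
    "biological process": None,
    "signaling": "biological process",
    "development": "biological process",
    "Wnt signaling": "signaling",
    "Notch signaling": "signaling",
    "vulval development": "development",
}

# Hypothetical sentence template for one data category.
TEMPLATE = "{gene} is involved in {terms}."


def group_by_parent(terms, min_group=2):
    """Replace sibling terms with their shared parent when at least
    `min_group` of them are annotated; keep other terms unchanged."""
    by_parent = defaultdict(list)
    for term in terms:
        by_parent[PARENT[term]].append(term)
    grouped = []
    for parent, children in by_parent.items():
        if parent is not None and len(children) >= min_group:
            grouped.append(parent)
        else:
            grouped.extend(children)
    return grouped


def join_terms(terms):
    """Join terms into an English list ("a", "a and b", "a, b, and c")."""
    if len(terms) == 1:
        return terms[0]
    if len(terms) == 2:
        return f"{terms[0]} and {terms[1]}"
    return ", ".join(terms[:-1]) + ", and " + terms[-1]


def summarize(gene, terms):
    """Build one summary sentence from a gene's annotated terms."""
    return TEMPLATE.format(gene=gene, terms=join_terms(group_by_parent(terms)))


print(summarize("gene-1", ["Wnt signaling", "Notch signaling", "vulval development"]))
```

Here the two signaling terms collapse into their common parent, which shortens the sentence at the cost of specificity; the published method weighs that trade-off using information content rather than a fixed threshold.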
-
[
MicroPubl Biol,
2024]
WormBase and the Alliance of Genome Resources provide several types of gene data including annotations to ontology terms and controlled vocabularies. These are used to automatically generate text summaries to give users a cogent view of gene function. However, automated summaries are not available for genes that lack curated annotations. To increase the genome coverage of the summaries in WormBase, we developed a new software module that generates additional gene summaries for <i>C. elegans</i> and new gene summaries for nine other nematode species: four <i>Caenorhabditis</i> species (<i>C. brenneri, C. briggsae, C. japonica, C. remanei</i>), <i>P. pacificus</i>, and four parasitic species (<i>B. malayi, O. volvulus, S. ratti</i>, and <i>T. muris</i>).
-
[
Sci Rep,
2019]
Filarial nematode infections cause a substantial global disease burden. Genomic studies of filarial worms can improve our understanding of their biology and epidemiology. However, genomic information from field isolates is limited and available reference genomes are often discontinuous. Single-molecule sequencing technologies can reduce the cost of genome sequencing, and long reads produced from these devices can improve the contiguity and completeness of genome assemblies. In addition, these new technologies can make generation and analysis of large numbers of field isolates feasible. In this study, we assessed the performance of the Oxford Nanopore Technologies MinION for sequencing and assembling the genome of Brugia malayi, a human parasite widely used in filariasis research. Using data from a single MinION flowcell, a 90.3 Mb nuclear genome was assembled into 202 contigs with an N50 of 2.4 Mb. This assembly covered 96.9% of the well-defined B. malayi reference genome with 99.2% identity. The complete mitochondrial genome was obtained with individual reads, and the nearly complete genome of the endosymbiotic bacterium Wolbachia was assembled alongside the nuclear genome. Long-read data from the MinION produced an assembly that approached the quality of a well-established reference genome using comparatively fewer resources.
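For readers unfamiliar with the N50 contiguity metric quoted above, the following minimal sketch shows the standard definition: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths: the length L such that
    contigs of length >= L together cover at least half of the total."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (arbitrary units), not taken from the B. malayi assembly.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 for the same total assembly size means the sequence sits in fewer, longer contigs, which is why the metric is reported alongside the contig count.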
-
De Silva N, Yates AD, Ware D, Pedro H, McDowall MD, Maslen G, Urban M, Olson A, Gil L, Howe KL, Langridge N, Papatheodorou I, Moore B, Stein J, Rosello M, Perry E, Naithani S, Chakiachvili M, Paulini M, Staines DM, Cummins C, Haskell E, Flicek P, Kersey PJ, Jaiswal P, Russell M, Hammond-Kosack KE, Cuzick A, Janacek SH, Christensen M, Gupta P, Bolser DM, Barba M, Tello-Ruiz MK, Allen J, Davis P, Williams G, Maheswari U, Carbajo M, Hunt SE, George N, Maurel T, Contreras-Moreira B, Wei S, Muffato M, Alvarez-Jarreta J, Preece J, Fexova S, Gall A, Akanni W, Cambell L, Naamati G, Trevanion SJ, Sitnik V, Patricio M
[
Nucleic Acids Res,
2020]
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.
-
Davis P, Wright AJ, Chen WJ, Schindelman G, Gao S, Arnaboldi V, Williams G, Russell M, Chan J, Grove CA, Nakamura C, Cain S, Kishore R, Schedl T, Nuin P, Cho J, Raciti D, Stein L, Rodgers FH, Muller HM, Paulini M, Howe KL, Wang Q, Yook K, Lee RYN, Sternberg PW, Harris TW, Van Auken K
[
Nucleic Acids Res,
2019]
WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.