The first step in analyzing whole-genome sequence alignments is to detect variant sequences and associate them with genome annotations. Most genome sequence facilities have access to developers who run command-line scripts to make this association. Our mission is to simplify this process and make the annotation of SNP and indel calls accessible to the average investigator. To this end, we have launched Whole Genomes, a website where investigators can upload SNP-called files for custom annotation (http://seqreport.omrf.org/genome: open to all on May 1, 2011). Administratively, the site design is flexible, allowing one to easily create annotation tables from tab-delimited flat files such as the GFF files distributed by WormBase. Further, the site uses simple forms to establish persistent templates that describe the format of uploaded SNP files. All files are imported into a MySQL database, and all comparisons are done via MySQL queries, which offer a much more dynamic interaction with the data than a simple script-driven approach. Users do not need to know anything about annotation files, templates, or MySQL to use the site.
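The database-driven association described above can be illustrated with a minimal sketch. This is not the site's code: the table layout, the column names, the sample rows, and the functions build_database and snps_in_features are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained. It shows only the general idea of importing tab-delimited annotation rows and SNP calls into tables and mapping one onto the other with a range join.

```python
import sqlite3

# Hypothetical annotation rows in a GFF-like, tab-delimited layout:
# seqid, source, type, start, end, score, strand, phase, attributes
GFF_TEXT = (
    "I\tWormBase\tCDS\t100\t200\t.\t+\t0\tID=geneA\n"
    "I\tWormBase\tthree_prime_UTR\t201\t260\t.\t+\t.\tID=geneA\n"
    "II\tWormBase\tCDS\t500\t650\t.\t-\t0\tID=geneB\n"
)

# Hypothetical SNP calls: (chromosome, position, reference base, variant base)
SNPS = [("I", 150, "G", "A"), ("I", 230, "C", "T"), ("II", 900, "T", "C")]


def build_database(gff_text, snps):
    """Load annotation rows and SNP calls into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature "
        "(chrom TEXT, type TEXT, start INTEGER, stop INTEGER, attrs TEXT)"
    )
    conn.execute("CREATE TABLE snp (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)")
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment lines
        f = line.split("\t")
        conn.execute(
            "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
            (f[0], f[2], int(f[3]), int(f[4]), f[8]),
        )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?)", snps)
    return conn


def snps_in_features(conn, feature_types):
    """Return SNPs that fall inside any feature of the requested types."""
    placeholders = ",".join("?" for _ in feature_types)
    query = f"""
        SELECT s.chrom, s.pos, s.ref, s.alt, f.type, f.attrs
        FROM snp AS s
        JOIN feature AS f
          ON s.chrom = f.chrom AND s.pos BETWEEN f.start AND f.stop
        WHERE f.type IN ({placeholders})
        ORDER BY s.chrom, s.pos
    """
    return conn.execute(query, list(feature_types)).fetchall()


conn = build_database(GFF_TEXT, SNPS)
coding_hits = snps_in_features(conn, ["CDS"])
```

Because the feature types are passed as query parameters, the same join can serve any combination of annotation classes without a new script for each question.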
Whole Genomes allows users to map SNPs to any combination of features in the annotation database, such as coding sequence, potential splice sites, untranslated regions, or microRNAs. For example, a coding sequence query reports the specific amino acid changes for SNPs that fall within translatable sequence, along with a metric of evolutionary conservation for each change. The site also allows the user to compare SNPs in two strains to identify both unique and common variations. In addition, if a user has minimal or no mapping information, Whole Genomes can compare two allelic strains to find the gene that is affected in both. Each annotated SNP is associated with a unique "Download" link that lets users download the DNA sequence of the annotated feature. Finally, for any given SNP, the site will design PCR primers to isolate DNA for SNP verification.
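The two-strain comparisons can be sketched with plain set operations. The example below is an illustration only and does not reproduce the site's queries: the function names compare_strains and candidate_genes, the data layout (a mapping from gene name to a set of SNP tuples), and the sample values are all hypothetical. The rule used for allelic strains, namely that a candidate gene must carry at least one strain-specific SNP in each strain, is an assumption made for this sketch.

```python
def compare_strains(snps_a, snps_b):
    """Split two SNP collections into common and strain-specific calls."""
    a, b = set(snps_a), set(snps_b)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}


def candidate_genes(hits_a, hits_b):
    """List genes hit in both strains by at least one strain-specific SNP.

    hits_a and hits_b map a gene name to the set of SNP tuples found in it.
    SNPs shared by both strains are treated as background (an assumption).
    """
    candidates = []
    for gene in sorted(set(hits_a) & set(hits_b)):
        shared = hits_a[gene] & hits_b[gene]
        if hits_a[gene] - shared and hits_b[gene] - shared:
            candidates.append(gene)
    return candidates


# Hypothetical annotated SNPs: (chromosome, position, reference, variant)
strain_a = {
    "geneA": {("I", 150, "G", "A")},
    "geneB": {("II", 510, "C", "T")},
    "geneC": {("III", 42, "A", "G")},
}
strain_b = {
    "geneA": {("I", 180, "C", "T")},
    "geneB": {("II", 510, "C", "T")},
}

all_a = set().union(*strain_a.values())
all_b = set().union(*strain_b.values())
comparison = compare_strains(all_a, all_b)
candidates = candidate_genes(strain_a, strain_b)
```

In this toy data set, geneB is excluded because both strains carry the identical SNP there, while geneA carries a different change in each strain and is therefore reported.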