[International C. elegans Meeting, 2001]
We have initiated experiments to understand the regulatory regions of C. elegans genes, using known C. elegans muscle genes as a model. We are pursuing two approaches.

The first approach is to computationally compare C. elegans muscle genes with the orthologous muscle sequences from C. briggsae. This comparison is useful because patterns of gene regulation and regulatory elements are often conserved across species. The C. briggsae orthologues are found by making a probe from the C. elegans muscle gene and probing the C. briggsae fosmid filter available from Incyte. The most promising positive clones are determined by fingerprinting, and these are sequenced by the Genome Sequencing Center. To compare the orthologous sequences from C. elegans and C. briggsae, we will use pairwise alignment methods such as BlastZ (4) or Bayes aligner (5) to identify regions of interest. Local multiple alignment programs can then be used to search for common regulatory elements in these regions. Because local multiple alignment methods work best with sequences of only 1,000-2,000 nucleotides, phylogenetic footprinting will be useful for identifying shorter regions within much longer ones (10,000-20,000 nucleotides).

The second approach is to use a combination of computational methods to identify potential muscle-specific regulatory elements from the known set of C. elegans muscle genes. Local multiple sequence alignment methods such as Consensus (1), Ann-Spec (2) and Co-Bind (3) are being used to identify these potential regulatory elements. Using these methods, we have already identified several potential regulatory elements that show a high degree of specificity for the muscle genes. The regulatory elements that these computational methods predict can then be used to screen the C. elegans genome for new genes that are expressed in muscle cells.

To test our results, we have developed a method to examine the expression patterns of genes in C. elegans using gfp promoter fusions. Our promoter fusions include 6,000 nucleotides upstream of the start methionine, all of the first exon and all of the first intron. In our initial experiments, known muscle genes tested in this manner show muscle-like expression. We can now use this method to test the requirement for the regulatory regions predicted by the computational work and to determine whether they confer muscle-specific expression. In addition, we can use this method to test genes that we predict to be expressed in muscle but that were not previously known to be. Furthermore, we are developing these methods to allow rapid production of promoter fusions, so that ultimately a genome-wide program to categorize all C. elegans genes by gfp and automated lineaging can be carried out.

1. Hertz, G.Z., and Stormo, G.D. (1999) Bioinformatics, vol. 15, pp. 563-577.
2. Workman, C.T., and Stormo, G.D. (2000) Pacific Symposium on Biocomputing, vol. 5, pp. 464-475.
3. GuhaThakurta, D., and Stormo, G.D. (2001) Bioinformatics, in press.
4. Schwartz, S. et al. (2000) Genome Research, vol. 10, pp. 577-586.
5. Zhu, J., Liu, J.S., and Lawrence, C.E. (1998) vol. 14, pp. 25-39.
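
The abstract does not say how predicted regulatory elements are used to screen the genome. One common way to do this kind of screen, shown here only as a minimal sketch and not as the authors' procedure, is to turn a set of aligned sites into a log-odds weight matrix and slide it along upstream sequence, keeping windows that score above a cutoff. The aligned sites, the upstream sequence, the uniform background of 0.25, the pseudocount and the threshold below are all hypothetical values chosen for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned, equal-length sites.

    Assumes a uniform background frequency; both the pseudocount and the
    background are illustrative defaults, not values from the abstract.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical aligned sites standing in for a predicted regulatory element.
SITES = ["TGATAA", "TGATAG", "AGATAA", "TGATAA"]
PWM = log_odds_matrix(SITES)

# Hypothetical upstream sequence; the TGATAA occurrence starts at index 4.
UPSTREAM = "CCGTTGATAACGGACGT"
HITS = scan(UPSTREAM, PWM, threshold=5.0)

for start, score in HITS:
    print(f"hit at {start}: {UPSTREAM[start:start + len(PWM)]} score={score:.2f}")
```

In a real screen the cutoff would be calibrated against the known muscle genes and a set of non-muscle genes, and both strands would be scanned; this sketch leaves those choices out.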