[International Worm Meeting, 2003]
Regulatory elements, such as transcription factor binding sites, are crucial to transcriptional control. As whole-genome sequences and large amounts of microarray data become available, it is now possible to identify regulatory elements computationally. Though not a replacement for experimental methods, computational methods can guide and accelerate them by narrowing down the list of potential sites for further exploration. In one experiment, we searched for motifs (cis-regulatory elements) in the upstream regions of 72 genes whose expression changed significantly in both the egg and larval stages after heat shock (M. Jiang, B. Romagnolo, unpublished data). Because regulatory elements tend to be more conserved across species than background sequence, owing to their functional importance, we employed a comparative genomics motif-finding program, CompareProspector, which searches for motifs in regions that are conserved between C. elegans and C. briggsae. Three motifs were identified from the upstream regions of the heat shock-regulated genes. The first motif, with the consensus GAAYKTTCTAGAA, closely matches the known heat shock element (HSE), which has the consensus GAAnnTTCnnGAA. The second motif (GGGTCTC) has previously been shown to be a true regulatory element whose mutation affects gene expression level (GuhaThakurta et al., 2002). The third motif (GAGACGCAG) is a new motif with unknown function. A variety of criteria, such as motif enrichment compared to background sequence and correlation of motif score with gene expression value, were used to test whether this motif is relevant to the heat shock response. We intend to use this method to find regulatory elements in the whole C. elegans genome.
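The agreement between the first motif and the HSE consensus can be checked position by position. The following is a minimal Python sketch, assuming the standard IUPAC nucleotide codes (Y = C or T, K = G or T, n = any base). The helper name consensus_compatible is hypothetical and is not part of CompareProspector.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_compatible(a, b):
    """Return True if two equal-length consensus strings allow at least
    one common base at every position (hypothetical helper)."""
    if len(a) != len(b):
        return False
    return all(
        set(IUPAC[x.upper()]) & set(IUPAC[y.upper()])
        for x, y in zip(a, b)
    )


found_motif = "GAAYKTTCTAGAA"   # first motif reported in the abstract
hse_consensus = "GAAnnTTCnnGAA"  # known heat shock element consensus
print(consensus_compatible(found_motif, hse_consensus))
```

This only tests whether the two consensus strings are compatible; it says nothing about how well individual sites score against a position weight matrix.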
[West Coast Worm Meeting, 2004]
Much attention has recently been given to the problem of identifying gene regulatory sequences. Two main computational approaches exist: 1) identification of sequences that share similarity to known regulatory elements, and 2) de novo identification of common motifs in a set of co-regulated genes. Orthology Biased Gibbs Sampling (OrBS) applies a modification of the Gibbs Sampler put forth by Lawrence, Altschul et al. [1] in an attempt to solve this problem. Additional features include analysis of the negative strand, multiple (or no) occurrences of a motif in any given sequence, and identification of multiple unique motifs, all of which have appeared in previous incarnations of the Gibbs Sampler. The unique feature of OrBS is the use of comparative genomics as an informative prior and in the calculation of motif probability. OrBS will be applied to the identification of regulatory elements in C. elegans, using clusters of spatially co-regulated genes as defined by the C. elegans Gene Expression Project (see Johnsen et al. poster). These data are more amenable to regulatory element detection than the more common microarray data because the sequence in which a cis-acting element may reside is strictly defined. Interspecies sequence conservation will be identified through alignment of the C. elegans gene promoter region to the promoter region of the C. briggsae ortholog using LAGAN [2]. Predicted regulatory elements will be verified in vivo using site-directed mutagenesis and promoter::GFP fusions.
References:
1. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. (1993). Science 262: 208-214.
2. Brudno M, Do C, Cooper G, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. (2003). Genome Research 13: 721-731.
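For orientation, the basic site sampler that OrBS modifies can be sketched as follows. This is an illustrative Python sketch of a plain Gibbs site sampler, not OrBS itself. It assumes exactly one motif occurrence per sequence, a fixed motif width, the forward strand only, and uppercase A/C/G/T input. It omits the orthology-based prior, the negative-strand analysis, and the multiple-motif handling described above. The function name and all parameters are assumptions made for illustration.

```python
import random

BASES = "ACGT"


def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Sample one motif start position per sequence (hypothetical helper).

    Sequences must contain only uppercase A, C, G and T.
    """
    rng = random.Random(seed)

    # Background base frequencies estimated from all input sequences.
    total = sum(len(s) for s in seqs)
    background = {
        b: (sum(s.count(b) for s in seqs) + pseudocount)
        / (total + 4 * pseudocount)
        for b in BASES
    }

    # Start from a random motif position in every sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Column counts from the current sites of all other sequences.
            counts = [{b: pseudocount for b in BASES} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[positions[j] + k]] += 1
            column_total = (len(seqs) - 1) + 4 * pseudocount

            # Weight each candidate start in the held-out sequence by the
            # ratio of motif-model likelihood to background likelihood.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    base = seq[start + k]
                    w *= (counts[k][base] / column_total) / background[base]
                weights.append(w)

            # Draw the new site for sequence i in proportion to its weight.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]

    return positions
```

A comparative-genomics prior of the kind the abstract describes would enter at the weighting step, for example by multiplying each candidate weight by a conservation-dependent factor; how OrBS does this is not specified here.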
[West Coast Worm Meeting, 2004]
Although many advances have been made by studying individual genes, the global picture of tissue-specific gene expression in an organism remains dim. The nematode C. elegans has a relatively simple body plan that is nonetheless representative of many specific tissues, and its genome sequence is complete; together these features allow a systems biology approach to the question of which genes specify a tissue. A global profile of tissue-specific gene expression in this organism would expand our knowledge of tissue development and maintenance, regulation of gene expression, higher-order chromosome structure, and the housekeeping genes that characterize a cell. To identify the genes expressed in the intestine of the worm, we used the mRNA-tagging approach that Roy et al. (2002) described for identifying genes expressed in muscle cells. Animals expressing FLAG::PAB-1 from the intestine-specific promoter ges-1 were used to immunoprecipitate FLAG::PAB-1/mRNA complexes from the intestine, and the genes enriched relative to whole-worm lysate were determined by microarrays. The average ranks from eight repeats of this experiment identified 1938 intestine-enriched genes. First, we compared the genes expressed in the intestine to those expressed in muscle (Roy et al., 2002), and thereby identified 807 genes expressed in both tissues and 645 genes enriched in the intestine versus muscle. Second, we showed that the 1938 intestine-enriched genes were also positionally clustered on the chromosomes, suggesting that the order of genes in the genome is influenced by the effect of chromatin domains on gene expression. Furthermore, the tissue-specific lists showed less chromosomal clustering than the list of genes expressed in both the intestine and muscle. This observation suggests that chromatin domains may influence housekeeping genes more than tissue-specific genes. Third, to gain further insight into the regulation of intestine-enriched genes, we searched for regulatory motifs in this gene set. We used a modified Gibbs sampling method called CompareProspector (Liu et al., 2004) to identify motifs that were conserved with C. briggsae and over-represented in the 1 kb upstream region of the 645 intestine-specific genes. This analysis found that the promoter regions of the intestine genes were enriched for the consensus sequence for GATA transcription factors at twice the rate expected at random. To test the functionality of the GATA motif, we are using transcriptional GFP fusions of several intestinal markers with wild-type and mutated GATA sites.
Liu, Y., Liu, X. S., Wei, L., Altman, R. B., & Batzoglou, S. (2004). Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Research, 14(3), 451-458.
Roy, P. J., Stuart, J. M., Lund, J., & Kim, S. K. (2002). Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature, 418(6901), 975-979.
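Counting candidate GATA sites in upstream regions, as in the third analysis above, requires scanning both strands. The following is a minimal Python sketch, assuming TGATAA as the GATA consensus (an assumption; this abstract does not spell out the sequence used), exact matching, and uppercase input. The function names are hypothetical and this is not the CompareProspector scoring procedure.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def count_sites(upstream, motif="TGATAA"):
    """Count exact occurrences of the motif on either strand."""
    rc = reverse_complement(motif)
    n = 0
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        if window == motif or window == rc:
            n += 1
    return n


def sites_per_kb(upstream_seqs, motif="TGATAA"):
    """Motif site density (sites per 1000 bases) over a set of sequences."""
    total_sites = sum(count_sites(s, motif) for s in upstream_seqs)
    total_bases = sum(len(s) for s in upstream_seqs)
    return 1000.0 * total_sites / total_bases if total_bases else 0.0
```

Comparing sites_per_kb for the 1 kb upstream regions of a gene set against the same quantity for a comparison set (for example, shuffled or randomly chosen upstream sequences) would give a fold-over-random figure of the kind reported above.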
[West Coast Worm Meeting, 2004]
Regulatory motifs are short sequences of DNA that regulate the level, timing, and location of gene expression. Identifying these motifs and their functions is crucial to our understanding of gene regulation and disease processes. We developed CompareProspector, a motif-finding program that takes advantage of cross-species sequence comparison to identify putative regulatory motifs from sets of co-regulated genes [1]. We applied CompareProspector to 30 sets of genes with very similar patterns of expression, identified from the C. elegans topomap [2] and individual DNA microarray experiments. The statistical significance of each candidate motif was evaluated using criteria such as motif enrichment (the ratio of the prevalence of the motif in a given set of promoters to its prevalence elsewhere in the genome) and the expression coherence of genes with the motif. We identified twelve significant regulatory motifs, three of which have literature evidence confirming that they are true regulatory motifs. Overall, these twelve motifs are found in the upstream regulatory regions of 2970 different genes, and may be involved in gene regulation in 24 clusters of co-expressed genes. The first known motif, with the consensus TGATAA, matches the consensus of known binding sites for GATA factors. As GATA factors are known to be involved in worm intestine development [3] and hypodermis development, it is not surprising that the GATA motif is identified from a set of intestine-specific genes (F. Pauli, unpublished), from mount08 of the topomap, which is enriched in genes from the intestine, and from several collagen-related datasets (mount14, 17, and 35 of the topomap). We correctly identified GATA sites in the promoters of genes known to be regulated by GATA factors. Interestingly, the GATA motif is also identified from several data sets involved in the aging process.
This result parallels that of Murphy and colleagues, who independently identified this motif from their data set of DAF-16 target genes [4]. Both our result and the result from Murphy suggest that GATA factors may be involved in worm aging. Motif 2, which is identified in the two heat shock-related data sets, matches the consensus of known binding sites for heat shock factors [5]. Motif 3 matches the consensus of heat shock associated sites (HSAS), a motif that was first predicted computationally to be involved in the heat shock process [6] and later experimentally validated to be involved in ethanol stress response (14th International C. elegans Conference abstract 1113C). We are currently in the process of validating the rest of the motifs and their individual binding sites using mutagenesis studies of promoters with predicted motifs.
1. Liu, Y., Liu, X.S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 14, 451-8.
2. Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N. and Davidson, G.S. (2001) A gene expression map for Caenorhabditis elegans. Science. 293, 2087-92.
3. Maduro, M.F. and Rothman, J.H. (2002) Making worm guts: the gene regulatory network of the Caenorhabditis elegans endoderm. Dev Biol. 246, 68-85.
4. Murphy, C.T., McCarroll, S.A., Bargmann, C.I., Fraser, A., Kamath, R.S., Ahringer, J., Li, H. and Kenyon, C. (2003) Genes that act downstream of DAF-16 to influence the lifespan of Caenorhabditis elegans. Nature. 424, 277-83.
5. Amin, J., Ananthan, J. and Voellmy, R. (1988) Key features of heat shock regulatory elements. Mol Cell Biol. 8, 3761-9.
6. GuhaThakurta, D., Palomar, L., Stormo, G.D., Tedesco, P., Johnson, T.E., Walker, D.W., Lithgow, G., Kim, S. and Link, C.D. (2002) Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. 12, 701-12.
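The motif enrichment criterion used in this abstract, the ratio of a motif's prevalence in a given set of promoters to its prevalence elsewhere in the genome, can be illustrated with a short Python sketch. It assumes that prevalence means the fraction of promoters containing at least one exact forward-strand match, which is a simplification; the abstract does not state how prevalence was computed. The function name and the toy data are hypothetical.

```python
def motif_enrichment(promoters, gene_set, motif):
    """Ratio of motif prevalence inside gene_set to its prevalence in all
    other promoters. `promoters` maps gene name -> promoter sequence."""
    inside = [seq for gene, seq in promoters.items() if gene in gene_set]
    outside = [seq for gene, seq in promoters.items() if gene not in gene_set]
    if not inside or not outside:
        raise ValueError("both groups must contain at least one promoter")

    # Fraction of promoters in each group with at least one exact match.
    frac_in = sum(motif in s for s in inside) / len(inside)
    frac_out = sum(motif in s for s in outside) / len(outside)

    if frac_out == 0:
        # Ratio is undefined or unbounded when the motif is absent outside.
        return float("inf") if frac_in > 0 else float("nan")
    return frac_in / frac_out


# Toy data for illustration only; not taken from the study.
toy_promoters = {
    "g1": "CCTGATAACC", "g2": "TTTGATAAGG",
    "g3": "ACGTACGTAC", "g4": "GGTGATAATT",
    "g5": "CCCCCCCCCC", "g6": "AAAAAAAAAA",
}
print(motif_enrichment(toy_promoters, {"g1", "g2"}, "TGATAA"))
```

In practice the match test would use a position weight matrix score on both strands rather than exact substring matching, and the ratio would be accompanied by a significance test.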