[
International C. elegans Meeting,
2001]
The completion of the C. elegans genome sequence has identified nearly all of the genes in the genome (19,282 genes), but the function for most of these genes remains mysterious. A scant 6% of them have been studied using classical genetic or biochemical approaches (1135 genes), and only about 53% show homology to genes in other organisms (10,303 genes). The next challenge is to develop high-throughput, functional genomics procedures to study many genes in parallel in order to elucidate gene function on a global scale. One such approach is to use DNA microarrays to assay the relative expression of nearly every gene in the genome between two samples. Knowing when, where and under which conditions a particular gene is expressed can reveal the function of that gene. A consortium of laboratories has used C. elegans DNA microarrays to profile gene expression changes in a wide variety of experiments. In each experiment, RNA from one sample was used to generate Cy3-labelled cDNA, and RNA from another sample was used to prepare Cy5-labelled cDNA. The two cDNA probes were simultaneously hybridized to a single DNA microarray and the ratio of the Cy3 to Cy5 hybridization intensities was measured, revealing which genes were relatively enriched in either RNA sample. Thirty different laboratories have collectively performed 553 experiments using these C. elegans DNA microarrays, including 179 experiments with microarrays containing 11,917 genes (63% of the genome) and 370 experiments using microarrays that have 17,817 genes (94% of the genome). The experiments compare RNA between mutant and wild-type strains, or between worms grown under different conditions. Many experiments have been done to date, including experiments on wild-type development, heat shock, Ras signaling, aging, the dauer stage, sex regulation and germ line gene expression. Individual microarray experiments reveal sets of genes that change in one mutant strain or growth condition. 
We combined the data from all of the experiments to assemble a gene expression database, and then used this database to group together co-regulated genes and visualize them using a three-dimensional expression map. By matching the expression profile of an unknown gene to those of genes with known functions, the expression map can be used to ascribe functions to the large fraction of genes in the genome whose functions were previously unknown.
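As a rough illustration of the two-colour measurement and the profile matching described above, the following Python sketch computes a log2(Cy3/Cy5) ratio for a spot and ranks genes of known function by how well their profiles correlate with that of an unknown gene. The gene names, values, and function names are hypothetical, and Pearson correlation is an assumed similarity measure, not necessarily the one used to build the expression map.

```python
import math

def log_ratio(cy3, cy5):
    """log2(Cy3/Cy5) for one spot; positive means enriched in the Cy3 sample."""
    return math.log2(cy3 / cy5)

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def most_similar(query, profiles):
    """Gene names ordered by decreasing correlation with the query profile."""
    return sorted(profiles, key=lambda g: pearson(query, profiles[g]), reverse=True)

# Hypothetical log-ratio profiles across four experiments.
profiles = {
    "known_gene_A": [1.0, 2.0, 3.0, 4.0],
    "known_gene_B": [4.0, 3.0, 2.0, 1.0],
}
unknown = [0.9, 2.1, 2.8, 4.2]
ranking = most_similar(unknown, profiles)
```

In this toy case the unknown gene rises across the experiments in the same way as `known_gene_A`, so that gene is ranked first and would be the better guide to a candidate function.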
[
West Coast Worm Meeting,
2004]
Regulatory motifs are short sequences of DNA that regulate the level, timing, and location of gene expression. Identifying these motifs and their functions is crucial to our understanding of gene regulation and disease processes. We developed CompareProspector, a motif-finding program that takes advantage of cross-species sequence comparison to identify putative regulatory motifs from sets of co-regulated genes [1]. We applied CompareProspector to 30 sets of genes with very similar patterns of expression, identified from the C. elegans topomap [2] and individual DNA microarray experiments. The statistical significance of each candidate motif was evaluated using criteria such as motif enrichment (the ratio of the prevalence of the motif in a given set of promoters to its prevalence elsewhere in the genome) and the expression coherence of genes with the motif. We identified twelve significant regulatory motifs, three of which have literature evidence confirming that they are true regulatory motifs. Overall, these twelve motifs are found in the upstream regulatory regions of 2970 different genes, and may be involved in gene regulation in 24 clusters of co-expressed genes. The first known motif, with the consensus TGATAA, matches the consensus of known binding sites for GATA factors. As GATA factors are known to be involved in worm intestine development [3] and hypodermis development, it is not surprising that the GATA motif was identified from a set of intestine-specific genes (F. Pauli, unpublished), from mount08 of the topomap, which is enriched in genes from the intestine, and from several collagen-related datasets (mount14, 17, and 35 of the topomap). We correctly identified GATA sites in the promoters of genes known to be regulated by GATA factors. Interestingly, the GATA motif was also identified from several data sets involved in the aging process.
This result parallels that of Murphy and colleagues, who independently identified this motif from their data set of DAF-16 target genes [4]. Both our result and the result from Murphy suggest that GATA factors may be involved in worm aging. Motif 2, which is identified in the two heat shock-related data sets, matches the consensus of known binding sites for heat shock factors [5]. Motif 3 matches the consensus of heat shock associated sites (HSAS), a motif that was first predicted computationally to be involved in the heat shock process [6] and later experimentally validated to be involved in ethanol stress response (14th International C. elegans Conference abstract 1113C). We are currently in the process of validating the rest of the motifs and their individual binding sites using mutagenesis studies of promoters with predicted motifs.
1. Liu, Y., Liu, X.S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 14, 451-8.
2. Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N. and Davidson, G.S. (2001) A gene expression map for Caenorhabditis elegans. Science. 293, 2087-92.
3. Maduro, M.F. and Rothman, J.H. (2002) Making worm guts: the gene regulatory network of the Caenorhabditis elegans endoderm. Dev Biol. 246, 68-85.
4. Murphy, C.T., McCarroll, S.A., Bargmann, C.I., Fraser, A., Kamath, R.S., Ahringer, J., Li, H. and Kenyon, C. (2003) Genes that act downstream of DAF-16 to influence the lifespan of Caenorhabditis elegans. Nature. 424, 277-83.
5. Amin, J., Ananthan, J. and Voellmy, R. (1988) Key features of heat shock regulatory elements. Mol Cell Biol. 8, 3761-9.
6. GuhaThakurta, D., Palomar, L., Stormo, G.D., Tedesco, P., Johnson, T.E., Walker, D.W., Lithgow, G., Kim, S. and Link, C.D. (2002) Identification of a novel cis-regulatory element involved in the heat shock response in Caenorhabditis elegans using microarray gene expression and computational methods. Genome Res. 12, 701-12.
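The motif enrichment criterion named in this abstract (prevalence of a motif in a set of promoters divided by its prevalence elsewhere) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the promoter sequences are invented, the function name is hypothetical, and the motif is matched as an exact substring on one strand, whereas a real motif finder would typically score a position weight matrix on both strands.

```python
def motif_enrichment(cluster_promoters, background_promoters, motif):
    """Ratio of the fraction of cluster promoters containing the motif
    to the fraction of background promoters containing it."""
    def prevalence(promoters):
        # Fraction of sequences with at least one exact occurrence.
        return sum(motif in seq for seq in promoters) / len(promoters)

    background = prevalence(background_promoters)
    if background == 0:
        raise ValueError("motif absent from background; ratio is undefined")
    return prevalence(cluster_promoters) / background

# Hypothetical promoter fragments (not real C. elegans sequence).
cluster = ["ACTGATAAGC", "TTTGATAACC", "GGGCCCAAAT", "CTGATAAGGA"]
background = ["ACGTACGTAC", "TGATAACCGG", "GGGGCCCCAA", "ATATATATAT"]

enrichment = motif_enrichment(cluster, background, "TGATAA")
```

Here three of four cluster promoters and one of four background promoters contain TGATAA, so the ratio is 0.75 / 0.25.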
[
West Coast Worm Meeting,
2004]
Although many advances have been made by studying individual genes, the global picture of tissue-specific gene expression in an organism remains dim. The nematode C. elegans has a relatively simple body plan that nonetheless includes many specific tissues, and its genome sequence is complete, which allows a systems biology approach to the question of which genes specify a tissue. A global profile of the tissue-specific gene expression of this organism would expand our knowledge of tissue development and maintenance, regulation of gene expression, higher order chromosome structure, and the housekeeping genes that characterize a cell. In order to identify the genes expressed in the intestine of the worm, we have taken the mRNA-tagging approach that Roy et al. (2002) used to identify genes expressed in muscle cells. Animals expressing FLAG::PAB-1 from the intestine-specific promoter ges-1 were used to immunoprecipitate FLAG::PAB-1/mRNA complexes from the intestine, and the enrichment of genes relative to whole worm lysate was measured using microarrays. The average ranks from eight repeats of this experiment identified 1938 intestine-enriched genes. First, we compared the genes expressed in the intestine to those expressed in muscle (Roy et al., 2002), and thereby identified 807 genes expressed in both tissues and 645 genes enriched in the intestine versus muscle. Second, we showed that the 1938 intestine-enriched genes were also positionally clustered on the chromosomes, suggesting that the order of genes in the genome is influenced by the effect of chromatin domains on gene expression. Furthermore, the tissue-specific lists showed less chromosomal clustering than the list of genes expressed in both the intestine and muscle. This observation suggests that chromatin domains may influence housekeeping genes more than tissue-specific genes. Third, in order to gain further insight into the regulation of expression of intestine-enriched genes, we searched for regulatory motifs in the set of intestine-enriched genes. We used a modified Gibbs sampling method called CompareProspector (Liu et al., 2004) to identify motifs that were conserved with C. briggsae and over-represented in the 1 kb upstream region of the 645 intestine-specific genes. This analysis found that the promoter regions of the intestine genes were enriched for the consensus sequence for GATA transcription factors at a rate two-fold over random. In order to test the functionality of the GATA motif, we are using transcriptional GFP fusions of several intestinal markers with wild-type and mutated GATA sites.
Liu, Y., Liu, X. S., Wei, L., Altman, R. B., & Batzoglou, S. (2004). Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Research, 14(3), 451-458.
Roy, P. J., Stuart, J. M., Lund, J., & Kim, S. K. (2002). Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature, 418(6901), 975-979.
[
International C. elegans Meeting,
2001]
We are using the full-genome microarrays to profile gene expression differences associated with exit from dauer larvae. In principle, by examining gene expression differences on the whole-genome level, we will be able to illuminate the complete set of genes that are implicated in dauer-specific attributes. Identification of dauer-enriched genes would provide a framework for understanding dauer-specific characteristics, such as altered energy metabolism, certain aspects of aging, stress resistance, and the coordinated execution of a complex morphological change. To identify genes involved in dauer exit, we performed DNA microarray experiments with RNA isolated at different time points after addition of food to a dauer population. Each RNA sample was compared to a common reference RNA, and multiple RNA samples were prepared for each time point. Since some of the genes regulated upon dauer exit might respond to the introduction of food rather than forming part of a dauer-specific developmental response, we also examined the feeding of starved L1s over a similar timecourse. We have identified and begun to analyze about 2700 and 1900 genes that are reproducibly altered during dauer exit and L1 feeding, respectively. Examination of the kinetic profiles of these genes reveals several temporally distinct expression groups, defining a developmental cascade of events that occur during dauer exit or L1 feeding. We find about 400 genes that have similar expression patterns in both timecourses and that constitute a common feeding response. Approximately 900 genes have different expression profiles in the two timecourses. Most are regulated only in the dauer timecourse. These likely represent the dauer-specific developmental response. We have separated the genes identified above into different groups based on their expression patterns by hierarchical clustering.
Since genes that function together in a common process tend to be placed in the same cluster, we are currently pursuing in-depth analysis of the different clusters to ascertain what types of processes occur during dauer exit and L1 feeding. We found that stress genes (such as heat shock proteins and superoxide dismutases) are highly expressed in the dauer larvae. We also find that detoxification genes are enriched in the dauer larvae. Another interesting and unexpected discovery is that some Major Sperm Proteins (MSPs) are expressed in dauers. MSPs were previously reported to be only expressed in males or L4 and young adult hermaphrodites and to function only as components of sperm. Expression of MSPs in the dauer larvae may reveal an alternative function for a subset of the MSPs.
[
International C. elegans Meeting,
2001]
The excitatory neurotransmitter acetylcholine (ACh) not only controls muscle cell contraction, but can also lead to a change in muscle cell physiology. For example, in mammals, chronic exposure to the ACh agonist nicotine can desensitize cells, leading to addiction. To better understand how a post-synaptic cell responds to ACh, we are examining the genome-wide transcriptional changes in C. elegans striated muscle cells in response to varying cholinergic signals. To do this, we first needed to develop a method to measure relative gene expression levels in single tissues in C. elegans. Previous DNA microarray experiments have compared gene expression levels from entire worms rather than from specific tissues. A major limitation to this approach is that gene expression changes in one tissue, such as muscle, could be obscured by expression in other tissues. To isolate muscle mRNAs, we use the myo-3 promoter (a gift from A. Fire) to drive expression of an epitope-tagged poly(A)-mRNA-binding protein in striated muscle. The mRNA/tagged-protein complex is co-immunoprecipitated from whole-animal lysates using antibodies against the epitope tag. mRNAs significantly enriched in muscle are identified by comparing the immunoprecipitated mRNA to the mRNA present in the initial worm extract using DNA microarrays. We call this technique "mRNA-tagging". L1-muscle mRNA-tagging (6 repeats) revealed 1453 genes that were significantly enriched, including most characterized muscle-specific genes (positive controls) and excluding most non-muscle genes (negative controls). We repeated the entire experiment using L2 larvae (6 repeats) and found that there were few significant differences between L1 and L2 muscle. Together, these results indicate that mRNA-tagging successfully identifies genes expressed in striated muscle. To profile transcriptional changes in striated muscle in response to ACh, we either excited muscle cells using the nicotinic ACh agonist levamisole or prevented normal muscle contraction using an ACh receptor mutant. We then used mRNA-tagging to identify genes expressed in excited and mutant muscle, and compared those genes to each other and to those expressed in "normal" muscle. 304 genes show increased expression and 851 genes show decreased expression in levamisole-excited muscle, relative to normal or mutant muscle. The gene expression data indicate that ACh excitation decreases the relative expression of ACh receptors and other ligand-gated ion channels, which may represent new ACh receptor candidates. Specifically, the expression levels of seven ligand-gated receptors, two second messenger-gated receptors, and two characterized ACh receptor genes (lev-1 and unc-29) were significantly lower in the levamisole-treated or normal worms than they were in the mutant. The overall effect may be to desensitize cells to acetylcholine following strong activation, and to sensitize cells in ACh receptor mutants.
[
East Coast Worm Meeting,
2002]
Our lab uses microarray analysis to study germline development in C. elegans. Through microarrays, we can examine the global transcriptional response of all genes to a specific condition or mutation. Previous microarray experiments have compared global gene expression patterns of wild type animals to mutant animals lacking a germline. Analysis of these data has identified a set of genes that are germline enriched. The Ras/MAP kinase pathway is involved in many developmental processes in C. elegans, including meiotic progression. We are examining the effects of MAP kinase signaling using a temperature sensitive MAP kinase mutant allele, mpk-1(ga111), which produces a phenotype only in the germline. When these mutant animals are raised at the restrictive temperature, germ cells arrest at the pachytene stage of meiosis. By comparing gene expression of these MAP kinase mutants to control animals through microarray analysis, we can examine the global genome response to loss of MAP kinase signaling in the germline. To do this, we have compared mRNA extracts from control and mpk-1 mutant adults raised at the restrictive temperature and obtained a set of candidate genes that are enriched in the control as compared to the mpk-1 animals. We have also shown that mpk-1(ga111) adults can resume MAP kinase signaling, as evidenced by resumption of meiotic progression, when shifted to the permissive temperature. Oocytes are detected in these animals 12 hours after shifting and oocyte production is similar to wild type by 20 hours. Thus, we have shifted the control and mutant animals to the permissive temperature and compared their gene expression after 9 hours by microarray. The sets of genes that show changes in the expression ratio, control/mpk-1, after 9 hours at the permissive temperature provide information about the temporal nature of the gene expression changes that are transcriptionally regulated by MAP kinase in the germline. These two conditions define two groups of MAP kinase responsive genes. The early-induced category includes genes that are enriched in the control/mpk-1 comparison before shifting, but are no longer enriched after 9 hours at the permissive temperature and are candidates for direct targets of MAP kinase signaling. The late-induced category includes genes that are enriched at both timepoints. Using the results of these experiments, we were able to define 50 early-induced and 162 late-induced genes. Of these, 35 and 52 genes are germline-enriched, respectively. We are currently examining the regulation and function of the 35 germline-enriched, early-induced genes. This experiment will be extended with a complete timecourse microarray analysis including timepoints from 4 to 20 hours after shifting to the permissive temperature.
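The early-induced and late-induced categories defined above amount to simple set operations on the lists of genes enriched at each timepoint. The following Python sketch illustrates that logic only; the gene identifiers and the function name are hypothetical, and it assumes that the enriched gene lists have already been called from the microarray ratios by some significance criterion not shown here.

```python
def classify_mapk_responsive(enriched_before_shift, enriched_after_9h):
    """Split control/mpk-1 enriched genes into early- and late-induced sets."""
    before = set(enriched_before_shift)
    after = set(enriched_after_9h)
    return {
        # Enriched before the shift but no longer enriched 9 hours after it.
        "early_induced": before - after,
        # Enriched at both timepoints.
        "late_induced": before & after,
    }

# Hypothetical gene identifiers, for illustration only.
before_shift = {"geneA", "geneB", "geneC", "geneD"}
after_9h = {"geneC", "geneD", "geneE"}
germline_enriched = {"geneA", "geneC"}

groups = classify_mapk_responsive(before_shift, after_9h)

# Restrict each category to the previously defined germline-enriched set.
germline_early = groups["early_induced"] & germline_enriched
germline_late = groups["late_induced"] & germline_enriched
```

Genes enriched only after the shift (`geneE` in this toy example) fall into neither category, which matches the definitions given in the abstract.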
[
International Worm Meeting,
2003]
Regulatory elements (such as transcription factor binding sites) are crucial to transcriptional control. As whole genome sequences and large amounts of microarray data become available, it is now possible to computationally identify regulatory elements. Though not a replacement for experimental methods, computational methods can guide and speed up the experimental methods by narrowing down the list of potential sites for further exploration. In one experiment, we searched for motifs (cis-regulatory elements) in the upstream regions of 72 genes whose expression changed significantly in both the egg and larval stages after heat shock (M. Jiang, B. Romagnolo, unpublished data). As regulatory elements tend to be more conserved across species than background sequence due to their important function, we employed a comparative genomics motif-finding program, CompareProspector, that searches for motifs in regions that are conserved between C. elegans and C. briggsae. Three motifs were identified from the upstream regions of the heat shock-regulated genes. The first motif, with the consensus GAAYKTTCTAGAA, matches very well with the known heat shock element (HSE), which has the consensus GAAnnTTCnnGAA. The second motif (GGGTCTC) has previously been shown to be a true regulatory element that, when mutated, affects gene expression level (GuhaThakurta et al., 2002). The third motif (GAGACGCAG) is a new motif with unknown function. A variety of criteria, such as motif enrichment compared to background sequence and correlation of motif score with gene expression value, were used to test whether this motif is relevant to the heat shock response. We intend to use this method to find regulatory elements in the whole C. elegans genome.
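The statement that GAAYKTTCTAGAA matches the HSE consensus GAAnnTTCnnGAA can be checked position by position using the standard IUPAC nucleotide ambiguity codes (for example Y = C or T, K = G or T, N = any base). The Python sketch below is an illustration with a hypothetical function name; it assumes an ungapped, same-length, same-strand comparison, which is simpler than the scoring a motif finder would use.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consistent_with(motif, consensus):
    """True if, at every position, each base the motif allows
    is also allowed by the consensus."""
    if len(motif) != len(consensus):
        return False
    return all(
        set(IUPAC[m]) <= set(IUPAC[c])
        for m, c in zip(motif.upper(), consensus.upper())
    )

hse_match = consistent_with("GAAYKTTCTAGAA", "GAAnnTTCnnGAA")
other_match = consistent_with("GAGACGCAGTTTT", "GAAnnTTCnnGAA")
```

The first comparison succeeds because the two motifs agree at every fixed position (GAA, TTC, GAA) and differ only where the HSE consensus has n. The second sequence is a made-up 13-mer, padded from the third motif purely to show a failing case.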