Overview
McCarroll S, Bargmann CI, & Li H (2000). Building a dictionary for C. elegans promoter sequences. Presented at the West Coast Worm Meeting. Unpublished information; cite only with author permission.
We would like to understand how regulatory sequences encode the expression patterns of genes. The availability of complete genomic sequence lets us attempt to identify candidate transcriptional control sequences based on statistical criteria. We have used a dictionary algorithm (Bussemaker, Li, and Siggia, manuscript in preparation) to partition C. elegans promoter sequences into "words": discrete sequences that are distributed statistically as if they represent a coherent functional unit. We have built a dictionary for the upstream sequences of 850 C. elegans G-protein-coupled receptor genes. When this dictionary is used to partition these promoter sequences according to maximum-likelihood criteria, 2.1% of the sequence is covered by words of length 8 or greater. Many of these words appear interesting according to multiple criteria: (i) a non-Poisson distribution across genes, suggesting a tendency to appear in clusters; (ii) a non-random distribution across positions, suggesting a preference for particular locations relative to the transcriptional start site; and (iii) appearance in genes that have similar expression patterns.

We are using this dictionary in a number of ways: (i) we have correlated the expression patterns of several dozen chemoreceptor genes with the distributions of words across these genes, generating testable hypotheses about sequences that may confer cell-specific gene regulation; (ii) we are correlating gene-expression microarray data with the distribution of words across genes, to identify words associated with genes whose expression is modulated in particular experiments; and (iii) we are testing candidate control sequences (words) in promoter-gfp fusion experiments, in which a particular word is either deleted from a promoter (to test whether the sequence is necessary for wild-type gene regulation) or added to a promoter (to test whether it is sufficient to confer gene regulation).
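The partitioning step and the "2.1% covered by words of length 8 or greater" figure can be illustrated with a small sketch. This is not the authors' algorithm. It is a generic maximum-likelihood segmentation by dynamic programming, written under these assumptions: the dictionary is already built and comes as a mapping from word to probability, which is not shown here. Every single nucleotide is in the dictionary, so a segmentation always exists. Words are treated as independent, so a segmentation's likelihood is the product of its word probabilities. The function names `segment` and `long_word_coverage` and the toy probabilities are hypothetical.

```python
import math
from typing import Dict, List


def segment(seq: str, word_probs: Dict[str, float]) -> List[str]:
    """Most probable partition of seq into dictionary words (hypothetical helper).

    Assumes an independent-word model: the likelihood of a segmentation is the
    product of the probabilities of its words. Probabilities are illustrative.
    """
    n = len(seq)
    max_len = max(len(w) for w in word_probs)
    best = [-math.inf] * (n + 1)  # best[i]: max log-likelihood of seq[:i]
    back = [0] * (n + 1)          # back[i]: length of the last word in that parse
    best[0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            p = word_probs.get(seq[i - k:i])
            if p and best[i - k] + math.log(p) > best[i]:
                best[i] = best[i - k] + math.log(p)
                back[i] = k
    if best[n] == -math.inf:
        raise ValueError("sequence cannot be segmented with this dictionary")
    # Trace back from the end of the sequence to recover the words.
    words, i = [], n
    while i > 0:
        k = back[i]
        words.append(seq[i - k:i])
        i -= k
    return words[::-1]


def long_word_coverage(words: List[str], min_len: int = 8) -> float:
    """Fraction of sequence length covered by words of length >= min_len."""
    total = sum(len(w) for w in words)
    covered = sum(len(w) for w in words if len(w) >= min_len)
    return covered / total if total else 0.0


# Toy dictionary: single nucleotides plus one hypothetical 8-letter word.
toy_probs = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "GATAAGAT": 0.1}
words = segment("CCGATAAGATTT", toy_probs)
print(words)                      # ['C', 'C', 'GATAAGAT', 'T', 'T']
print(long_word_coverage(words))  # 8 of 12 bases, about 0.667
```

In the toy example the 8-letter word wins because its single probability (0.1) is far larger than the product of eight single-letter probabilities (0.2^8). Summing coverage over all parsed promoters in the same way would give a genome-scale figure comparable to the one quoted above.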