-
[
Genome Biol,
2014]
Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.
-
[
J Biol Chem,
1999]
Mammalian Ca2+/CaM-dependent protein kinase kinase (CaM-KK) has been identified and cloned as an activator for two kinases, CaM kinase I (CaM-KI) and CaM kinase IV (CaM-KIV), and a recent report (Yano, S., Tokumitsu, H., and Soderling, T. R. (1998) Nature 396, 584-587) demonstrates that CaM-KK can also activate and phosphorylate protein kinase B (PKB). In this study, we identify a CaM-KK from Caenorhabditis elegans, and comparison of its sequence with the mammalian CaM-KK alpha and beta shows a unique Arg-Pro (RP)-rich insert in their catalytic domains relative to other protein kinases. Deletion of the RP-domain resulted in complete loss of CaM-KIV activation activity and physical interaction of CaM-KK with glutathione S-transferase-CaM-KIV (T196A). However, CaM-KK autophosphorylation and phosphorylation of a synthetic peptide substrate were normal in the RP-domain mutant. Site-directed mutagenesis of three conserved Arg in the RP-domain of CaM-KK confirmed that these positive charges are important for CaM-KIV activation. The RP-domain deletion mutant also failed to fully activate and phosphorylate CaM-KI, but this mutant was indistinguishable from wild-type CaM-KK for the phosphorylation and activation of PKB. These results indicate that the RP-domain in CaM-KK is critical for recognition of downstream CaM-kinases but not for its catalytic activity (i.e. autophosphorylation) and PKB activation.
-
[
J Biol Chem,
1999]
We have recently demonstrated that Caenorhabditis elegans Ca(2+)/calmodulin-dependent protein kinase kinase (CeCaM-KK) can activate mammalian CaM-kinase IV in vitro (Tokumitsu, H., Takahashi, N., Eto, K., Yano, S., Soderling, T.R., and Muramatsu, M. (1999) J. Biol. Chem. 274, 15803-15810). In the present study, we have identified and cloned a target CaM-kinase for CaM-KK in C. elegans, CeCaM-kinase I (CeCaM-KI), which has approximately 60% identity to mammalian CaM-KI. CeCaM-KI has 348 amino acid residues with an apparent molecular mass of 40 kDa, which is activated by CeCaM-KK through phosphorylation of Thr(179) in a Ca(2+)/CaM-dependent manner, resulting in a 30-fold decrease in the K(m) of CeCaM-KI for its peptide substrate. Unlike mammalian CaM-KI, CeCaM-KI is mainly localized in the nucleus of transfected cells because the NH(2)-terminal six residues ((2)PLFKRR(7)) contain a functional nuclear localization signal. We have also demonstrated that CeCaM-KK and CeCaM-KI reconstituted a signaling pathway that mediates Ca(2+)-dependent phosphorylation of cAMP response element-binding protein (CREB) and CRE-dependent transcriptional activation in transfected cells, consistent with nuclear localization of CeCaM-KI. These results suggest that the CaM-KK/CaM-KI cascade is conserved in C. elegans and is functionally operated both in vitro and in intact cells, and it may be involved in Ca(2+)-dependent nuclear events such as transcriptional activation through phosphorylation of CREB.
-
[
Biol Direct,
2007]
ABSTRACT: BACKGROUND: The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. RESULTS: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities $p$ generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~p^(-gamma) with the value of the exponent gamma around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent alpha ~ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. CONCLUSIONS: We separately measure the short-term ("raw") duplication and deletion rates r*_dup, r*_del which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts r_dup, r_del. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed.
All but the deletion rate of recent duplicates r*_del were shown to systematically increase with N_genes. Abnormally flat shapes of sequence identity histograms observed for yeast and human are consistent with lineages leading to these organisms undergoing one or more whole-genome duplications. This interpretation is corroborated by our analysis of the genome of Paramecium tetraurelia where the p^(-4) profile of the histogram is gradually restored by the successive removal of paralogs generated in its four known whole-genome duplication events. This article was reviewed by Eugene Koonin, Yuri Wolf (nominated by Eugene Koonin), David Krakauer, and Eugene Shakhnovich.
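As an illustration of the power-law form of the sequence-identity histogram described in the abstract above, the following is a minimal sketch of how an exponent gamma could be estimated from binned counts. It uses an ordinary least-squares fit on log-log axes, which is an assumption made here for simplicity and not necessarily the fitting procedure used in the paper. The function name `fit_power_law_exponent` and the histogram values are hypothetical and synthetic, generated from an exact p^(-4) curve; they are not data from the study.

```python
import math


def fit_power_law_exponent(bin_centers, counts):
    """Estimate gamma for counts ~ p^(-gamma) by least squares on log-log axes.

    Hypothetical helper for illustration; empty bins are skipped because
    log(0) is undefined.
    """
    points = [(math.log(p), math.log(c))
              for p, c in zip(bin_centers, counts) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The slope of log(count) versus log(p) is -gamma.
    return -sxy / sxx


# Synthetic histogram (not real data): identities from 0.25 to 0.95,
# with counts following an exact p^(-4) law.
centers = [0.25 + 0.05 * i for i in range(15)]
counts = [1000.0 * p ** -4 for p in centers]
gamma = fit_power_law_exponent(centers, counts)
```

On real, noisy histograms a log-log regression can be biased by sparsely populated bins, so a maximum-likelihood fit over a chosen identity range may be preferable.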
-
[
BMC Evol Biol,
2004]
BACKGROUND: Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulations allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks. RESULTS: We demonstrate that the transcriptional regulation of duplicated genes in baker's yeast Saccharomyces cerevisiae diverges fast so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in a nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in a bacterium Helicobacter pylori and a fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factors' binding in these organisms we could not compare this rate to that of transcriptional regulation of duplicated genes. CONCLUSIONS: For all molecular networks studied in this work we found that even the most distantly related paralogous proteins with amino acid sequence identities around 20% on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is in accordance with a view which puts regulatory changes as one of the main driving forces of the evolution. 
In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carries
-
[
Cell,
2005]
Regulated apoptosis is part of the development of the nematode Caenorhabditis elegans. In a recent paper in Nature, Yan et al. (2005) describe the in vitro reconstitution of the core components of the worm apoptotic pathway. Based on a structural analysis of the complex between the death activator CED-4 and the antiapoptotic protein CED-9, the authors explain the regulation of activity of CED-4. Intriguingly, CED-4 comprises a AAA+ type ATPase domain yet does not seem to need ATP hydrolysis for activity.
-
[
Genome Biol,
2011]
We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict gene expression of protein coding genes, as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution around genes (upstream or downstream) of distinct chromatin features to the overall prediction of expression levels.
-
Niu W, Cheng C, Hwang W, Qian J, Snyder M, Lu ZJ, Rozowsky J, Gerstein M, Alves P, Kato M, Yan KK, Bhardwaj N
[
PLoS Comput Biol,
2011]
We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.
-
[
J Biol Chem,
1998]
A novel L-3-hydroxyacyl-CoA dehydrogenase from human brain has been cloned, expressed, purified, and characterized. This enzyme is a homotetramer with a molecular mass of 108 kDa. Its subunit consists of 261 amino acid residues and has structural features characteristic of short chain dehydrogenases. It was found that the amino acid sequence of this human brain enzyme is identical to that of an endoplasmic reticulum amyloid beta-peptide-binding protein (ERAB), which mediates neurotoxicity in Alzheimer's disease (Yan, S. D., Fu, J., Soto, C., Chen, X., Zhu, H., Al-Mohanna, F., Collison, K., Zhu, A., Stern, E., Saido, T., Tohyama, M., Ogawa, S., Roher, A., and Stern, D. (1997) Nature 389, 689-695). The purification of human brain short chain L-3-hydroxyacyl-CoA dehydrogenase made it possible to characterize the structural and catalytic properties of ERAB. This NAD+-dependent dehydrogenase catalyzes the reversible oxidation of L-3-hydroxyacyl-CoAs to form 3-ketoacyl-CoAs, but it does not act on the D-isomers. The catalytic rate constant of the purified enzyme was estimated to be 37 s^-1 with apparent Km values of 89 and 20 µM for acetoacetyl-CoA and NADH, respectively. The activity ratio of this enzyme for substrates with chain lengths of C4, C8, and C16 was approximately 1:2:2. The human short chain L-3-hydroxyacyl-CoA dehydrogenase gene is organized into six exons and five introns and maps to chromosome Xp11.2. The amino-terminal NAD-binding region of the dehydrogenase is encoded by the first three exons, whereas the other exons code for the carboxyl-terminal substrate-binding region harboring putative catalytic residues. The results of this study lead to the conclusion that ERAB involved in neuronal dysfunction is encoded by the human short chain L-3-hydroxyacyl-CoA dehydrogenase gene.
-
Samanta S, Waterston R, Seabrook-Sturgis S, Terrell R, Fisher WW, White KP, Yan KK, Gersch J, Hammonds AS, Ma L, Kirkey M, Han M, Kadaba M, Steffen D, Celniker SE, Gevirtzman L, Ammouri H, Hillier LW, Kudron MM, Wall ML, Gerstein M, Corson M, Szynkarek M, Moran J, Reinke V, Jameel N, Durham TJ, Xu J, Park S, Victorsen A, Patton J, Vafeados D
[
Genetics,
2017]
In order to develop a catalog of regulatory sites in two major model organisms, Drosophila melanogaster and Caenorhabditis elegans, the modERN consortium has systematically assayed the binding sites of transcription factors (TFs). Combined with data produced by our predecessor, modENCODE, we now have data for 262 TFs identifying 1.23M sites in the fly genome and 217 TFs identifying 0.67M sites in the worm genome. Because sites from different TFs are often overlapping and tightly clustered, they fall into 91,011 and 59,150 regions in the fly and worm, respectively, and these binding sites span as little as 8.7 Mb and 5.8 Mb in the two organisms. Clusters with large numbers of sites (so-called HOT regions) predominantly associate with broadly expressed genes, whereas clusters containing sites from just a few factors are associated with genes expressed in tissue-specific patterns. All of the strains expressing GFP-tagged TFs are available at the stock centers and the ChIP-seq data are available through the ENCODE DCC, and also through a simple interface (http://epic.gs.washington.edu/modERN/) that facilitates rapid accessibility of processed datasets. These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the lifespans of these two key model organisms.