Lejeune, Francois-Xavier et al. (2011) International Worm Meeting "The Biogemix knowledge base project: cross-species and network-based data integration for Huntington's disease research."

Vert, Jean-Philippe, Parmentier, Frederic, Lejeune, Francois-Xavier, Neri, Christian, Bicep, Cedric, Mesrob, Lilia

[International Worm Meeting, 2011]

The identification and validation of neuroprotective targets is of primary importance in research on neurodegenerative diseases such as Huntington's disease (HD). The development of genetically tractable disease models and their use in genome-wide screens have generated a large amount of data in several species. A current challenge is the unbiased integration of these data sets in order to prioritize candidate target genes. The Biogemix knowledge base project has been developed with the European HD network (Euro-HD) to integrate 'omics' data from models of HD pathogenesis available in several species (invertebrates, mammalian cells, mice, human samples). The project combines network-based and cross-species procedures to unlock the biological information buried in disease data sets. The Biogemix procedure uses molecular networks for the unbiased integration of 'omics' data across different species, and is particularly suited to data sets in which the number of genes analyzed clearly exceeds the number of conditions tested. In a first step, each data set is processed with respect to a reference molecular network (for instance, MouseNet for mouse data) to extract clusters (modules) that are made of highly interconnected genes, enriched in HD-relevant information and automatically annotated for their biological role and biomedical potential. In a second step, cross-species clusters (meta-modules) are calculated by balancing gene/protein connectivity with protein sequence similarities. In a third step, all of the Biogemix products are ranked according to topological and biological features of interest, as part of a larger prioritization system that uses several criteria to classify modules and genes of high interest. A user-friendly graphical interface and query system is being developed to allow users to browse and select Biogemix information of interest.
Further developments will aim at fine-tuning data analysis and information display, with a view to making the Biogemix knowledge base v1.0 publicly available online. Preliminary results will be shown to illustrate how the Biogemix system might be useful for basic research and disease research.
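
The three-step procedure described above (modules, meta-modules, ranking) can be illustrated with a minimal sketch. This is not the Biogemix implementation, which the abstract does not specify. Every function name, the score threshold, the weight `alpha`, the gene identifiers and the toy data are hypothetical. Module extraction is approximated here by connected components among high-scoring genes, and the balance between connectivity and sequence similarity by a simple weighted sum.

```python
from itertools import combinations

def extract_modules(network, scores, threshold=1.0):
    """Step 1 (stand-in): connected components among genes scoring >= threshold."""
    active = {g for g in network if scores.get(g, 0.0) >= threshold}
    modules, seen = [], set()
    for start in sorted(active):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk restricted to active genes
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(n for n in network[gene]
                         if n in active and n not in component)
        seen |= component
        if len(component) > 1:  # singletons are not treated as modules
            modules.append(frozenset(component))
    return modules

def density(network, module):
    """Fraction of possible gene pairs in the module that are linked."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b in network[a]) / len(pairs)

def meta_module_score(net_a, mod_a, net_b, mod_b, similar, alpha=0.5):
    """Step 2 (stand-in): weighted sum of connectivity and sequence similarity."""
    connectivity = (density(net_a, mod_a) + density(net_b, mod_b)) / 2
    # Share of genes in mod_a with a sequence-similar partner in mod_b.
    matched = sum(1 for a in mod_a if any((a, b) in similar for b in mod_b))
    similarity = matched / len(mod_a)
    return alpha * connectivity + (1 - alpha) * similarity

def rank_meta_modules(net_a, mods_a, net_b, mods_b, similar, alpha=0.5):
    """Step 3 (stand-in): rank all cross-species module pairs, best first."""
    scored = [(meta_module_score(net_a, ma, net_b, mb, similar, alpha), ma, mb)
              for ma in mods_a for mb in mods_b]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical toy data: undirected networks as symmetric adjacency dicts.
worm_net = {"w1": {"w2", "w3"}, "w2": {"w1", "w3"}, "w3": {"w1", "w2"},
            "w4": {"w5"}, "w5": {"w4"}}
worm_scores = {"w1": 2.0, "w2": 1.5, "w3": 1.2, "w4": 0.2, "w5": 3.0}
mouse_net = {"m1": {"m2"}, "m2": {"m1", "m3"}, "m3": {"m2"}, "m4": set()}
mouse_scores = {"m1": 1.1, "m2": 2.0, "m3": 1.4, "m4": 5.0}
similar_pairs = {("w1", "m1"), ("w2", "m2")}  # assumed sequence-similar pairs

worm_modules = extract_modules(worm_net, worm_scores)
mouse_modules = extract_modules(mouse_net, mouse_scores)
ranking = rank_meta_modules(worm_net, worm_modules,
                            mouse_net, mouse_modules, similar_pairs)
```

In this sketch `alpha` sets the trade-off: values near 1 favor densely connected modules, values near 0 favor module pairs that share many sequence-similar genes. A real system would likely replace each stand-in with a more principled method and add the biological annotation and enrichment criteria that the abstract mentions.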