We are using C. elegans as a model system to discover new small-molecule tools for biological analysis. One obstacle to finding such tools is the large number of xenobiotic defense factors encoded by the worm genome (Lindblom et al., 2006). For example, in a screen of 3,265 small molecules that are bioactive in other systems, we found only 64 (2%) that induce robust phenotypes in the worm (Kwok et al., 2006). We hypothesized that this hit rate is low because most small molecules fail to accumulate in worms. To test this, we developed an HPLC-based assay that measures small-molecule accumulation in worm tissue. We surveyed 1,018 compounds from the Spectrum library (Microsource Inc.) for accumulation in whole worms after 6 hours of incubation in 40 µM of each compound. To ensure confidence in our assignments, we established a detection limit of 18 µM. Of the 361 of 1,018 compounds that satisfied this criterion, only 25 (6.9%) accumulate in worms. Notably, 2 of the 25 accumulating molecules induce a robust phenotype in the worm, compared with 0 of the 336 non-accumulating molecules. We also assayed 25 nematicides obtained from our other small-molecule screens and found that 17 (68%) accumulate. These results support our hypothesis that worms are generally resistant to small-molecule accumulation, and show that accumulating small molecules are enriched for bioactivity in C. elegans. We next assayed the accumulation of an additional 77 compounds from our other screens, giving a complete dataset of 463 small molecules, and used this dataset to build a predictive, structure-based model of accumulation. We compiled 4,166 structural descriptors for the small molecules in the dataset and built the model with a Bayesian classifier machine-learning method. Five-fold cross-validation estimates a prediction accuracy of 75.36 ± 2.04%. We used the model to rank the 9,742 compounds of a DIVERSet library (Chembridge Corp.), of which 48 induce a robust phenotype in the worm.
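The model-building step can be illustrated with a minimal sketch of a Bayesian classifier scored by five-fold cross-validation. The abstract does not say which Bayesian classifier was used, so this sketch assumes a Gaussian naive Bayes model over numeric descriptors. The helper names (`fit_gaussian_nb`, `predict`, `k_fold_accuracy`, `make_synthetic`) are hypothetical, and the data are synthetic and balanced, standing in for the real 463 molecules and 4,166 descriptors. It also assumes the reported "±" is the standard deviation of per-fold accuracy, which the source does not state.

```python
import math
import random
from statistics import mean, stdev


def fit_gaussian_nb(X, y):
    """Estimate per-class log priors, feature means and variances (Gaussian naive Bayes, an assumed model)."""
    model = {}
    n = len(y)
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Population variance with a small floor to avoid division by zero.
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model


def predict(model, x):
    """Return the label with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(X, y, k=5, seed=0):
    """Shuffle indices, split into k folds, train on k-1 folds and score the held-out fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    # Assumption: the "±" is the standard deviation across folds.
    return mean(scores), stdev(scores)


def make_synthetic(n=200, n_features=5, shift=1.5, seed=1):
    """Hypothetical stand-in data: label 1 = 'accumulates', 0 = 'does not'."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    acc, sd = k_fold_accuracy(X, y, k=5)
    print(f"5-fold accuracy: {100 * acc:.2f} +/- {100 * sd:.2f}%")
```

A real analysis would more likely use a library implementation such as scikit-learn's `GaussianNB` with `cross_val_score`; the pure-Python version only shows the mechanics of fitting per-class distributions and holding out each fold in turn.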
Encouragingly, 12 of the 200 top-scoring molecules induce a phenotype, a 12.2-fold enrichment for bioactivity compared with the entire library (p10). None of the 200 bottom-scoring molecules induce a phenotype. These data demonstrate that our model is effective at predicting small-molecule accumulation and bioactivity in C. elegans. We hope to use this model to increase our efficiency at identifying new small-molecule probes for biological analysis, and to aid the development of potential drug leads using C. elegans as a model.
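The 12.2-fold figure follows from comparing hit rates: 12/200 = 6% in the top-scoring set versus 48/9,742 ≈ 0.49% across the whole library. The sketch below reproduces that ratio and, as an assumption, scores the top-200 result with a one-sided hypergeometric test. The abstract does not say which test produced its p-value, and the function names `fold_enrichment` and `hypergeom_upper_tail` are hypothetical.

```python
from math import comb


def fold_enrichment(hits_subset, n_subset, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate in the whole library."""
    return (hits_subset / n_subset) / (hits_total / n_total)


def hypergeom_upper_tail(k, n_draw, k_total, n_total):
    """P(X >= k) when drawing n_draw items without replacement from n_total items,
    k_total of which are hits (an assumed choice of test)."""
    denom = comb(n_total, n_draw)
    numer = sum(comb(k_total, i) * comb(n_total - k_total, n_draw - i)
                for i in range(k, min(n_draw, k_total) + 1))
    # Python ints are exact, so this division rounds only once at the end.
    return numer / denom


if __name__ == "__main__":
    e = fold_enrichment(12, 200, 48, 9742)
    print(f"enrichment: {e:.1f}-fold")  # 12.2-fold
    p = hypergeom_upper_tail(12, 200, 48, 9742)
    print(f"one-sided hypergeometric p: {p:.2e}")
```

Exact integer arithmetic with `math.comb` avoids underflow for a library of this size; a library routine such as SciPy's `hypergeom.sf` would give the same tail probability.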