[
Trends Genet,
1999]
The genome sequence of the free-living nematode Caenorhabditis elegans is nearly complete, with resolution of the final difficult regions expected over the next few months. It will be the first genome of a multicellular organism to be sequenced to completion. The genome is approximately 97 Mb in total and encodes more than 19,099 proteins, considerably more than expected before sequencing began. The sequencing project, a collaboration between the Genome Sequencing Center in St Louis and the Sanger Centre in Hinxton, has lasted eight years, with the majority of the sequence generated in the past four years. Analysis of the genome sequence is just beginning and represents an effort that will undoubtedly last more than another decade. However, some interesting findings are already apparent, indicating that the scope of the project, the approach taken, and the usefulness of having the genetic blueprint for this small organism have been well worth the effort.
[
Stud Hist Philos Biol Biomed Sci,
2012]
This paper argues that the history of the computer, of the practice of computation, and of the notions of 'data' and 'programme' is essential for a critical account of the emergence and implications of data-driven research. To show this, I focus on the transition that investigations on the worm C. elegans underwent at the Laboratory of Molecular Biology in Cambridge (UK). Throughout the 1980s, this research programme evolved from a study of the genetic basis of the worm's development and behaviour into a DNA mapping and sequencing initiative. By examining the changing computing technologies used at the Laboratory, I demonstrate that by the time of this transition researchers had shifted from modelling the worm's genetic programme on a mainframe apparatus to writing minicomputer programs aimed at producing map and sequence data, which were then circulated to other groups working on the genetics of C. elegans. The shift in worm research should thus not be explained simply by the application of computers, as if they transformed the project from a hypothesis-driven to a data-intensive endeavour. The key factor was rather a historically specific technology, in-house and easily programmable minicomputers, which redefined the way of achieving the project's long-standing goal, leading the genetic programme to co-evolve with the practices of data production and distribution.
[
J Proteomics,
2010]
Much of our knowledge of heredity, development, physiology and the underlying cellular and molecular processes derives from studies of model, or reference, organisms. Despite the great variety of life, a common base of shared principles could be extracted by studying a few life forms, selected for their amenability to experimental study. The origins of a few model organisms are described very briefly, including E. coli, yeast, C. elegans, Drosophila, Xenopus, zebrafish, mouse, maize and Arabidopsis. These model organisms were chosen because of their importance and wide use, which made them systems of choice for genome-wide studies. Many of their genomes were among the first to be fully sequenced, opening unprecedented opportunities for large-scale transcriptomics and proteomics studies.