[WormBook, 2006]
Throughout the C. elegans sequencing project, Genefinder was the primary protein-coding gene prediction program. These initial predictions were manually reviewed by curators as part of a "first-pass annotation" and are actively curated by WormBase staff using a variety of data and information. In the WormBase data release WS133 there are 22,227 protein-coding genes, including 2,575 alternatively spliced forms. Twenty-eight percent of these have every base of every exon confirmed by transcription evidence, while an additional 51% have some bases confirmed. Most of the genes are relatively small, covering a genomic region of about 3 kb. The average gene contains 6.4 coding exons, and coding exons account for about 26% of the genome. Most exons are small and separated by small introns. The median size of exons is 123 bases, while the most common size for introns is 47 bases. Protein-coding genes are denser on the autosomes than on chromosome X, and denser in the central region of the autosomes than on the arms. There are only 561 annotated pseudogenes, but several estimates put this number much higher.
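The percentages above can be turned into approximate gene counts. The following is a minimal sketch of that arithmetic, using only the figures quoted in the abstract; the helper name `approx_count` is hypothetical, and because the published percentages are rounded, the resulting counts are approximations only.

```python
# Figures quoted in the abstract (WormBase release WS133).
TOTAL_GENES = 22_227            # protein-coding genes, including spliced forms
FULLY_CONFIRMED_PCT = 28        # every base of every exon confirmed
PARTIALLY_CONFIRMED_PCT = 51    # some bases confirmed


def approx_count(total: int, percent: float) -> int:
    """Hypothetical helper: rounded number of items making up `percent` of `total`."""
    return round(total * percent / 100)


# Whatever is neither fully nor partially confirmed has no transcript support.
unconfirmed_pct = 100 - FULLY_CONFIRMED_PCT - PARTIALLY_CONFIRMED_PCT

fully = approx_count(TOTAL_GENES, FULLY_CONFIRMED_PCT)
partially = approx_count(TOTAL_GENES, PARTIALLY_CONFIRMED_PCT)
unconfirmed = approx_count(TOTAL_GENES, unconfirmed_pct)

print(f"fully confirmed:     ~{fully}")
print(f"partially confirmed: ~{partially}")
print(f"unconfirmed ({unconfirmed_pct}%):   ~{unconfirmed}")
```

Under these figures, roughly one gene in five has no transcription evidence at all. The three rounded counts need not sum exactly to the total.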
[Methods Cell Biol, 1995]
The clone-based physical map of the 100-Mb Caenorhabditis elegans genome has evolved over a number of years. Although the detection of clone overlaps and construction of the map have of necessity been carried out centrally, it has been essentially a community project. Without the provision of cloned markers and relevant map information by the C. elegans community as a whole, the map would lack the genetic anchor points and coherent structure that make it a viable entity. Currently, the map consists of 13 mapped contigs totaling in excess of 95 Mb and 2 significant unmapped contigs totaling 1.3 Mb. Telomeric clones are not yet in place. The map carries 600 physically mapped loci, of which 262 have genetic map data. With one exception, the physical extents of the remaining gaps are not known. The exception is the remaining gap on linkage group (LG) II, which has been shown to be bridged by a 225-kb Sse8387I fragment. Because the clones constituting the map are a central resource, there is essentially no necessity for individuals to construct cosmid and yeast artificial chromosome (YAC) libraries. Consequently, such protocols are not included here. Similarly, protocols for YAC linkage and for clone fingerprinting are not given here; fingerprinting forms the basis of the determination of cosmid overlaps and the mapping of clones received from outside sources, and has to be a centralized operation. What follows is essentially a "user's guide" to the physical map. Details of map construction are given where required for interpretation of the map as distributed. The physical mapping has been a collaboration between the MRC Laboratory of Molecular Biology, Cambridge, United Kingdom (now at The Sanger Centre, Cambridge, UK) and Washington University School of Medicine, St. Louis, Missouri. Inquiries regarding map interpretation, information, and materials should be addressed to alan@sanger.ac.uk or rw@nematode.wustl.edu.
A previous chapter in this series (1) described, primarily, the physical mapping of the 100 Mb Caenorhabditis elegans genome by fingerprinting of cosmid clones, and the linking of the contigs thus derived by YAC hybridization. At that time, the primary function of the map was to enhance the molecular genetics of the organism by facilitating the cloning of known genes, and to serve as an archive for genomic information. However, a clonal physical map, even with good alignment to the genetic map, carries only a tiny proportion of the information present in the genome. Consequently, the current objective of the C. elegans genome project (2) is to establish the entire genomic sequence. The bacterial clone map, although incomplete by virtue of the uncloneability of regions of the genome in cosmid vectors (a factor which we shall discuss later in this chapter), has proved a sound basis for the systematic sequence analysis. The sevenfold cosmid coverage has a resolution sufficient to enable the selection of a subset of cosmids for sequencing such that, on average, each clone contributes 30 kb of unique sequence to the whole. Sequencing projects based on bacterial clone maps (3-5) of a number of other genomes of a range of sizes are also well advanced, in particular Saccharomyces cerevisiae (15 Mb; complete), Schizosaccharomyces pombe (15 Mb), and Drosophila melanogaster (150 Mb). Although it has recently been demonstrated that small bacterial genomes can be sequenced by direct shotgun sequence analysis of the entire genome with no prior mapping (6), the ability to interrelate and map clone sets, whether derived by random selection or in a directed manner, is still the most convenient route to the sequence analysis of larger genomes.
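As a rough illustration of the scale implied by the figures above, the sketch below estimates how many clones a sequencing subset would contain if every clone contributed 30 kb of unique sequence across a 100 Mb genome. This is a back-of-the-envelope bound under a simplifying assumption, not a figure from the text: it treats the whole genome as cosmid-cloneable, which the abstract states is not the case. The function name `clones_needed` is hypothetical.

```python
import math

# Figures quoted in the abstract.
GENOME_SIZE_BP = 100_000_000    # "100 Mb" genome
UNIQUE_PER_CLONE_BP = 30_000    # "30 kb of unique sequence" per selected clone


def clones_needed(genome_bp: int, unique_bp_per_clone: int) -> int:
    """Hypothetical helper: clones required if each adds a fixed amount of unique sequence.

    Assumes (for illustration only) that the entire genome can be covered
    by such clones, with no uncloneable regions.
    """
    return math.ceil(genome_bp / unique_bp_per_clone)


estimate = clones_needed(GENOME_SIZE_BP, UNIQUE_PER_CLONE_BP)
print(f"approximate clones in the sequencing subset: {estimate}")
```

The estimate is on the order of a few thousand clones. The true number of cosmids would be lower, since regions that cannot be cloned in cosmid vectors must be covered by other means.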