We have determined the nucleotide sequences of 37 major sperm protein (MSP) genes in C. elegans. The sequence data were used to construct an evolutionary tree by the unweighted pair-group method with arithmetic mean (UPGMA) (Figure 1). The most obvious result is the strong correlation between genomic location and sequence similarity. Each phylogenetic cluster of genes corresponds to one of four genomic gene clusters (
msp-72ps is isolated by itself on chromosome V). Surprisingly, when the sequence data were restricted to different regions within the MSP genes, the inferred phylogenetic relationships varied considerably. This variability, however, was observed only among the genes within each cluster; the overall phylogeny of the four clusters was unchanged. This result suggests that phylogenetic information varies along many of the gene sequences, but in a manner that preserves the partitioning of the clusters. More detailed analysis of the data confirmed this and demonstrated that many genes are in fact historical mosaics, composed of DNA sequences similar to those of other genes within the same cluster. In many cases an individual gene's mosaic nature is best explained by the high number of apparent genetic recombination/gene conversion events observed. These events appear to be more prevalent among, and are probably largely restricted to, genes found in close proximity to one another within the clusters. For example, genes from cluster II-L that show evidence of genetic exchange, or 100% sequence identity among themselves, can be placed into four groups. These four groups correlate very well with proposed sub-clusters inferred from the relative genomic locations of the genes. In the other clusters the mosaic nature of the genes is also evident, yet, possibly due to age, the mosaicism is characterized by shorter, less obvious stretches of sequence similarity. Figure 1 also suggests that the IV-L cluster of genes is relatively recent and originated from the II-L gene cluster. The high conservation of 5' flanking regions also supports this conclusion. [See Figure 1]
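For readers unfamiliar with UPGMA, the clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure or software used for Figure 1: the function name `upgma`, the gene labels, and the distance values are all hypothetical, and the input is assumed to be a symmetric matrix of pairwise distances between aligned sequences.

```python
from itertools import combinations


def upgma(labels, dist):
    """Minimal UPGMA sketch.

    labels: list of sequence names (hypothetical here).
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns (newick_string, merge_heights).
    """
    # Active clusters: id -> (Newick text, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between all pairs of active clusters
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    heights = []

    while len(clusters) > 1:
        # Pick the closest pair of active clusters
        pair = min(d, key=lambda p: d[p])
        a, b = sorted(pair)
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        # In an ultrametric tree the node height is half the distance
        heights.append(d[pair] / 2)

        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean, i.e. the plain average over all leaf pairs
        new = {}
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            new[frozenset((next_id, c))] = (sa * dac + sb * dbc) / (sa + sb)
        del d[pair]
        d.update(new)

        clusters[next_id] = (f"({na},{nb})", sa + sb)
        next_id += 1

    (newick, _), = clusters.values()
    return newick + ";", heights


# Invented toy data: two tight pairs separated by a large distance
labels = ["geneA", "geneB", "geneC", "geneD"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
newick, heights = upgma(labels, dist)
print(newick)
print(heights)
```

A regional analysis of the kind described above could, in principle, be imitated by recomputing the distance matrix from a sub-region of the alignment and rerunning the same function, then comparing the resulting topologies.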