We are developing a community resource of 2,000 mutagenized strains, each sequenced to a depth of 15×. Based on previous estimates, this should yield a library of 0.5 to 1 million SNPs, with an average of 5–10 non-synonymous changes per gene across the collection. After testing various mutagens, we are using EMS, ENU and a cocktail of EMS plus ENU. In the F1 generation we select for unc-22 mutations to confirm that the genome has been mutagenized. In the F2 we select non-unc-22 animals, which are then propagated by self-fertilization for 10 generations to ensure that the final isolate is homozygous across the genome. We will supplement these strains with natural isolates to recover additional mutations. For sequencing, we load size-selected, barcoded samples on either a GAII or a HiSeq instrument (Illumina) and generate paired-end reads. Data analysis employs BWA, SAMtools and custom filters. Sequence data will be deposited in WormBase, and the individual strains will be available from the Caenorhabditis Genetics Center for detailed study. Eventually we hope to distribute the 2,000 strains as a single kit, allowing parallel experimentation on a wide spectrum of mutant genes.
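The homozygosity step relies on a standard segregation argument. When a hermaphrodite heterozygous at a site self-fertilizes, its progeny are 1/4 homozygous for one allele, 1/2 heterozygous and 1/4 homozygous for the other. If each generation is founded by a single animal picked without regard to genotype, the chance that a site is still heterozygous therefore halves every generation, giving (1/2)^10 = 1/1024 after 10 generations. The Python sketch below assumes no selection and single-animal transfers; the function names and the 300-site example are hypothetical.

```python
"""Residual heterozygosity after repeated self-fertilization (a sketch).

Assumes each generation is founded by one selfed hermaphrodite chosen without
regard to genotype, so a heterozygous site stays heterozygous with
probability 1/2 per generation (1/4 AA : 1/2 Aa : 1/4 aa progeny).
"""


def residual_het_probability(generations: int) -> float:
    """Probability that a site heterozygous at the start is still heterozygous."""
    return 0.5 ** generations


def expected_het_sites(het_sites_at_start: float, generations: int) -> float:
    """Expected number of initially heterozygous sites that remain heterozygous."""
    return het_sites_at_start * residual_het_probability(generations)


if __name__ == "__main__":
    p = residual_het_probability(10)
    print(f"P(still heterozygous after 10 generations) = {p:.5f}")  # equals 1/1024
    # Hypothetical worst case: 300 new mutations, all heterozygous at the start.
    print(f"Expected residual heterozygous sites: {expected_het_sites(300, 10):.2f}")
```

Under these assumptions, even a few hundred heterozygous mutations leave fewer than one heterozygous site expected per strain. The estimate ignores selection; a recessive lethal, for instance, can only be carried as a heterozygote.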
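The planning figures above can be turned into per-strain expectations with simple arithmetic. The sketch below takes the strain count, the SNP range, the per-gene range and the 15× depth from the text. It assumes roughly 20,000 protein-coding genes and a genome of roughly 100 Mb for C. elegans; both are approximate figures introduced here for illustration, and the helper names are hypothetical.

```python
"""Back-of-envelope yield for the planned 2,000-strain library (a sketch).

STRAINS, SNP_RANGE, NONSYN_PER_GENE and DEPTH come from the text.
GENES and GENOME_MB are approximate C. elegans figures assumed for illustration.
"""

STRAINS = 2_000
SNP_RANGE = (500_000, 1_000_000)  # library-wide SNP estimate
NONSYN_PER_GENE = (5, 10)  # non-synonymous changes per average gene
DEPTH = 15  # fold coverage per strain
GENES = 20_000  # assumption: approximate protein-coding gene count
GENOME_MB = 100  # assumption: approximate genome size in megabases


def per_strain_snps(total_snps: int, strains: int = STRAINS) -> float:
    """Average number of SNPs carried by each strain."""
    return total_snps / strains


def nonsyn_total(per_gene: int, genes: int = GENES) -> int:
    """Library-wide non-synonymous changes implied by a per-gene average."""
    return per_gene * genes


def sequence_per_strain_gb(depth: int = DEPTH, genome_mb: int = GENOME_MB) -> float:
    """Raw sequence (Gb) needed per strain: depth x genome size.

    Ignores duplicate and unmapped reads, so real requirements are higher.
    """
    return depth * genome_mb / 1_000


if __name__ == "__main__":
    for total, per_gene in zip(SNP_RANGE, NONSYN_PER_GENE):
        ns = nonsyn_total(per_gene)
        print(f"{total:,} SNPs -> {per_strain_snps(total):.0f} per strain; "
              f"{ns:,} non-synonymous ({ns / total:.0%} of SNPs)")
    gb = sequence_per_strain_gb()
    print(f"~{gb:.1f} Gb per strain, ~{gb * STRAINS / 1_000:.0f} Tb for the library")
```

With these assumptions, both ends of the stated range imply 250 to 500 SNPs per strain, of which about one fifth are non-synonymous.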
To date we have constructed and tested sequencing libraries for more than 1,000 strains and completed analysis of 451 strains. The analysis has yielded 142,566 SNPs, including 32,835 non-synonymous changes in 13,685 genes. These comprise 1,404 nonsense mutations in 1,310 genes, 804 splicing mutations in 777 genes, 30,607 missense mutations in 13,269 genes and 20 readthrough mutations in 20 genes. Of 2,200 genes with either a nonsense or a splicing alteration, over 1,000 have no previously reported mutations. Based on read counts, rDNA repeat copy number is surprisingly variable: some strains have fewer than 60 copies and a few have more than 200. By the meeting in Los Angeles we will report on the analysis of the first 1,000 strains.
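The reported tallies can be cross-checked directly. The four mutation classes sum to the 32,835 non-synonymous changes, and dividing by the 451 analyzed strains gives per-strain rates. The counts in this Python sketch are taken from the text; the dictionary layout and helper name are illustrative only.

```python
"""Consistency check on the 451-strain tallies (a sketch).

All counts come from the text; the data layout is for illustration.
"""

STRAINS_ANALYZED = 451
TOTAL_SNPS = 142_566
NONSYN_TOTAL = 32_835

# Mutation class -> (number of mutations, number of genes affected)
CLASSES = {
    "nonsense": (1_404, 1_310),
    "splicing": (804, 777),
    "missense": (30_607, 13_269),
    "readthrough": (20, 20),
}


def class_sum(classes: dict[str, tuple[int, int]]) -> int:
    """Total mutations across classes.

    Gene counts are not summed, because one gene can be hit by several classes.
    """
    return sum(n for n, _ in classes.values())


if __name__ == "__main__":
    assert class_sum(CLASSES) == NONSYN_TOTAL
    print(f"Non-synonymous fraction: {NONSYN_TOTAL / TOTAL_SNPS:.1%}")
    print(f"SNPs per strain: {TOTAL_SNPS / STRAINS_ANALYZED:.0f}")
    print(f"Non-synonymous changes per strain: {NONSYN_TOTAL / STRAINS_ANALYZED:.0f}")
```

The class totals add up exactly, and roughly 23% of the SNPs found so far are non-synonymous, at about 316 SNPs per strain.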
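One common way to turn read counts into an rDNA copy number is to compare depth over a single reference copy of the repeat unit, onto which reads from every copy pile up, with depth over single-copy sequence. The Python sketch below illustrates that ratio. It is not necessarily the pipeline used here; the function names, the 100 bp read length and the example numbers are all hypothetical.

```python
"""Estimating rDNA repeat copy number from read counts (a sketch).

Assumes reads from every rDNA copy map to one reference copy of the repeat
unit, so its depth relative to single-copy sequence estimates the copy number.
Because both regions are present on both homologs, the ratio gives copies per
haploid genome. Names and example values are hypothetical.
"""


def mean_depth(read_count: int, read_length: int, region_length: int) -> float:
    """Average fold coverage of a region from the reads mapped to it."""
    return read_count * read_length / region_length


def rdna_copy_number(rdna_reads: int, rdna_unit_len: int,
                     single_copy_reads: int, single_copy_len: int,
                     read_length: int = 100) -> float:
    """Copy number = depth over the rDNA unit / depth over single-copy sequence."""
    rdna = mean_depth(rdna_reads, read_length, rdna_unit_len)
    background = mean_depth(single_copy_reads, read_length, single_copy_len)
    return rdna / background


if __name__ == "__main__":
    # Hypothetical strain: 105,000 reads on a 7 kb rDNA unit and
    # 150,000 reads over 1 Mb of single-copy sequence.
    print(f"Estimated copies: {rdna_copy_number(105_000, 7_000, 150_000, 1_000_000):.0f}")
```

In practice, depth over repeats can be skewed by GC bias and by mapping-quality filters that discard multi-mapping reads, so such estimates are best treated as approximate and compared across strains processed the same way.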