mGene: Accurate Computational Gene Finding
mGene is a computational tool for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It is based on recent advances in machine learning and uses discriminative training techniques, such as support vector machines (SVMs) and hidden semi-Markov support vector machines (HSMSVMs). Its performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. The developmental version of mGene that was evaluated there showed the best prediction performance (measured as the average of sensitivity and specificity) for the multiple-genome prediction tasks on all four evaluation levels (nucleotides, exons, transcripts and genes). The ab initio version was best at the nucleotide, exon and transcript levels, and only slightly worse than Augustus at the gene level. The fully developed version shows the best overall performance compared with the predictions submitted by the other gene finders, including Fgenesh and Augustus.
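The ranking measure mentioned above, the average of sensitivity and specificity, can be illustrated with a minimal sketch at the exon level. It assumes the usual gene-finding convention in which sensitivity is the fraction of annotated features that were predicted exactly and specificity is the fraction of predicted features that are correct. The function name, the (start, end) exon representation and the toy coordinates are hypothetical and are not taken from mGene or from the competition's evaluation scripts.

```python
def sn_sp_average(predicted, annotated):
    """Return (sensitivity, specificity, average) for two feature sets.

    Assumption: a prediction counts as correct only if it matches an
    annotated feature exactly; specificity is TP / (TP + FP), as is
    customary in gene-finding evaluations.
    """
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy exons given as (start, end) coordinates; values are made up.
annotated_exons = {(1, 10), (20, 30), (40, 50), (60, 70)}
predicted_exons = {(1, 10), (20, 30), (45, 50)}

sn, sp, avg = sn_sp_average(predicted_exons, annotated_exons)
print(f"Sn={sn:.3f} Sp={sp:.3f} average={avg:.3f}")
```

The same function applies to the other levels if the sets hold nucleotide positions, transcript structures or gene identifiers instead of exon coordinates.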
OMA
The algorithm is based on pairwise similarity scores. Bidirectional best hits are grouped into stable pairs, which are post-processed to remove paralogs. The remaining pairs are assembled into cliques of orthologs. C. elegans data is based on WS170, C. briggsae on WS180, C. remanei on WS200 and P. pacificus on WS196.
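The two graph steps described above, forming bidirectional best hits and grouping them into cliques, can be sketched as follows. This is an illustration under simplifying assumptions, not OMA's implementation: the paralog-removal step is omitted, gene names carry a hypothetical "genome:gene" prefix, the scores are made up, and cliques are enumerated with a plain Bron-Kerbosch search.

```python
def genome_of(gene):
    """Hypothetical naming scheme: 'ce:a' belongs to genome 'ce'."""
    return gene.split(":", 1)[0]


def bidirectional_best_hits(scores):
    """Return gene pairs that are each other's best hit across two genomes.

    scores maps an unordered gene pair, stored once as (a, b), to a
    pairwise similarity score; higher means more similar.
    """
    best = {}  # (gene, other genome) -> (score, best partner)
    for (a, b), score in scores.items():
        if genome_of(a) == genome_of(b):
            continue  # only compare genes from different genomes
        for gene, partner in ((a, b), (b, a)):
            key = (gene, genome_of(partner))
            if key not in best or score > best[key][0]:
                best[key] = (score, partner)
    pairs = set()
    for (gene, _), (_, partner) in best.items():
        back = best.get((partner, genome_of(gene)))
        if back is not None and back[1] == gene:
            pairs.add(frozenset((gene, partner)))
    return pairs


def maximal_cliques(pairs):
    """Enumerate maximal cliques of the pair graph (Bron-Kerbosch, no pivoting)."""
    adjacency = {}
    for pair in pairs:
        a, b = tuple(pair)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Made-up scores for three genomes (ce, cb, cr).
example_scores = {
    ("ce:a", "cb:a"): 100, ("ce:a", "cr:a"): 90, ("cb:a", "cr:a"): 95,
    ("ce:a", "cb:b"): 40, ("ce:b", "cb:b"): 80, ("ce:b", "cr:a"): 30,
}
example_pairs = bidirectional_best_hits(example_scores)
for clique in maximal_cliques(example_pairs):
    print(sorted(clique))
```

Requiring every member of a group to be paired with every other member is stricter than linking genes through chains of best hits, which is why the sketch reports cliques and not connected components.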