Accurate identification of protein-coding regions in metagenomic sequences is challenging. The MetaGeneMark-2 plugin relies on an innovative approach to the parameter estimation problem that conventional gene-finding algorithms face because contigs are short and lack genomic context.
GENE PROBE Inc., the developers of MetaGeneMark, have created and refined algorithms for gene prediction in metagenomic sequences for more than fifteen years. The MetaGeneMark-2 plugin is further optimized for gene finding in anonymous metagenomic sequences. Our tests show that MetaGeneMark-2 nearly halves the rate of false negative predictions (missed genes) relative to MetaGeneMark, for which that rate was estimated at 2.7%.
MetaGeneMark-2 (metagenomic gene caller with precomputed sets of model parameters) is an ab initio computational tool designed to predict intronless protein-coding genes in metagenomic sequences. Parameters of high-order statistical models of protein-coding and non-coding regions are precomputed for each possible sequence composition, characterized by the sequence GC content. This heuristic method essentially reconstructs the genomic context of a given short anonymous sequence (Zhu et al., 2010*). MetaGeneMark-2 implements the Viterbi algorithm for a hidden semi-Markov model describing the functional and structural organization of a metagenomic sequence.
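The idea of choosing a precomputed parameter set from the GC content of a contig can be illustrated with a minimal Python sketch. It is not the MetaGeneMark-2 implementation: the function names, the 1% bin width, and the 30 to 70% range are assumptions made only for this example.

```python
# Illustrative sketch only: selects a key for a precomputed parameter set
# from the GC content of a sequence. The 30-70% range and the 1% bin width
# are hypothetical values, not taken from MetaGeneMark-2.

MIN_GC = 30  # hypothetical lowest GC bin (percent)
MAX_GC = 70  # hypothetical highest GC bin (percent)


def gc_content(seq: str) -> float:
    """Return the GC percentage of seq, counting only A, C, G and T."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous nucleotides")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def model_key(seq: str) -> int:
    """Map a sequence to the nearest GC bin, clamped to the assumed range."""
    nearest_bin = round(gc_content(seq))
    return max(MIN_GC, min(MAX_GC, nearest_bin))
```

In such a scheme, the returned key would index a table of precomputed coding and non-coding model parameters, so that no parameter training is needed on the short contig itself.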
In addition to the standard mode, “Gene prediction in prokaryotic metagenomes (genetic code 11)”, MetaGeneMark-2 provides a second mode: “Gene prediction in eukaryotic metatranscriptomes” (genetic code 1).
*Zhu W., Lomsadze A. and Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Research, 2010, Vol. 38, No. 12, e132, doi: 10.1093/nar/gkq275