ALE: An assembly likelihood evaluation framework to estimate the quality of metagenome assemblies

Scott Clark, Cornell University

Massively parallel, high-throughput DNA sequencing technologies have made it possible to directly assemble the genomes of uncultivated organisms in a microbial community (metagenomes). Current methods for evaluating single-genome assemblies, such as contig (contiguous constructed sequence) sizes or reference genome mapping, are not applicable to metagenome assemblies. In this poster we present a statistical method for systematically evaluating the quality of a proposed metagenome assembly given the read data from which it is derived. We developed a likelihood framework that takes into account read quality, mate-pair orientation and insert length (for paired-end reads), coverage, and read mapping information. This framework allows us to consider the likelihood of a proposed assembly given the reads, independent of contig size or reference genomes. We show that with synthetic data this method produces “ALE scores” that decrease monotonically as assembly quality degrades, for both single genomes and metagenomes, and increase as genomes become more complete. It can also discover errors in real genomes, which are then independently validated. In summary, ALE scores provide unbiased estimates of metagenome assembly quality and can be used to greatly improve metagenome assembly.
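
As a rough illustration of how a likelihood score of this kind could combine the signals named above, the following Python sketch sums per-read-pair log-probabilities for placement (from base qualities), insert length, and mate orientation, and adds a per-position coverage term. It is not the ALE implementation: the function names, the Normal insert-length model, the Poisson depth model, the fixed orientation probability, and the toy input format are all assumptions chosen for brevity.

```python
import math

def placement_logp(read, ref_segment, quals):
    """Log-probability of a read given the assembly segment it maps to.

    Assumes Phred-scaled qualities and that an erroneous base is equally
    likely to be any of the three other bases (an illustrative assumption).
    """
    logp = 0.0
    for base, ref_base, q in zip(read, ref_segment, quals):
        p_err = min(10 ** (-q / 10), 0.75)  # cap so log() stays finite
        logp += math.log(1 - p_err) if base == ref_base else math.log(p_err / 3)
    return logp

def insert_logp(insert_len, mean, sd):
    """Log-density of the insert length under an assumed Normal model."""
    z = (insert_len - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

def orientation_logp(is_proper, p_proper=0.99):
    """Log-probability of the observed mate orientation (assumed constant)."""
    return math.log(p_proper if is_proper else 1 - p_proper)

def depth_logp(depth, expected):
    """Log-probability of an integer depth under an assumed Poisson model."""
    return depth * math.log(expected) - expected - math.lgamma(depth + 1)

def assembly_log_score(pairs, depths, mean_insert, sd_insert, expected_depth):
    """Sum all component log-probabilities into one score (higher is better).

    pairs:  iterable of (read, ref_segment, quals, insert_len, is_proper)
    depths: per-position read depth along the assembly
    """
    score = 0.0
    for read, ref_segment, quals, insert_len, is_proper in pairs:
        score += placement_logp(read, ref_segment, quals)
        score += insert_logp(insert_len, mean_insert, sd_insert)
        score += orientation_logp(is_proper)
    score += sum(depth_logp(d, expected_depth) for d in depths)
    return score
```

Under these assumptions, a read pair that matches the assembly with a typical insert length and proper orientation contributes a higher log-score than one with a mismatch, an outlying insert length, or an improper orientation, so two candidate assemblies of the same reads can be compared without contig sizes or a reference genome.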

Abstract Author(s): Scott C. Clark, Rob Egan, Peter Frazier, Zhong Wang