%0 Journal Article
%T A statistical toolbox for metagenomics: assessing functional diversity in microbial communities
%A Patrick D Schloss
%A Jo Handelsman
%J BMC Bioinformatics
%D 2008
%I BioMed Central
%R 10.1186/1471-2105-9-34
%X Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses. Metagenomics, the culture-independent isolation and characterization of DNA from uncultured microorganisms [1], has facilitated the analysis of the functional biodiversity harbored in the large reservoir of uncultured bacteria and archaea [2-4]. Although early metagenomic studies identified individual genes or activities of interest, recent advances in genome sequencing technologies have made obtaining a complete metagenomic sequence more tractable. Sequence-based approaches combined with functional expression approaches have the potential to identify novel genes important for industrial and ecological applications. Sequence-based approaches have recently been applied to DNA obtained from viruses [5,6], seawater [7-10], wastewater [11,12], sediment [13], sponges [14], acid mine drainage [15], marine worms [16], human gut [17], soil [18], and decomposing whale carcasses [18].
The analysis used to describe these communities has primarily focused on the descriptive characterization and comparison of the relative abundance of proteins that belong to specific functional categories. Attempts to analyze metagenomic sequences have proven that a metagenomic sequence is more tha
%U http://www.biomedcentral.com/1471-2105/9/34