Diversity indices are a core metric for characterising environmental samples in microbiome research. However, the short length and sheer volume of high-throughput sequence reads can make them difficult to identify when they are referenced against curated rRNA databases. This is especially true when trying to fully resolve in situ diversity in low-coverage environments (e.g. soils, sediments and the rhizosphere). In such environments, a large proportion of reads referenced against these databases are placed at the Bacteria or Archaea ‘root’, indicating poorly resolved phylogenies (e.g. the Greengenes taxonomies Bacteria;Other and Archaea;Other). A phylogenetic placement approach has the potential to overcome this problem, since each query sequence is placed within a pre-computed phylogenetic tree rather than simply searched against a database for taxonomic assignment.
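To make the ‘root’ problem concrete, the short Python sketch below estimates the proportion of reads that resolve no further than a domain root. It assumes, hypothetically, that per-read assignments are available as a dictionary mapping read IDs to simplified, semicolon-delimited taxonomy strings in the Bacteria;Other / Archaea;Other style quoted above (real Greengenes strings also carry rank prefixes such as `k__`). The function names and toy data are illustrative only and are not part of QIIME or any other tool used in this study.

```python
# Hypothetical sketch: these labels mirror the simplified "Domain;Other"
# strings described in the text, not the exact output format of any tool.
ROOT_LABELS = {"Bacteria;Other", "Archaea;Other"}


def unresolved_ids(assignments):
    """Return, sorted, the IDs of reads assigned no deeper than a domain root."""
    return sorted(rid for rid, label in assignments.items() if label in ROOT_LABELS)


def root_fraction(assignments):
    """Return the fraction of reads whose label is a domain-root 'Other' bin."""
    if not assignments:
        return 0.0
    return len(unresolved_ids(assignments)) / len(assignments)


if __name__ == "__main__":
    # Invented toy assignments for illustration only.
    toy = {
        "read1": "Bacteria;Proteobacteria;Alphaproteobacteria",
        "read2": "Bacteria;Other",
        "read3": "Archaea;Other",
        "read4": "Bacteria;Acidobacteria",
    }
    print(root_fraction(toy))   # 0.5
    print(unresolved_ids(toy))  # ['read2', 'read3']
```

In a workflow like the one described here, the IDs returned by `unresolved_ids` would identify the reads to pool for phylogenetic placement.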
Here, we used sequences generated from semi-arid soils in the relatively unexplored region of Western Australia to compare diversity indices derived from taxonomic assignment with those derived from phylogenetic placement. Briefly, 16S rRNA amplicon data and assembled metagenomes were characterised with both a standard QIIME pipeline and PhyloSift. In addition, the Bacteria;Other and Archaea;Other sequences were pooled and used as input for PhyloSift to gain a deeper understanding of the microorganisms inhabiting these enigmatic environments.
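As a hedged illustration of why the choice of method can shift diversity estimates, the sketch below computes observed richness and two widely used indices, Shannon entropy (H' = -sum(p_i ln p_i)) and the Gini-Simpson index (1 - sum(p_i^2)), for two hypothetical abundance profiles of the same 100 reads. In the first profile, 50 reads sit in a single unresolved ‘Other’ bin; in the second, those reads are assumed to be resolved into three lineages. The indices, counts and function names are illustrative choices, not the specific indices or data reported in this study.

```python
import math


def _proportions(counts):
    """Convert raw counts to relative abundances, skipping empty bins."""
    total = sum(counts)
    if total == 0:
        return []
    return [c / total for c in counts if c > 0]


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), using the natural log."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2); returns 0.0 for an empty sample."""
    props = _proportions(counts)
    if not props:
        return 0.0
    return 1.0 - sum(p * p for p in props)


def richness(counts):
    """Observed richness: number of bins with at least one read."""
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical profiles of the same 100 reads.
    by_taxonomy = [50, 30, 20]           # first bin = pooled "Other" reads
    by_placement = [20, 15, 15, 30, 20]  # those 50 reads split into three lineages
    for name, counts in (("taxonomy", by_taxonomy), ("placement", by_placement)):
        print(name, richness(counts),
              round(shannon(counts), 3), round(gini_simpson(counts), 3))
```

Because merging distinct lineages into one bin can never increase richness, Shannon entropy or Gini-Simpson diversity, the placement-based profile scores higher on all three here; the size of that gap in real soil samples is what a comparison like the one described here would measure.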