Metagenomics and the Human Microbiome: From Sequencing Reads to Biological Understanding

Metagenomics is the study of all genetic material recovered directly from environmental or clinical samples, without the need to culture individual organisms first. It has fundamentally changed how we understand which microorganisms inhabit various niches of the human body, which are present in infected tissue, which are circulating in a septic patient's bloodstream, and which constitute the complex ecosystems of soil, ocean, and fermented foods.

The human microbiome, the complete collection of microorganisms (bacteria, viruses, fungi, archaea, and protozoa) that colonise our body surfaces and cavities, contains approximately 38 trillion microbial cells, roughly equal to or slightly exceeding the number of human cells in the body. The gut microbiome alone encodes more than 150 times as many unique genes as the human genome. Understanding the composition, function, and dysregulation of this community is one of the most active areas in biomedical research, with implications for infectious disease, autoimmunity, mental health, cancer, and metabolic disease.


16S rRNA Amplicon Sequencing: The Most Common Approach

The 16S ribosomal RNA gene is present in all bacteria and archaea and contains highly conserved regions (the same sequence across almost all bacteria) flanking hypervariable regions (V1 through V9) that differ enough between species to distinguish them. This structure makes 16S rRNA the universal target for bacterial community profiling.

In 16S amplicon sequencing, DNA is extracted from the sample (stool, swab, tissue, environmental specimen), the 16S gene is amplified by PCR using primers targeting conserved flanking regions, the amplicons are sequenced using next-generation sequencing (typically Illumina MiSeq or similar), and the resulting reads are bioinformatically processed to assign taxonomy.

The bioinformatics pipeline proceeds in steps. Raw reads are quality-filtered and paired reads are merged. Chimeric sequences (PCR artefacts that join two different template sequences) are removed. The remaining reads are either clustered into Operational Taxonomic Units (OTUs) at a similarity threshold (typically 97 per cent identity, a conventional proxy for species-level grouping) or resolved into Amplicon Sequence Variants (ASVs), individual unique sequences, a more precise modern approach that is replacing OTU clustering. Taxonomy is then assigned by comparison with reference databases such as SILVA, Greengenes, NCBI, or RDP.
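The OTU clustering step can be illustrated with a toy sketch. It assumes the sequences are already aligned and of equal length, so identity is simply the fraction of matching positions; real pipelines compute identity from alignments and handle chimeras and sequencing errors separately. The function names and the example sequences are hypothetical, chosen only for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_cluster(seq_counts: dict[str, int], threshold: float = 0.97) -> dict[str, int]:
    """Greedy centroid clustering: most abundant sequences seed OTUs first."""
    otus: dict[str, int] = {}
    for seq, count in sorted(seq_counts.items(), key=lambda kv: -kv[1]):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count  # join the first matching OTU
                break
        else:
            otus[seq] = count  # no centroid close enough: start a new OTU
    return otus


# Hypothetical 100-base amplicons
base = "ACGT" * 25                  # abundant sequence
near = base[:-1] + "A"              # 1 mismatch  -> 99% identity to base
far = "T" * 10 + base[10:]          # 8 mismatches -> 92% identity to base

otus = greedy_otu_cluster({base: 50, near: 5, far: 20})
for centroid, total in otus.items():
    print(centroid[:12] + "...", total)
```

With a 97 per cent threshold, the one-mismatch variant is absorbed into the abundant sequence's OTU, while the more divergent sequence founds its own. An ASV approach would instead keep each distinct error-corrected sequence as its own unit.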

The result is a table of taxa and their abundances in each sample: the basis for all downstream microbiome analysis.
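As a minimal sketch of what such a table looks like and how raw counts become relative abundances, consider the following. The sample names and read counts are invented for illustration and do not come from any real dataset.

```python
def relative_abundance(sample_counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts for one sample into proportions that sum to 1."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in sample_counts.items()}


# Hypothetical taxa-by-sample count table (reads assigned to each genus)
counts_table = {
    "sample_A": {"Bacteroides": 600, "Faecalibacterium": 300, "Escherichia": 100},
    "sample_B": {"Bacteroides": 50, "Bifidobacterium": 150},
}

abundance_table = {
    sample: relative_abundance(sample_counts)
    for sample, sample_counts in counts_table.items()
}

for sample, profile in abundance_table.items():
    print(sample, {taxon: round(p, 3) for taxon, p in profile.items()})
```

Because each sample is divided by its own read total, the values are comparable between samples of different sequencing depth, but they say nothing about absolute microbial load.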


Shotgun Metagenomics: Reading Everything

Whereas 16S amplicon sequencing targets only bacteria and archaea and yields only the relative abundances of taxa, shotgun (whole-metagenome) sequencing sequences all the DNA in a sample without target-specific PCR amplification. It therefore detects bacteria, viruses, fungi, archaea, and parasites simultaneously. It also provides functional information (the metabolic genes present in the community: not just who is there but what they can do), resistance gene identification, and quantification that is less affected by primer bias.

The challenge is cost and data complexity. A shotgun metagenomics run generates gigabases of sequencing data, much of which is human host DNA (particularly in clinical samples from blood, tissue, or bronchoalveolar lavage). Computational approaches that rapidly remove human reads in silico (or physical depletion of host DNA before sequencing) are essential to make metagenomic data interpretable.
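The idea of in silico host depletion can be sketched with a toy k-mer filter: a read is discarded when most of its k-mers also occur in the host reference. This is only an illustration under simplifying assumptions (a tiny made-up "host" sequence, exact k-mer matching, an arbitrary 50 per cent cut-off); production pipelines typically align reads against a full human reference genome instead. All names and sequences here are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deplete_host(reads: list[str], host_reference: str,
                 k: int = 8, max_host_fraction: float = 0.5) -> list[str]:
    """Keep reads whose share of host-matching k-mers is below the cut-off."""
    host_index = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # shorter than k: cannot be classified, so keep
            continue
        host_fraction = len(read_kmers & host_index) / len(read_kmers)
        if host_fraction < max_host_fraction:
            kept.append(read)
    return kept


# Made-up sequences standing in for a host genome and two reads
host_reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCC"
host_read = host_reference[5:25]               # drawn directly from the host
microbial_read = "TTTTGGGGCCCCAAAATTTTGGGG"    # shares no 8-mers with the host

kept = deplete_host([host_read, microbial_read], host_reference)
print(kept)
```

The cut-off trades sensitivity against specificity: a stricter threshold removes more host reads but risks discarding microbial reads from regions that resemble human sequence.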

Clinical metagenomics for infectious disease diagnosis is an emerging and rapidly growing application. Metagenomic next-generation sequencing (mNGS) applied to CSF, BAL, blood, or tissue can detect unexpected or fastidious pathogens that routine culture misses: Tropheryma whipplei (Whipple's disease), Bartonella, unusual fungi, newly emerging viral pathogens. Multiple clinical studies and commercial platforms (UCSF's CLIA-certified CSF mNGS service; Karius plasma mNGS for bloodstream infection) have demonstrated clinical utility in cases where standard diagnostics were non-diagnostic.


Alpha and Beta Diversity: How Microbiome Studies Measure Community Complexity

Alpha diversity measures the richness and evenness of microbial species within a single sample.

Species richness counts how many different taxa are present. A sample with 300 different bacterial species has higher richness than one with 50, even if the total number of bacteria is the same.

The Shannon diversity index combines richness with evenness: it is highest when many different taxa are present at equal abundance, and lower when a few taxa dominate. Low Shannon diversity in the gut microbiome is associated with inflammatory bowel disease, antibiotic exposure, and poor clinical outcomes after certain infections.
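Richness and the Shannon index can be computed directly from a sample's taxon counts. The sketch below uses the standard definition H = -sum(p_i * ln p_i), where p_i is the proportion of taxon i, with the natural logarithm (some tools use log base 2, which rescales the values). The two example communities are invented to show the effect of evenness.

```python
import math


def richness(counts: list[int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts: list[int]) -> float:
    """Shannon diversity index using the natural logarithm."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p) for p in proportions)


# Two hypothetical samples with the same richness but different evenness
even_sample = [25, 25, 25, 25]    # four taxa at equal abundance
skewed_sample = [97, 1, 1, 1]     # one taxon dominates

print(richness(even_sample), round(shannon(even_sample), 3))
print(richness(skewed_sample), round(shannon(skewed_sample), 3))
```

Both samples contain four taxa, yet the evenly distributed one scores ln(4), the maximum possible for four taxa, while the dominated one scores far lower.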

Observed OTUs, the Chao1 estimator, and Faith's phylogenetic diversity are other alpha diversity metrics, each capturing a slightly different aspect of community composition.
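As one example, Chao1 estimates total richness, including taxa that were present but not sampled, from the number of taxa seen exactly once (singletons, F1) and exactly twice (doubletons, F2). The sketch below assumes the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), which remains defined when there are no doubletons; other formulations exist, and the counts are invented.

```python
def chao1(counts: list[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    observed = sum(1 for n in counts if n > 0)
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical sample: 7 observed taxa, 3 singletons, 2 doubletons
sample = [10, 5, 1, 1, 1, 2, 2]
print(chao1(sample))           # 7 + (3 * 2) / (2 * 3)

# With no singletons the estimate equals the observed richness
print(chao1([10, 5, 2]))
```

Many singletons relative to doubletons suggest that rare taxa are still being missed, so the estimate rises above the observed count.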

Beta diversity measures how different the microbial communities are between two samples. The most used metrics are UniFrac (which incorporates phylogenetic relationships between taxa: weighted UniFrac considers abundance, unweighted UniFrac considers only presence/absence) and Bray-Curtis dissimilarity. Beta diversity is visualised by ordination methods: Principal Coordinates Analysis (PCoA) or Principal Component Analysis (PCA) produce 2D or 3D plots where samples cluster by similarity.
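Bray-Curtis dissimilarity is simple enough to compute by hand: the sum of absolute abundance differences across taxa, divided by the total abundance in both samples. It ranges from 0 (identical composition) to 1 (no shared taxa). The sketch below uses invented counts; in practice the inputs are usually normalised first (for example, to relative abundances or an equal sequencing depth), and UniFrac additionally requires a phylogenetic tree, which is not shown here.

```python
def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-abundance profiles."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical profiles: two gut samples and one skin sample
gut_1 = {"Bacteroides": 6, "Prevotella": 4}
gut_2 = {"Bacteroides": 4, "Prevotella": 6}
skin = {"Cutibacterium": 10}

print(bray_curtis(gut_1, gut_2))   # same taxa, shifted abundances
print(bray_curtis(gut_1, skin))    # no taxa in common
print(bray_curtis(gut_1, gut_1))   # identical samples
```

Computing this value for every pair of samples gives the distance matrix that ordination methods such as PCoA take as input.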


Key Gut Microbiome Phyla and Their Clinical Relevance

The human gut microbiome is dominated by two phyla: Firmicutes (gram-positive, including Lachnospiraceae, Ruminococcaceae, Lactobacillaceae) and Bacteroidetes (gram-negative, including Bacteroides and Prevotella). In healthy adults, the Firmicutes:Bacteroidetes ratio is roughly 1:1 to 3:1. Significant alterations from this ratio are associated with obesity (increased Firmicutes), inflammatory bowel disease, and other conditions, though the relationship is complex and not simply causal.
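The ratio itself is a simple quotient of phylum-level abundances. A minimal sketch, using invented read counts and hypothetical names, might look like this:

```python
def fb_ratio(phylum_counts: dict[str, int]) -> float:
    """Firmicutes:Bacteroidetes ratio from phylum-level read counts."""
    firmicutes = phylum_counts.get("Firmicutes", 0)
    bacteroidetes = phylum_counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("ratio undefined: no Bacteroidetes reads")
    return firmicutes / bacteroidetes


# Hypothetical phylum-level counts for one stool sample
stool_sample = {
    "Firmicutes": 600,
    "Bacteroidetes": 300,
    "Actinobacteria": 60,
    "Proteobacteria": 40,
}

print(fb_ratio(stool_sample))   # 600 / 300
```

A single ratio collapses a complex community into one number, which is part of the reason its association with disease should be interpreted cautiously.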

Actinobacteria (including Bifidobacterium, abundant in breastfed infants) and Proteobacteria (typically low in abundance in healthy gut but increased in dysbiosis states) are the other major phyla.

Clostridioides difficile infection (CDI): colonisation and infection with C. difficile are strongly modulated by the gut microbiome. Antibiotic disruption of the microbiome reduces colonisation resistance (the ability of the resident microbiome to exclude C. difficile) by eliminating Lachnospiraceae, Ruminococcaceae, and Bacteroidetes that compete with C. difficile or produce inhibitory metabolites. Faecal microbiota transplantation (FMT), which transfers a healthy donor's stool microbiome to the patient, restores colonisation resistance and is highly effective (85 to 90 per cent cure rate) for recurrent CDI.


Resistome, Virulome, and the Functional Microbiome

The resistome is the complete collection of antibiotic resistance genes within a microbial community. Shotgun metagenomics of stool or environmental samples detects the resistome, revealing which resistance genes are circulating in a community even without any identified clinical infection. Gut resistome monitoring is used in antibiotic stewardship research and in monitoring the emergence of resistance genes after antibiotic use.

The virulome is the collection of virulence factor genes. Metagenomics can detect virulence gene signatures in complex samples, identifying pathogenic organisms within a complex community.

Short-chain fatty acids (SCFAs) such as butyrate, propionate, and acetate are produced by gut bacteria fermenting dietary fibre. Butyrate is the primary energy source for colonocytes, maintains gut barrier integrity, and has anti-inflammatory properties. Reduced butyrate-producing bacteria (Faecalibacterium prausnitzii, Roseburia intestinalis) are associated with IBD, metabolic syndrome, and impaired immune regulation.


Frequently Asked Questions

What is metagenomics?

Metagenomics is the direct sequencing of all genetic material in a sample (environmental, clinical, or microbial community) without first culturing individual organisms. It allows simultaneous detection and characterisation of bacteria, viruses, fungi, archaea, and parasites in a single assay, giving a comprehensive picture of the microbial community that culture-based methods cannot provide.

What is the human microbiome?

The human microbiome is the complete collection of microorganisms (bacteria, viruses, fungi, archaea, and protozoa) that colonise the body surfaces and cavities of a healthy human. The gut microbiome is the most studied and contains approximately 38 trillion microbial cells in an adult. The Human Microbiome Project (HMP) characterised baseline diversity across body sites in healthy individuals.

What is 16S rRNA sequencing?

16S rRNA sequencing targets the 16S ribosomal RNA gene, which contains conserved regions (used for universal PCR amplification) flanking hypervariable regions (V1 to V9) that differ between bacterial species. By amplifying and sequencing these hypervariable regions from a community sample, researchers can identify which bacterial and archaeal taxa are present and their relative abundance, without culturing.

What is alpha diversity?

Alpha diversity measures the richness and evenness of microbial taxa within a single sample. Higher alpha diversity generally indicates a more complex, resilient microbial community. Low alpha diversity is associated with dysbiosis, antibiotic exposure, and disease states such as IBD and recurrent CDI.

What is beta diversity?

Beta diversity measures the compositional difference between two microbial communities (between two samples or two groups). Methods include UniFrac distances and Bray-Curtis dissimilarity. Beta diversity is visualised using ordination plots where samples that are compositionally similar cluster together.

What is faecal microbiota transplantation (FMT)?

FMT is a therapeutic procedure that transfers gut microbiota from a healthy, screened donor to a patient with a disrupted microbiome. It is most commonly used for recurrent Clostridioides difficile infection (CDI). The donor stool is prepared and delivered by colonoscopy, nasojejunal tube, or oral capsules. FMT restores microbiome diversity and colonisation resistance against C. difficile, achieving cure rates of 85 to 90 per cent in recurrent CDI.

What are short-chain fatty acids?

Short-chain fatty acids (SCFAs) are produced by gut bacteria fermenting dietary fibre. The primary SCFAs are butyrate, propionate, and acetate. Butyrate is the main energy source for colonocytes, maintains intestinal barrier integrity, regulates immune function, and has anti-inflammatory effects. Propionate and acetate are metabolised in the liver. Reduced SCFA production is associated with IBD, metabolic syndrome, and impaired colonisation resistance.

What is dysbiosis?

Dysbiosis is an imbalance or disruption of the microbial community's composition, function, or metabolic activity compared to a healthy state. It can result from antibiotic use, dietary changes, disease, or infection. Dysbiosis is associated with conditions including IBD, obesity, and C. difficile infection, and is increasingly being studied in the context of autoimmune disease, mental health, and cancer.

What is the resistome?

The resistome is the complete set of antibiotic resistance genes present in a microbial community or environment. Shotgun metagenomics of stool, soil, water, or other samples reveals the resistome, showing which resistance genes are circulating in that environment. Resistome analysis is used in AMR surveillance, antibiotic stewardship research, and environmental monitoring.

What is clinical metagenomics and when is it used?

Clinical metagenomics (or clinical metagenomic next-generation sequencing, mNGS) applies shotgun sequencing directly to clinical samples (CSF, BAL, blood, tissue) to detect pathogens that standard culture misses. It is particularly valuable for patients with culture-negative infections after standard workup, immunocompromised patients with unusual infections, and suspected novel or emerging pathogens. Commercial platforms offer CLIA-certified clinical mNGS services for specific sample types.