Session 3: Bioinformatics Challenges in Metagenomics

“Microbial Communities in an Urban Mass Transit System” – Curtis Huttenhower, Harvard School of Public Health, USA

Public transit is an important hub of human-to-surface microbial interaction, as large numbers of people touch the same surfaces within subway cars and stations. Although public perceptions of microbes in public spaces are largely negative, no ecological or functional survey has yet comprehensively characterized these microbial communities. We have profiled microbial communities across the Boston metropolitan transit system to assess the sources of urban transit microbes, their relationship with the human microbiome, their functional profiles (including potential virulence factors), and their metabolic state. Samples were collected from train cars, stations, and ticket vending kiosks and analyzed using amplicon and shotgun sequencing. The strongest correlate of ecological variation was surface type within transit structures, such as hanging grips, seats, and walls. By contrast, the train line or location from which samples were collected had minimal effect. Within train stations, touch-screen communities differed mainly between indoor and outdoor locations. Comparative analysis with human and environmental data revealed that these communities resemble human commensal skin and oral microbial sources, though environmental sources were enriched on outdoor touch-screen machines. The community on each surface typically resembled an aggregate of the human body or environmental sites to which it was most exposed (e.g., respiratory microbes on surfaces close to human breathing space, skin-resident microbes on touched surfaces). This suggests that the microbial communities on transit surfaces are strongly shaped by the nature, frequency, and diversity of their interactions with the human body and the environment.
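The abstract does not say which measure was used to compare transit surfaces with human and environmental reference communities. As one common choice, the sketch below assumes Bray-Curtis dissimilarity on relative-abundance profiles and reports the closest reference source. The function names and the toy profiles are hypothetical and are not data from the Boston study.

```python
from typing import Dict

Profile = Dict[str, float]  # taxon name -> relative abundance


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    # Sum of the smaller abundance for every taxon (missing taxa count as 0).
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total > 0 else 0.0


def closest_source(sample: Profile, sources: Dict[str, Profile]) -> str:
    """Return the name of the reference source least dissimilar to the sample."""
    return min(sources, key=lambda name: bray_curtis(sample, sources[name]))


# Toy, made-up profiles for illustration only (hypothetical, not study data).
sources = {
    "skin": {"Cutibacterium": 0.6, "Staphylococcus": 0.3, "Corynebacterium": 0.1},
    "oral": {"Streptococcus": 0.7, "Veillonella": 0.2, "Prevotella": 0.1},
}
hanging_grip = {"Cutibacterium": 0.5, "Staphylococcus": 0.3, "Streptococcus": 0.2}

for name, profile in sources.items():
    print(name, round(bray_curtis(hanging_grip, profile), 2))
print("closest:", closest_source(hanging_grip, sources))
```

With these toy numbers the grip profile is much closer to the skin reference (dissimilarity 0.2) than to the oral one (0.8), mirroring the kind of "resembles its most-exposed body site" comparison the abstract describes.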

“MetaPhlAn v2 and Tracking Microbes at the Strain Level” – Edoardo Pasolli, University of Trento, Italy

MetaPhlAn is a computational method for taxonomically profiling shotgun metagenomes. It uses clade-specific marker genes to unambiguously assign reads to microbial clades with high accuracy and computational efficiency. In this talk, we present recent improvements that enable MetaPhlAn version 2 to identify and track microbes at the strain level and to profile all domains of life simultaneously (Bacteria, Archaea, Eukaryotes, and Viruses). These and other additions allow MetaPhlAn2 and our related tools to perform strain-level population genomics and epidemiological analysis directly from metagenomes. We present results of our methodology applied to a large set of human microbiomes, and we discuss its potential for environmental metagenomes, and for subway samples in particular.
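As a rough illustration of the marker-gene idea, the sketch below assumes reads have already been aligned to a database of clade-specific markers (MetaPhlAn2 itself uses Bowtie2 for this step). Because each marker belongs to exactly one clade, every hit is assigned unambiguously; here clade abundance is then estimated as a trimmed mean of per-marker coverage and normalized to relative abundance. This is a simplified sketch, not MetaPhlAn's actual implementation; the function name, trimming fraction, and inputs are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable


def profile_clades(
    read_hits: Iterable[str],       # marker ID hit by each aligned read
    marker_clade: Dict[str, str],   # marker ID -> the single clade it is specific to
    marker_len: Dict[str, int],     # marker ID -> marker length in bp
    trim: float = 0.1,              # fraction of lowest/highest coverages dropped per clade
) -> Dict[str, float]:
    reads_per_marker = Counter(read_hits)

    # Per-marker coverage (reads per bp), grouped by the clade the marker identifies.
    coverages = defaultdict(list)
    for marker, clade in marker_clade.items():
        coverages[clade].append(reads_per_marker[marker] / marker_len[marker])

    # A trimmed mean of marker coverages dampens the effect of outlier markers.
    raw = {}
    for clade, covs in coverages.items():
        covs.sort()
        k = int(len(covs) * trim)
        raw[clade] = mean(covs[k:len(covs) - k])

    # Normalize to relative abundances, dropping clades with no coverage.
    total = sum(raw.values())
    if total == 0:
        return {}
    return {clade: value / total for clade, value in raw.items() if value > 0}


# Hypothetical toy input: two markers for clade_A, one longer marker for clade_B.
markers = {"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"}
lengths = {"m1": 100, "m2": 100, "m3": 200}
hits = ["m1", "m1", "m2", "m2", "m3", "m3"]
print(profile_clades(hits, markers, lengths))
```

In this toy input clade_A receives about two-thirds of the abundance: both clades have the same number of hits per marker, but clade_B's marker is twice as long, so its per-bp coverage is half as high. Normalizing by marker length is what keeps long markers from inflating a clade's share.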

“One Codex: A Data Platform for Highly Accurate, Reproducible Metagenomics” – Nick Greenfield, One Codex, USA

Next-generation sequencing (NGS) is rapidly transitioning from a research technology to an applied one. With this shift comes a pronounced need for software that can scale to massive volumes of data yet remain approachable for applied end users. This talk will cover One Codex’s platform for microbial genomics and offer a brief tour of: 1) the platform’s general features; 2) its k-mer-based approach to metagenomic classification and strain typing; and 3) how One Codex applies software and technology industry best practices to make bioinformatics more scalable, repeatable, and reproducible.
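The abstract describes One Codex's classifier only as k-mer-based. A generic version of the idea can be sketched as follows, assuming an exact-match index from each k-mer to the reference taxa that contain it; a read is assigned to the taxon with the most taxon-unique k-mer hits. Everything here (function names, k, the tiny reference sequences) is hypothetical and is not One Codex's implementation; real classifiers also handle reverse complements, taxonomic hierarchies, and much larger k.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Optional, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of reference taxa whose genome contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(taxon)
    return index


def classify(read: str, index: Dict[str, Set[str]], k: int,
             min_hits: int = 1) -> Optional[str]:
    """Assign a read to the taxon with the most k-mers unique to it, or None."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        taxa = index.get(km)
        # Only k-mers found in exactly one taxon are informative votes.
        if taxa is not None and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else None


# Hypothetical toy references and reads.
genomes = {"taxonA": "ACGTACGTGG", "taxonB": "TTGACCATGA"}
index = build_index(genomes, k=4)
for read in ["GTACGTGG", "CCATGA", "AAAAAA"]:
    print(read, classify(read, index, k=4))
```

Restricting votes to taxon-unique k-mers is the simplest way to avoid ambiguous assignments; tools such as Kraken instead resolve shared k-mers to their lowest common ancestor in a taxonomy.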

“GENIUS: High Performance Bioinformatics Tools and Curated Genome Databases for Rapid Profiling of Shotgun Metagenomes” – Nur Hasan, CosmosID, USA

Whole genome shotgun (WGS) sequencing is increasingly recognized as the method of choice for unbiased, comprehensive, and high-resolution characterization of microbial communities. Recent advances in next-generation sequencing (NGS) technologies have substantially reduced costs and production time while significantly increasing the throughput and complexity of the data. Translating such large amounts of complex NGS data into actionable information is becoming the rate-limiting step in the wide adoption of WGS metagenomics. This talk will present GENIUS, a solution developed by CosmosID for rapid, accurate, and actionable microbial identification from unassembled short reads generated on any existing sequencing platform. The presentation will describe several aspects of GENIUS: ease of use, data-mining efficiency and processing speed, the breadth and depth of its curated genome databases for delivering actionable results, and the flexibility of those databases to let users incorporate their own metadata and/or genome sequences. It will further demonstrate features that make this technology attractive and powerful across a wide range of applications.

“The Grand Challenge in Metagenomics: Sensitive and Accurate Taxa Classification” – Ebrahim Afshinnekoo, Weill Cornell Medicine, USA

The recent revolution in next-generation sequencing technology has brought an explosion of microbiome and metagenomic studies based on sequencing data, most notably the Human Microbiome Project, the Earth Microbiome Project, the Extreme Microbiome Project, and, more recently, the PathoMap study. These studies have produced massive datasets with the potential to change our perception of the world around us. Their applications range from pathogen monitoring in the clinic to informing how we build and design our cities. For any of these applications, the baseline information is a characterization of the taxa present in a given sample and their relative concentrations. This is non-trivial: several computational tools aim to identify the organisms in a sample by comparing short reads to the assembled genomic sequences of known organisms, but each implements this differently, using different algorithms or databases. Here we present the results of a side-by-side comparison of the principal metagenomic analysis software packages, including MetaPhlAn, BLAST, MG-RAST, SURPI, Kraken, PhyloSift, GOTTCHA, and GENIUS, benchmarked on multiple datasets to compare their sensitivity and specificity. All tools were first run on a series of positive-control datasets of known bacterial populations, including samples from the ABRF and NIAID, among others. We find that each method has its own strengths and weaknesses, and we suggest that an integrative bioinformatics pipeline incorporating these different tools is the most effective approach to interpreting metagenomic data in future studies.
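Sensitivity and specificity against a positive control of known composition can be computed from presence/absence calls. Specificity requires defining true negatives, so the sketch below assumes a fixed universe of candidate taxa (for example, the taxa in a shared reference database). The tool names, taxa, and calls are hypothetical placeholders, not results from the comparison described above.

```python
from typing import Dict, Set


def score_calls(predicted: Set[str], truth: Set[str],
                universe: Set[str]) -> Dict[str, float]:
    """Presence/absence sensitivity and specificity for one tool on one control."""
    universe = universe | predicted | truth  # make sure every call is counted
    tp = len(predicted & truth)              # expected taxa that were reported
    fn = len(truth - predicted)              # expected taxa that were missed
    fp = len(predicted - truth)              # reported taxa absent from the control
    tn = len(universe - predicted - truth)   # absent taxa correctly not reported
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical mock community and hypothetical tool outputs.
truth = {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis"}
universe = truth | {"Klebsiella pneumoniae", "Pseudomonas aeruginosa",
                    "Listeria monocytogenes", "Salmonella enterica"}
calls = {
    "tool_x": {"Escherichia coli", "Staphylococcus aureus", "Klebsiella pneumoniae"},
    "tool_y": {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis",
               "Listeria monocytogenes", "Salmonella enterica"},
}
for tool, predicted in calls.items():
    print(tool, score_calls(predicted, truth, universe))
```

With these placeholders, tool_x misses one expected taxon (sensitivity 2/3) but reports fewer false positives (specificity 0.75), while tool_y finds every expected taxon at the cost of two false positives (specificity 0.5). Trade-offs of this kind are what motivate combining tools in an integrative pipeline.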

“Small Molecule Biosynthesis in Complex Microbial Communities” – Mohamed Donia, Princeton University, USA

In complex biological systems, small molecules often mediate microbe-microbe and microbe-host interactions. Identifying these molecules, characterizing their biological activity, and explaining their relevance in a native context are challenging endeavors. Here, I will describe our ongoing efforts to develop computational and experimental tools for discovering small molecules encoded and produced in complex host-associated microbiomes. Examples from several systems will be discussed, including human and marine invertebrate microbiomes.