Sequencing - MetaSUB

All reads will be quality trimmed with the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) to ensure 99% base-level accuracy (Q20). Cleaned reads will then be aligned with MegaBLAST (Wolfsberg and Madden, 2001) to search for a match to any organism in the full NCBI NT/NR database. The MegaBLAST output for a single read often contains hits to sequences from several different taxa, so we will assign each read to a single “best” taxon using the lowest common ancestor (LCA) algorithm established by MEGAN (Huson et al., 2007). For example, reads that match both the species Salmonella enterica and the species Salmonella bongori are ambiguous at the species level, but they are still specific to the LCA of the two species, the genus Salmonella, which then becomes the assigned taxon.

To further classify bacterial and viral sequences, we will also analyze all samples with MetaPhlAn 2.0 (Segata et al., 2012), and for specific pathogens we will also use SURPI (Naccache et al., 2014) and BWA (Li and Durbin, 2010). MetaPhlAn (v2.0) will be used to study the microbial populations on the subway surfaces: FASTQ files from sequencing will be run through MetaPhlAn, and the output file (.bt2.out) will outline the abundance of bacterial organisms down to the species level.
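The LCA assignment for reads with hits in several taxa can be illustrated with a minimal Python sketch. This is not MEGAN's implementation, which also applies score and support thresholds. The toy taxonomy table and the function names `lineage` and `lowest_common_ancestor` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy taxonomy (child -> parent), for illustration only.
# A real analysis would use the full NCBI taxonomy instead.
PARENT = {
    "Salmonella enterica": "Salmonella",
    "Salmonella bongori": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}


def lineage(taxon):
    """Return the path from the top of the taxonomy down to `taxon`."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Return the deepest taxon shared by the lineages of all hit taxa.

    Returns None if `taxa` is empty or the lineages share no ancestor.
    """
    lineages = [lineage(t) for t in taxa]
    lca = None
    # Walk the lineages from the root downward, one rank at a time,
    # and stop at the first rank where the hits disagree.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


# Taxa of the hits returned for one read (assumed input).
hits = ["Salmonella enterica", "Salmonella bongori"]
print(lowest_common_ancestor(hits))
```

In this sketch, a read whose hits all fall in one species keeps that species, while a read with hits in both Salmonella enterica and Escherichia coli would be pushed up to the family they share.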

Shotgun Metagenomic Sequencing