Metagenomic Analyses of an Uncultured Viral Community from Human Feces

Abstract

Here we present the first metagenomic analyses of an uncultured viral community from human feces, using partial shotgun sequencing. Most of the sequences were unrelated to anything previously reported. The recognizable viruses were mostly siphophages, and the community contained an estimated 1,200 viral genotypes.


The human gut is colonized by an abundant, active, and diverse microbiota. This microbiota has been studied extensively using culture-based assays and, more recently, by a variety of molecular methods including fluorescent in situ hybridization, terminal restriction fragment length polymorphism, membrane assays, microarrays, and direct sequencing of 16S libraries (15, 18, 22, 25, 32, 35). These studies have shown that there are 400 to 500 human intestinal microbial species, with 30 to 40 species accounting for 99% of the total population (9, 15, 23, 24, 35).

Bacteriophages likely exert a strong influence on the diversity and population structure of bacterial communities in the human gut. Phages that infect Escherichia coli, Salmonella spp., and Bacteroides fragilis have been isolated from human fecal samples at concentrations ranging from 0 to $10^5$ phages per g of dry feces (4, 7, 11-14, 16, 17, 19, 26, 28, 33). The presence and concentration of these phages differed among individuals and did not correlate with age or sex (12). Studies have also demonstrated prophage induction from human fecal bacteria upon treatment with DNA-damaging agents (5, 20). However, direct counts of bacteriophages in human feces have not been performed, nor have molecular methods been applied to study the phage populations. Here we present the first metagenomic analyses of the composition and population structure of an uncultured viral community from human feces.

Approximately 500 g of freshly voided fecal matter was collected from a 33-year-old healthy male. The fecal matter was resuspended in 5 liters of phosphate-buffered saline and shaken vigorously to dislodge the viral particles from the feces. The supernatant was poured through a Nitex filter (∼100-μm pore size) and then concentrated using a 100-kDa tangential flow filter. The concentrate was loaded onto a cesium chloride step gradient and ultracentrifuged, and the 1.35- to 1.5-g ml−1 fraction was collected. A portion of the viral concentrate was stained with 1× SYBR Gold (Molecular Probes) for 10 min and visualized by using epifluorescent microscopy (Fig. 1). When stained with SYBR Gold, prokaryotic cells are extremely bright and have visible morphologies. Prokaryotic cells are easily distinguishable from viruses, which are distinct pinpricks of light. As shown in Fig. 1, the purified viral concentrate was not contaminated with any microbial cells.

FIG. 1.

SYBR Gold-stained human fecal viral concentrate under epifluorescent microscopy. No contaminating microbial cells were observed.

DNA was extracted from the viral concentrate by using formamide and cetyltrimethylammonium bromide extractions (31). A linker-amplified shotgun library was then created from the human fecal viral DNA as described previously (3; M. Breitbart, B. Felts, S. Kelley, J. M. Mahaffy, J. Nulton, P. Salamon, and F. Rohwer, submitted for publication; www.sci.sdsu.edu/PHAGE/LASL/index.htm). Briefly, the total viral community DNA was randomly sheared (HydroShear; GenMachine, San Carlos, Calif.) and end repaired, and double-stranded DNA linkers were ligated to the ends. The fragments were amplified using Vent DNA polymerase, ligated into the pSMART vector (Lucigen, Middleton, Wis.), and electroporated into MC12 cells. This method overcomes the cloning limitations posed by modified nucleotides and bactericidal genes in viral genomes.

With the use of the AmpL2 forward primer (Lucigen), 532 clones from the library were sequenced (accession no. CC820769 to CC821300). TBLASTX comparison of these sequences against those in GenBank revealed that the majority (59%) of the sequences had no significant similarity (significance threshold: E value of <0.001) to anything previously reported (1, 2) (Fig. 2A). Sequences with significant hits were classified as phages, viruses, mobile elements, repeat elements, Bacteria, Archaea, or Eucarya based on GenBank annotation. Bacterial, archaeal, and eukaryotic hits were examined manually to identify repeat elements and potential prophages.
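The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the hits are available as 12-column tab-separated BLAST output (the layout produced by the `-outfmt 6` option of current BLAST+ releases, with the E value in column 11), and the clone and subject identifiers are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # significance threshold stated in the text (E value < 0.001)

def best_evalues(tabular_text):
    """Return {query_id: smallest E value} from 12-column tabular BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if query_id not in best or evalue < best[query_id]:
            best[query_id] = evalue
    return best

def split_by_significance(all_query_ids, tabular_text, cutoff=E_VALUE_CUTOFF):
    """Split queries into those with a significant hit and those without."""
    best = best_evalues(tabular_text)
    known = {q for q in all_query_ids if best.get(q, float("inf")) < cutoff}
    novel = set(all_query_ids) - known
    return known, novel

# Hypothetical hits: clone_001 has one strong and one weak hit,
# clone_002 has only a weak hit, and clone_003 has no hits at all.
EXAMPLE_HITS = "\n".join([
    "\t".join(["clone_001", "subject_A", "45.0", "120", "66", "0",
               "1", "360", "10", "129", "2e-12", "70.1"]),
    "\t".join(["clone_001", "subject_B", "30.0", "40", "28", "0",
               "1", "120", "5", "44", "0.5", "25.0"]),
    "\t".join(["clone_002", "subject_C", "28.0", "35", "25", "0",
               "1", "105", "7", "41", "0.2", "26.3"]),
])

known, novel = split_by_significance(
    ["clone_001", "clone_002", "clone_003"], EXAMPLE_HITS
)
print(f"{len(novel)} of {len(known) + len(novel)} clones have no significant hit")
```

Queries are judged by their best hit, so a clone with several weak matches and one strong match still counts as recognizable.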

FIG. 2.

Genomic overview of the uncultured viral community from human feces based on TBLASTX sequence similarities. (A) Numbers of sequences with significant matches (E values of <0.001) in GenBank. (B) Distribution of significant matches among major classes of biological entities. (C) Types of mobile elements recognized in the library. (D) Families of phages identified in the fecal library.

The most common known matches were with bacteria. We do not believe that the presence of these sequences was due to contamination. Cesium chloride purification efficiently separates the viral particles from the prokaryotic cells and free DNA. As shown in Fig. 1, no contaminating microbial cells were present in the viral concentrate. Additionally, the open reading frames in purified cultured phage genomes are often more similar to bacterial open reading frames than to those of other phages (see, e.g., references 27 and 30). For several clones from a previously sequenced marine viral library that were sequenced from both ends, one end of the clone matched a phage while the other end of the same clone had a significant match with a bacterial sequence (3). These bacterial hits may also represent uncharacterized prophages or sequences from transducing phages.

Significant matches with phages were the second most abundant category in the uncultured fecal library (Fig. 2B). Among the phage matches, the majority (81%) were with siphophages and prophages within bacterial genomes. Since many of these phages have the ability to be temperate, this suggests that temperate lifestyles may be important in the human colon. The most common phage matches were with bacteriophage A118 of Listeria monocytogenes, bacteriophage E125 of Burkholderia thailandensis, and bacteriophage bIL285 of Lactococcus lactis. Fifty-three percent of the phage matches were with known proteins, with structural proteins and terminases being the most common (Table 1). Several matches with mobile elements (plasmids, transposons, and insertion sequences) were also observed in the library (Fig. 2C).

TABLE 1.

Categories of phage proteins with significant matches in the uncultured human fecal viral library

Protein type | No. of matches
Unknown | 25
Structural protein | 8
Terminase | 8
Portal protein | 3
Protease | 3
Antirepressor | 1
Endolysin | 1
Endopeptidase | 1
Polyketide synthase | 1
Repressor | 1
Tape measure protein | 1
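As a quick arithmetic check, the counts in Table 1 can be tallied to recover the share of phage matches that were with known proteins. The snippet below only restates the table; the variable names are illustrative.

```python
# Counts copied from Table 1.
phage_protein_matches = {
    "Unknown": 25,
    "Structural protein": 8,
    "Terminase": 8,
    "Portal protein": 3,
    "Protease": 3,
    "Antirepressor": 1,
    "Endolysin": 1,
    "Endopeptidase": 1,
    "Polyketide synthase": 1,
    "Repressor": 1,
    "Tape measure protein": 1,
}

total = sum(phage_protein_matches.values())
known = total - phage_protein_matches["Unknown"]
percent_known = 100 * known / total

print(f"{known} of {total} phage matches ({percent_known:.0f}%) were with known proteins")
```

The tally gives 28 of 53 matches, which rounds to the 53% quoted in the text.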

To determine the genome size distribution of uncultured bacteriophages in the human fecal sample, a fraction of the viral concentrate was examined using pulsed-field gel electrophoresis as described previously (10). Major bands for the fecal phage population were observed at 15 and 90 kb, with minor bands present at 30, 40, and 60 kb (Fig. 3). This genome size distribution was significantly different from that observed in other environments (e.g., seawater, sediment, and rumen) (10, 21, 36). Especially notable was the presence of a dominant band at a small genome size of approximately 15 kb.

FIG. 3.

Pulsed-field gel displaying the major genome size classes of viruses in the human fecal sample. The three sample lanes (on the right) represent increasing amounts of sample DNA. The DNA ladder is shown in the leftmost lane. The limited amount of DNA made it necessary to enhance the sample bands relative to the ladder by using Corel PHOTO-PAINT. The reported bands were visible by eye on the original gel.

The structure of the uncultured fecal viral population was determined by running Monte Carlo simulations to match the distribution of overlapping sequences (i.e., the contig spectrum) observed in the library (3; Breitbart et al., submitted). Among the 532 sequences from the fecal viral library, there were 18 contiguous sequences made up of two sequences (2-contigs), two 3-contigs, and two 4-contigs, as well as 482 sequences that did not overlap with any other sequence (1-contigs). Based on the normalized band intensities of the pulsed-field gel, an average genome size of 30 kb was assumed for the population modeling. The population structure of the uncultured fecal viral community was assumed to follow a power law distribution ($n_i = a\,i^{-b}$, where $n_i$ is the relative frequency of the $i$th genotype, $a$ is the relative abundance of the most abundant genotype, $i$ is the rank index from 1 to the total number of genotypes, and $b$ is the evenness parameter) based on previous results (3). Using these assumptions, and an average fragment size of 699 bp, we calculated that the viral community contained ∼1,200 genotypes. The most abundant virus made up ∼4% of the total (Table 2). The nonparametric estimator Chao1 predicted 162 genotypes for this population (3, 6). Effects of varying the average genome size on the model predictions are shown in Table 2. Based on previous estimates of gut microbial diversity, there are approximately two to five times as many viral genotypes as the number of bacterial species in the human intestinal microbiota. Diversity of the fecal viral community was high, with a Shannon index ($H_{\text{nats}}$) of 6.4 nats. This is a higher Shannon index than that observed for most microbial communities, but it is lower than the Shannon index found for uncultured viral communities in seawater ($H_{\text{nats}}$ = ∼7) and sediment ($H_{\text{nats}}$ > 9) samples (3; Breitbart et al., submitted).
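To make the idea of a contig spectrum concrete, the following is a simplified forward simulation, not the published modeling code. It draws 532 fragments from a power-law community and counts how many fall into overlapping groups. Several details are assumptions made for illustration only: genomes are treated as linear and all of the same length, fragments all have the average length, a 20-bp minimum overlap defines a contig, and abundances are normalized to sum to 1. The function name and parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Observed contig spectrum from the text: contig size -> number of contigs.
OBSERVED_SPECTRUM = {1: 482, 2: 18, 3: 2, 4: 2}

def simulate_contig_spectrum(n_genotypes, b, n_fragments=532, genome_bp=30_000,
                             fragment_bp=699, min_overlap_bp=20, seed=0):
    """Simulate one shotgun library and return its contig spectrum."""
    rng = random.Random(seed)
    # Power-law rank abundance, n_i proportional to i^(-b).
    weights = [rank ** -b for rank in range(1, n_genotypes + 1)]
    genotypes = rng.choices(range(n_genotypes), weights=weights, k=n_fragments)

    # Place each fragment at a random start position on its genome.
    starts_by_genotype = defaultdict(list)
    for genotype in genotypes:
        starts_by_genotype[genotype].append(rng.uniform(0, genome_bp - fragment_bp))

    # Equal-length fragments overlap by at least min_overlap_bp when their
    # sorted start positions are at most max_gap apart; chain them into contigs.
    max_gap = fragment_bp - min_overlap_bp
    spectrum = Counter()
    for starts in starts_by_genotype.values():
        starts.sort()
        size = 1
        for previous, current in zip(starts, starts[1:]):
            if current - previous <= max_gap:
                size += 1
            else:
                spectrum[size] += 1
                size = 1
        spectrum[size] += 1
    return spectrum

simulated = simulate_contig_spectrum(n_genotypes=1250, b=0.715)
for size in sorted(set(OBSERVED_SPECTRUM) | set(simulated)):
    print(size, OBSERVED_SPECTRUM.get(size, 0), simulated.get(size, 0))
```

A fitting procedure would repeat this simulation many times over a grid of genotype counts and evenness parameters and keep the values whose average spectrum is closest to the observed one. Communities with fewer genotypes or a steeper power law produce more overlaps, which is why the spectrum carries information about diversity.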

TABLE 2.

Population structure of the uncultured fecal viral community as determined by mathematical modeling

Assumed avg genome size (kb) | Total no. of viral genotypes | % Abundance of most abundant virus (a × 100) | Evenness parameter (b) | Shannon index ($H_{\text{nats}}$)
15 | 1,450 ± 700 | 2.4 ± 1.2 | 0.611 ± 0.110 | 6.83
30 | 1,250 ± 230 | 4.2 ± 0.7 | 0.715 ± 0.041 | 6.45
50 | 1,930 ± 470 | 6.3 ± 0.8 | 0.831 ± 0.029 | 6.43
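The columns of Table 2 are linked through the power law: given a number of genotypes and an evenness parameter, the abundance of the top-ranked genotype and the Shannon index follow directly. The sketch below assumes that the relative abundances are normalized to sum to 1 (an assumption of this example, although it appears consistent with the tabulated values) and uses only the central estimates, ignoring the ± ranges.

```python
import math

def power_law_community(n_genotypes, b):
    """Relative abundances proportional to rank^(-b), normalized to sum to 1."""
    raw = [rank ** -b for rank in range(1, n_genotypes + 1)]
    total = sum(raw)
    return [value / total for value in raw]

def shannon_index_nats(abundances):
    """Shannon index H = -sum(p * ln p), in nats."""
    return -sum(p * math.log(p) for p in abundances if p > 0)

# Central estimates from Table 2: genome size (kb) -> (genotypes, evenness b).
TABLE_2_ROWS = {15: (1450, 0.611), 30: (1250, 0.715), 50: (1930, 0.831)}

results = {}
for genome_kb, (n_genotypes, b) in TABLE_2_ROWS.items():
    abundances = power_law_community(n_genotypes, b)
    top_percent = 100 * abundances[0]  # corresponds to a x 100
    results[genome_kb] = (top_percent, shannon_index_nats(abundances))
    print(f"{genome_kb} kb: top genotype {top_percent:.1f}%, "
          f"H = {results[genome_kb][1]:.2f} nats")
```

Under this normalization the computed values should land close to the abundance and Shannon index columns of the table, which suggests that the total genotype count and the evenness parameter are the two free quantities of the model.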

In all the uncultured viral communities examined so far (3; Breitbart et al., submitted), the majority of the sequences have been novel. Similarly, culture-based studies of phage genomes have found that much, if not most, of phage diversity is unsampled (27, 29). Significant matches with each of the major groups of double-stranded DNA phages (siphophages, podophages, and myophages) have been observed in all uncultured viral communities studied. Siphophages and prophages were more abundant in the marine sediment (Breitbart et al., submitted) and fecal libraries than in the seawater libraries (3). The fecal viral community contained very few matches with the T7-like podophages and λ-like siphophages, which were the most abundant groups in the marine viral communities (Breitbart et al., submitted). In the fecal viral community, numerous similarities to phages that are known to infect gram-positive bacteria were seen. This can be explained by the fact that over 62% of the cells in human feces that can be detected with a bacterium-specific probe belong to gram-positive groups (15).

Understanding the population structure and dynamics of the normal human intestinal microbiota has important implications for human health, nutrition, and the development of probiotics for the treatment and prevention of gastrointestinal disorders. Since phages found in human feces are used as indicators of sewage-derived contamination of the environment, it is also important to know the identity of these phages (12). Phages likely influence the composition of bacterial populations in the intestine through specific predation on microbial hosts. As a particular strain becomes dominant, phages can infect and lyse that host, allowing another bacterial strain, either of endogenous or exogenous origin, the opportunity to become abundant (34). Additionally, through lysogenic conversion of resident intestinal bacteria, phages may introduce new phenotypic traits, such as antibiotic resistance and the ability to produce exotoxins (8).

Phages are a diverse and largely unexplored component of the microbial community in human feces. Using culture-independent methods, we have described the population structure and genome size distribution of a human fecal phage community. In addition, we have elucidated the identity of some of these uncultured phages based on sequence similarities. Future studies need to focus on determining the ecological roles of phages in the human intestine and the degree of similarity of the phage communities populating different individuals.

Acknowledgments

We thank Anca Segall, Beltran Rodriguez Brito, David Bangor, and Linda Wegley for helpful comments on the mathematical modeling and manuscript, as well as the San Diego State University Microchemical Core Facility for sequencing.

This work was supported by NSF DEB 03-16518 and DEB-BE0221763.

REFERENCES