Increased rectal microbial richness is associated with the presence of colorectal adenomas in humans (original) (raw)

Journal Article

Nina Sanapareddy ,

Department of Bioinformatics and Genomics, University of North Carolina

Charlotte, NC

USA

Search for other works by this author on:

Ryan M Legge ,

Department of Food Science and Technology, University of Nebraska

Lincoln, NE

USA

Search for other works by this author on:

Biljana Jovov ,

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Amber McCoy ,

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Lauren Burcal ,

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Felix Araujo-Perez ,

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Thomas A Randall ,

Center for Bioinformatics, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Joseph Galanko ,

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Andrew Benson ,

Department of Food Science and Technology, University of Nebraska

Lincoln, NE

USA

Search for other works by this author on:

Robert S Sandler

Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill

Chapel Hill, NC

USA

Search for other works by this author on:

Received:

07 November 2011

Revision received:

21 February 2012

Cite

Nina Sanapareddy, Ryan M Legge, Biljana Jovov, Amber McCoy, Lauren Burcal, Felix Araujo-Perez, Thomas A Randall, Joseph Galanko, Andrew Benson, Robert S Sandler, John F Rawls, Zaid Abdo, Anthony A Fodor, Temitope O Keku, Increased rectal microbial richness is associated with the presence of colorectal adenomas in humans, The ISME Journal, Volume 6, Issue 10, October 2012, Pages 1858–1868, https://doi.org/10.1038/ismej.2012.43
Close

Navbar Search Filter Mobile Enter search term Search

Abstract

Differences in the composition of the gut microbial community have been associated with diseases such as obesity, Crohn’s disease, ulcerative colitis and colorectal cancer (CRC). We used 454 titanium pyrosequencing of the V1–V2 region of the 16S rRNA gene to characterize adherent bacterial communities in mucosal biopsy samples from 33 subjects with adenomas and 38 subjects without adenomas (controls). Biopsy samples from subjects with adenomas had greater numbers of bacteria from 87 taxa than controls; only 5 taxa were more abundant in control samples. The magnitude of the differences in the distal gut microbiota between patients with adenomas and controls was more pronounced than that of any other clinical parameters including obesity, diet or family history of CRC. This suggests that sequence analysis of the microbiota could be used to identify patients at risk for developing adenomas.

Introduction

Microbes that are associated with the human body outnumber our own ‘human’ cells by a factor of 10 (Savage, 1977) and provide us with a wide array of vital metabolic functions (Gill et al., 2006; Willing et al., 2009). Recent research suggests that disruption of the human microbiome may play a crucial role in diabetes (Burcelin et al., 2009), skin diseases (Grice et al., 2008), obesity (Backhed et al., 2004; Ley et al., 2005, 2006; Turnbaugh et al., 2006; Cani and Delzenne, 2009; Tsukumo et al., 2009; Turnbaugh and Gordon, 2009) and a range of ‘immuno-pathologic’ conditions including inflammatory bowel diseases (Moore and Moore, 1995; Powrie and Uhlig, 2004; Rakoff-Nahoum et al., 2004; Rakoff-Nahoum and Medzhitov, 2006). Colorectal cancer (CRC) is a prevalent malignancy within the western countries and is the second leading cause of cancer death in the United States (Jemal et al., 2010). The majority (∼90%) of CRC cases arise sporadically from benign adenomatous polyps (Lance, 1997). There is significant variation in the risk of developing CRC between and within populations and geographical regions (Lance, 1997). Although age, tobacco and alcohol consumption, lack of physical activity and increased body weight are considered important risk factors for CRC (Moore and Moore, 1995), the most significant risk factor appears to be diet (Bingham, 2000).

The role of host-associated microbiota (Hope et al., 2005) has also been frequently proposed as a critical factor in CRC development (Huycke and Gaskins, 2004; Scanlan et al., 2008). Recent technological breakthroughs now allow for the study of the human-associated microbiome at a level of detail that was unimaginable only a few years ago (Margulies et al., 2005; Petrosino et al., 2009). Initial examinations of the human-associated microbial community with next-generation sequencing have discovered enormous inter-personal variation in the microbiomes of healthy individuals (Costello et al., 2009). In this study, we used high-throughput pyrosequencing approaches to ask how the distal gut microbiome varies between individuals who have colorectal adenomas compared with a control group without adenomas. Our results suggest that, despite the tremendous differences between individuals in their associated microbiomes, there is a consistent signature across subjects associated with colorectal adenomas.

Materials and methods

Study participants, colonoscopy and biopsy procedures

To evaluate associations between the gut microbiota and the presence of adenomas anywhere in the colon, we collected biopsies from normal rectal mucosa ∼10–12 cm regions from the anal verge from 33 adenoma subjects and 38 adenoma-free controls. Participants in the study were randomly selected from the Diet and Health Study V, which included persons age 30 years or older who underwent colonoscopy for screening purposes at the University of North Carolina Hospitals. Eligible subjects gave written informed consent to provide colorectal biopsies and a phone interview that asked questions about diet and lifestyle. Information on diet was obtained from a comprehensive, validated, quantitative food frequency questionnaire developed at the National Cancer Institute. A lifestyle questionnaire collected data about demographics, medical history, physical activity, medications and other exposures factors from eligible participants. At the time of the colonoscopy procedure, the research assistant obtained anthropometric measures in order to determine body mass index (BMI) and waist–hip ratio (WHR).

Subjects were in generally good health when they presented for screening. All patients received standard instructions for preparation for colonoscopy that included consumption of 4 l of polyethylene glycol for bowel cleansing. Inclusion criteria included visualization of the entire colon with complete colonoscopy and a clean colon to avoid the misclassification of cases and controls. Exclusion criteria included colitis (either ulcerative, Crohn’s, radiation or infectious colitis, chronic inflammatory illnesses), previous colonic or small bowel resection, previous colon adenomas or colon cancer, sigmoidoscopy or incomplete colonoscopies, familial polyposis syndrome. The enrollment procedure as well as colonoscopy and biopsy procedures are similar to previously described protocols (Keku et al., 2005; Shen et al., 2010).

Subjects were asked if they used antibiotics in the last 3 months before colonoscopy. In all, 33 subjects (11 controls and 22 cases) answered the question. Among these subjects, one case subject (4.55%) and one control subject (9.09%) reported antibiotics use. We have no information about the specific antibiotics they took or why they took them. None of the patient was on antibiotics at the time of the study.

A study pathologist examined all pathologic specimens to confirm adenoma case status and recorded the number of polyps, size, location and histology. Subjects with confirmed adenomatous polyps were classified as cases and those without adenomas as controls. In order to avoid disturbing the mucosa as much as possible, rectal mucosal biopsies were collected immediately after inserting the scope ∼10–12 cm from the anal verge for each patient. The biopsies collected from all subjects were from ‘healthy mucosa’ and not from adenomas. Normal mucosal biopsies were rinsed in sterile phosphate-buffered saline to ensure no contamination with fecal matter and snap-frozen in liquid nitrogen on site. The frozen biopsies were later transferred to −80 °C until DNA extraction. The normal rectal biopsies from the study subjects did not show any histology suggesting inflammation. The study was approved by the Institutional Review Board at the University of North Carolina, School of Medicine (Protocol #05-3138).

DNA extraction and sequencing

Bacterial genomic DNA was extracted from mucosal biopsies. The biopsies ranged in weight between 10–20 mg. Two biopsies per subject were used for bacterial DNA extraction and these were placed in lysozyme (30 mg ml−1; Sigma, St Louis, MO, USA) for 30 min. The biopsy–lysozyme mixture was homogenized on a bead beater (Biospec Products Inc., Bartlesville, OK, USA) at 4800 r.p.m. for 3 min at room temperature followed by DNA extraction using the Qiagen DNA isolation kit (Qiagen, Valencia, CA, USA, cat # 14123) as per the manufacturer’s recommended protocol. The mucosal adherent microbiome was analyzed by Roche (Branford, CT, USA) 454 titanium pyrosequencing of 16S rRNA tags from genomic DNAs. Pyrosequencing (Margulies et al., 2005) was conducted at the University of Nebraska Lincoln Core for Applied Genomics and Ecology. We amplified the V1–V2 region (F8-R357) of the 16S rRNA gene from mucosal biopsies followed by titanium-based pyrosequence analyses. The 16S primers contained the Roche 454 Life Science’s A or B Titanium sequencing adapter (italicized), followed immediately by a unique 8-base barcode sequence (BBBBBBBB) and finally the 5′ end of primer A-8FM, 5′-_CCATCTCATCCCTGCGTGTCTCGACTCAG_BBBBBBBBAGAGTTTGATCMTGGCTCAG-3′ and B-357R, 5′-_CCTATCCCCTGTGTGCCTTGGCAGTCTCAG_BBBBBBBBCTGCTGCCTYCCGTA-3′. Each DNA sample was amplified with uniquely barcoded primers, which allowed us to mix polymerase chain reaction (PCR) products from many samples in a single run.

Data filtering

Sample filtering

We screened all the samples for a batch effect that correlated with the date of submission to the sequencing center. Samples were shipped on three separate dates from Chapel Hill to the sequencing center in Nebraska. Samples shipped on one particular date (30 September 2009) were found to cluster separately from samples shipped on other dates (10 June 2008 and 21 July 2008). The DNA stocks of these two groups of samples were also stored in different freezers at the Chapel Hill lab (Keku lab, UNC at Chapel Hill, NC, USA). In addition, the sum of Bacteroidetes and Firmicutes observed in samples shipped on this date was much lower than we would expect based on both previously published human gut microbial 454 data sets and our own 454 data sets. Sequences generated from samples sent to the sequencing center on this date were therefore removed from further analysis. Leek et al. (2010) recently showed the importance of screening high-throughput data sets for batch effects and screening for batch effects indeed proved useful in removing the technical artifacts from our data set. The characteristics of the 71 samples, selected after sample filtering, are shown in Table 1.

Table 1

Descriptive characteristics of the study participants, cases (33) and controls (38)

Characteristics	_Case (n_=33)	_Control (n_=38)	P_-value_a
Age (mean, s.e.)	57.45 (1.11)	55.70 (1.08)	0.26
Male (%)	60.61	50	0.54
WHR (mean, s.e.)	0.94 (0.01)	0.90 (0.01)	0.06
BMI (%)
Normal	27.27	48.65
Overweight	48.48	24.32	0.09
Obese	24.24	27.03
Caloric intake (in kcal) (mean, s.e.)	2053.78 (149.9)	2104.89 (252.46)	0.86
Diabetes history (yes, %)	6.25	6.45	0.97
Antibiotics use (%)a	4.55	9.09	0.61
Ever smoked (yes, %)	43.8	58.1	0.25
Family history of CRC (yes, %)	60	40	0.64
NSAIDs use (mean/month, s.e.)	14.9 (4.01)	13.55 (3.95)	0.81
Alcohol (g, mean, s.e.)	11.83 (2.24)	30.29 (17.03)	0.29
Red meat (mean, s.e.)	1.52 (0.21)	1.2 (0.18)	0.26
Total dietary fiber (g, mean, s.e.)	22.01 (1.64)	23.94 (2.34)	0.5
Fat (g, mean, s.e.)	79.13 (6.57)	73.8 (6.33)	0.56
Calcium (mg, mean, s.e.)	897.89 (71.88)	889.84 (70.65)	0.94
Vegetable servings (mean, s.e.)	4.49 (0.34)	5.17 (0.49)	0.26
Fruit servings (mean, s.e.)	3.02 (0.40)	2.89 (0.29)	0.8
Adenoma location (%)b
Proximal	42	—
Distal	42	—
Both	15.5	—
Adenoma size (%)b
1–5 mm (Small)	69.7	—
6–10 mm (Medium)	24.2	—
>10 mm (Large)	6.1	—
Number of adenomas (mean, range)	1.64 (1–9)

Characteristics	_Case (n_=33)	_Control (n_=38)	P_-value_a
Age (mean, s.e.)	57.45 (1.11)	55.70 (1.08)	0.26
Male (%)	60.61	50	0.54
WHR (mean, s.e.)	0.94 (0.01)	0.90 (0.01)	0.06
BMI (%)
Normal	27.27	48.65
Overweight	48.48	24.32	0.09
Obese	24.24	27.03
Caloric intake (in kcal) (mean, s.e.)	2053.78 (149.9)	2104.89 (252.46)	0.86
Diabetes history (yes, %)	6.25	6.45	0.97
Antibiotics use (%)a	4.55	9.09	0.61
Ever smoked (yes, %)	43.8	58.1	0.25
Family history of CRC (yes, %)	60	40	0.64
NSAIDs use (mean/month, s.e.)	14.9 (4.01)	13.55 (3.95)	0.81
Alcohol (g, mean, s.e.)	11.83 (2.24)	30.29 (17.03)	0.29
Red meat (mean, s.e.)	1.52 (0.21)	1.2 (0.18)	0.26
Total dietary fiber (g, mean, s.e.)	22.01 (1.64)	23.94 (2.34)	0.5
Fat (g, mean, s.e.)	79.13 (6.57)	73.8 (6.33)	0.56
Calcium (mg, mean, s.e.)	897.89 (71.88)	889.84 (70.65)	0.94
Vegetable servings (mean, s.e.)	4.49 (0.34)	5.17 (0.49)	0.26
Fruit servings (mean, s.e.)	3.02 (0.40)	2.89 (0.29)	0.8
Adenoma location (%)b
Proximal	42	—
Distal	42	—
Both	15.5	—
Adenoma size (%)b
1–5 mm (Small)	69.7	—
6–10 mm (Medium)	24.2	—
>10 mm (Large)	6.1	—
Number of adenomas (mean, range)	1.64 (1–9)

Abbreviations: BMI, body mass index; CRC, colorectal cancer; NSAID, non-steroidal anti-inflammatory drug; WHR, waist-to-hip ratio.

_P_-values are based on _t_-tests between case and control or the χ2 test for BMI category. Caloric intake is reported as kilocalories (kcal) and is based on responses from a food frequency questionnaire50 that was administered to subjects during phone interviews.

Antibiotic use, percent of subjects that reported antibiotics use 3 months before colonoscopy.

Adenomas located in the cecum, ascending colon, hepatic flexure and transverse colon were considered proximal location while adenomas located in the splenic flexure, descending colon, sigmoid, rectosigmoid and rectum were considered distal. Subjects that had had adenomas in both locations were classified as both.

Table 1

Descriptive characteristics of the study participants, cases (33) and controls (38)

Characteristics	_Case (n_=33)	_Control (n_=38)	P_-value_a
Age (mean, s.e.)	57.45 (1.11)	55.70 (1.08)	0.26
Male (%)	60.61	50	0.54
WHR (mean, s.e.)	0.94 (0.01)	0.90 (0.01)	0.06
BMI (%)
Normal	27.27	48.65
Overweight	48.48	24.32	0.09
Obese	24.24	27.03
Caloric intake (in kcal) (mean, s.e.)	2053.78 (149.9)	2104.89 (252.46)	0.86
Diabetes history (yes, %)	6.25	6.45	0.97
Antibiotics use (%)a	4.55	9.09	0.61
Ever smoked (yes, %)	43.8	58.1	0.25
Family history of CRC (yes, %)	60	40	0.64
NSAIDs use (mean/month, s.e.)	14.9 (4.01)	13.55 (3.95)	0.81
Alcohol (g, mean, s.e.)	11.83 (2.24)	30.29 (17.03)	0.29
Red meat (mean, s.e.)	1.52 (0.21)	1.2 (0.18)	0.26
Total dietary fiber (g, mean, s.e.)	22.01 (1.64)	23.94 (2.34)	0.5
Fat (g, mean, s.e.)	79.13 (6.57)	73.8 (6.33)	0.56
Calcium (mg, mean, s.e.)	897.89 (71.88)	889.84 (70.65)	0.94
Vegetable servings (mean, s.e.)	4.49 (0.34)	5.17 (0.49)	0.26
Fruit servings (mean, s.e.)	3.02 (0.40)	2.89 (0.29)	0.8
Adenoma location (%)b
Proximal	42	—
Distal	42	—
Both	15.5	—
Adenoma size (%)b
1–5 mm (Small)	69.7	—
6–10 mm (Medium)	24.2	—
>10 mm (Large)	6.1	—
Number of adenomas (mean, range)	1.64 (1–9)

Characteristics	_Case (n_=33)	_Control (n_=38)	P_-value_a
Age (mean, s.e.)	57.45 (1.11)	55.70 (1.08)	0.26
Male (%)	60.61	50	0.54
WHR (mean, s.e.)	0.94 (0.01)	0.90 (0.01)	0.06
BMI (%)
Normal	27.27	48.65
Overweight	48.48	24.32	0.09
Obese	24.24	27.03
Caloric intake (in kcal) (mean, s.e.)	2053.78 (149.9)	2104.89 (252.46)	0.86
Diabetes history (yes, %)	6.25	6.45	0.97
Antibiotics use (%)a	4.55	9.09	0.61
Ever smoked (yes, %)	43.8	58.1	0.25
Family history of CRC (yes, %)	60	40	0.64
NSAIDs use (mean/month, s.e.)	14.9 (4.01)	13.55 (3.95)	0.81
Alcohol (g, mean, s.e.)	11.83 (2.24)	30.29 (17.03)	0.29
Red meat (mean, s.e.)	1.52 (0.21)	1.2 (0.18)	0.26
Total dietary fiber (g, mean, s.e.)	22.01 (1.64)	23.94 (2.34)	0.5
Fat (g, mean, s.e.)	79.13 (6.57)	73.8 (6.33)	0.56
Calcium (mg, mean, s.e.)	897.89 (71.88)	889.84 (70.65)	0.94
Vegetable servings (mean, s.e.)	4.49 (0.34)	5.17 (0.49)	0.26
Fruit servings (mean, s.e.)	3.02 (0.40)	2.89 (0.29)	0.8
Adenoma location (%)b
Proximal	42	—
Distal	42	—
Both	15.5	—
Adenoma size (%)b
1–5 mm (Small)	69.7	—
6–10 mm (Medium)	24.2	—
>10 mm (Large)	6.1	—
Number of adenomas (mean, range)	1.64 (1–9)

Abbreviations: BMI, body mass index; CRC, colorectal cancer; NSAID, non-steroidal anti-inflammatory drug; WHR, waist-to-hip ratio.

Antibiotic use, percent of subjects that reported antibiotics use 3 months before colonoscopy.

Sequence filtering

Ribosomal Database Project (RDP) Pipeline: The first step in the data analysis process involved a preliminary QC (quality control) filter (downstream of the Roche 454 GS-FLX software filtering). We removed sequences from our data set if there were any _N_s in the sequence or the 5′ primer did not exactly match the expected 5′ primer or if the average quality score was <20. We then removed the 5′ primer sequence from our reads that have survived above filtering. Only trimmed filtered sequences with a length between 200–500 bp were kept in our data set for RDP analysis.

Operational Taxonomic Unit (OTU) Pipeline: We removed sequences from inclusion in the OTU data set if there were any _N_s in the trimmed sequence or if the 5′ primer did not exactly match the expected 5′ primer. As recommended by Kunin et al. (2010), sequences were end-trimmed with the Lucy algorithm (Chou and Holmes, 2001) at a threshold of 0.002 (quality score of 27). Only reads with trimmed lengths between 150 and 450 were retained for OTU analysis. Table 2 shows the number of sequences removed by our RDP and OTU pipelines.

Table 2

454 Data set characteristics before and after QC for RDP and OTU pipelines

Original	After QC
RDP pipeline
Total no. of sequences	600354	598645
Average/sample	8455.69	8431.62
s.d.	3840.73	3843.29
Average sequence length	343.131	343.575
OTU pipeline
Total no. of sequences	600354	532506
Average/sample	8455.69	7500.08
s.d.	3840.73	3578.55
Average sequence length	343.131	302.034

Original	After QC
RDP pipeline
Total no. of sequences	600354	598645
Average/sample	8455.69	8431.62
s.d.	3840.73	3843.29
Average sequence length	343.131	343.575
OTU pipeline
Total no. of sequences	600354	532506
Average/sample	8455.69	7500.08
s.d.	3840.73	3578.55
Average sequence length	343.131	302.034

Abbreviations: QC, quality control; RDP, Ribosomal Database Project.

Table 2

454 Data set characteristics before and after QC for RDP and OTU pipelines

Original	After QC
RDP pipeline
Total no. of sequences	600354	598645
Average/sample	8455.69	8431.62
s.d.	3840.73	3843.29
Average sequence length	343.131	343.575
OTU pipeline
Total no. of sequences	600354	532506
Average/sample	8455.69	7500.08
s.d.	3840.73	3578.55
Average sequence length	343.131	302.034

Original	After QC
RDP pipeline
Total no. of sequences	600354	598645
Average/sample	8455.69	8431.62
s.d.	3840.73	3843.29
Average sequence length	343.131	343.575
OTU pipeline
Total no. of sequences	600354	532506
Average/sample	8455.69	7500.08
s.d.	3840.73	3578.55
Average sequence length	343.131	302.034

Abbreviations: QC, quality control; RDP, Ribosomal Database Project.

Bacterial identification

The sequences in our data set were given taxonomic assignments based on two methods.

RDP assignment method: Sequences that have been filtered using the RDP pipeline (Table 2) were submitted to the RDP Classifier 2.1 algorithm for taxonomic identification at various taxonomic levels. Sequences assigned in each sample to various taxa, from phylum level and genus level, were counted at the RDP confidence threshold of 80.

OTU assignment method: OTU analysis is more sensitive to sequencing error (Kunin et al., 2010) and we therefore applied additional QC steps in our OTU analysis pipeline (Table 2). Sequences filtered through the OTU pipeline were submitted to AbundantOTU (http://omics.informatics.indiana.edu/AbundantOTU/) for assignment of each sequence to OTUs (97% identity). Sequences assigned in each sample to various OTUs were counted and then normalized and log transformed (see Data Preprocessing), before proceeding to further downstream analyses. Consensus sequences generated by AbundantOTU during construction of OTUs were submitted to RDP classifier 2.1 to assign taxonomy to each of the OTU groups. Consensus sequences of the 613 OTUs generated by AbundantOTU (Supplementary File 3.txt) were also submitted to ChimeraSlayer (Haas et al., 2011) (http://microbiomeutil.sourceforge.net/) and the nine consensus OTUs identified by chimera slayer as chimeras were removed from our data set. In addition, consensus sequences of four OTUs on BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) search against the Silva reference 16S database failed to match >97% sequence identity, so these were also removed from further analysis. This left a total of 600 OTUs.

Richness and evenness

Shannon–Wiener Diversity Index, H, was calculated using the equation, _H_=−ΣPi (lnPi), where Pi is the proportion of each species (taxa) in the sample. Richness was calculated as the number of OTUs, genera or phyla observed in 2636 sequences (where 2636 is the number of sequences seen in the sample with the fewest sequences). For each sample, 2636 sequences were randomly chosen 1000 times and the average number of OTUs, genera or phyla observed over these 1000 permutations was reported as richness.

Evenness measures how evenly the individuals are distributed among the different species/taxa and is calculated by _J_= _H_’/Log (S) where _H_’ is Shannon diversity and S is the number of species or taxa in each sample. Wilcoxon-tests and Student’s _t_-tests were performed to compare the mean similarities of the groups, case and control. The false discovery rate (FDR) was set at 10% using the Benjamini and Hochberg procedure (Benjamini and Hochberg, 1995) to avoid type 1 error due to multiple comparisons on a single data set.

Data preprocessing

Raw counts were normalized then log transformed using the normalization scheme mentioned below, before proceeding with the rest of the analyses.

LOG10 ((Raw count/number of sequences in that sample) × Average number of sequences per sample +1).

Removal of rare taxa

In order to minimize the number of null hypotheses for which we need to correct for multiple hypothesis testing, we needed to remove rarely occurring taxa that occurred in so few patients that they could not be significantly associated with case–control or obesity phenotypes. In all of our analyses (except richness calculations), we therefore only included taxa that occurred in at least 25% of all samples. For the RDP approach, 9 phyla and 100 genera met this criterion. For the OTU approach, 371 OTUs met this criterion.

Tree generation

For each of the 371 consensus sequences from OTUs that met the above criteria, BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to find the top 10 hits in the Silva reference tree release 104 (http://www.arb-silva.de/download/arb-files/). In this way, we identified a set of 3594 aligned sequences to serve as our reference tree. The program align.seqs within MOTHUR (http://www.mothur.org/) was used to align the 371 AbundantOTU consensus sequences that passed all QC steps to these 3594 aligned sequences as extracted from the Silva reference alignment. With custom Java code based on the Archaeopteryx code base (http://www.phylosoft.org/archaeopteryx/), we removed all but the 3594 sequences from the Silva reference tree. We then uploaded the alignment of the 3594 reference sequences plus the 371 AbundantOTU sequences to the RaxXML EPA server (http://i12k-exelixis3.informatik.tu-muenchen.de/raxml), which uses maximum likelihood to place new sequences within a reference tree. Each node in the tree was colored by FDR (Figure 3; Supplementary Figure 4). Trees were visualized with Archaeopteryx. Leaf nodes in Supplementary Figure 4 are labeled with the RDP call of the consensus sequence at 80%.

Data validation

Real-time quantitative PCR validation

qPCR primers were designed based on no <95% sequence similarity from bacterial 16S ribosomal DNA sequence alignments obtained from pyrosequencing. To measure the abundance of a specific taxon, three primer pairs where designed: one generic for all bacterial groups (Universal Primer): [EUB341-F 5′-CCTACGGGAGGCAGCAG-3′; EUB518-R 5′-ATTACCGCGGCTGCTGG-3′] and three taxon-specific primer pairs: first for the Helicobacter genus (Heli_F 5′-AGTGGCGCACGGGTGAGTA-3′; Heli_R 5′-GTGTCCGTTCACCCTCTCA-3′), the next one for the Acidovorax genus (Aci_F 5′-TGCTGACGAGTGGCGAAC-3′ Aci_R 5′-GTGGCTGGTCGTCCTCTC-3′) and another for the Cloacibacterium genus (Clo_F 5′-TGCGGAACACGTGTGCAA-3′; Clo_R 5′-CCGTTACCTCACCAACTAGC-3′).

In all, 10 μ l PCR reactions were prepared containing 100 ng of DNA extracted from colonic mucosal biopsies, 10 μ m of each primer and 5 μ l of Fast-SYBR Green Master Mix (Applied Biosystems, Carlsbad, CA, USA). Cycling conditions were: 1 cycle at 95 °C for 10 min followed by 45 cycles of 95 °C for 15 s, 60 °C for 1 min, and 72 °C for 30 s. A single dissociation curve cycle was run as follows: 95 °C for 30 s, 60 °C for 30 min, and 90 °C for 30 s. A pool of samples was prepared to serve as the standard for the qPCR by mixing equal volumes from each sample. Abundance of a specific taxon was calculated by the delta–delta threshold cycle (ΔΔCt) method (Livak and Schmittgen, 2001) in which: ΔΔCt=(CtTSE – CtUE) – (CtTSP – CtUP)

Where: CtTSE: Ct of experimental samples for taxon-specific primers, CtUE: Ct of experimental samples for universal primer, CtTSP: Ct for DNA Pool for taxon-specific primers, CtUP: Ct for DNA pool for universal primers. Theoretically, the abundance of a taxon is 2−ddCt.

Statistical analyses

The diversity indices, richness and evenness, were calculated using JAVA implementations (see Supplementary File 2). Kruskal–Wallis, Wilcoxon and Student’s _t_-tests were performed using JMP 8.0 (SAS Institute, Cary, NC, USA) to compare the mean similarities of the groups, case and control. Regression and correlation analyses were performed using JMP 8.0 (SAS Institute) and in R (Open Sourced Statistical software, Vienna, Austria).

Results

We analyzed the adherent microbiota from mucosal biopsies of 33 adenoma cases and 38 non-adenoma controls based on the 16S rDNA genes and high-throughput pyrosequencing methods. Case subjects were slightly older (case 57.4 years) compared with controls (55.7 years). Cases were more likely to have higher WHR than controls (_P_=0.06) and be overweight or obese (_P_=0.09). There were no significant differences between cases and controls for smoking, fiber intake, calories, fat and other risk factors (Table 1). The location of the adenomas were proximal (42%), distal (42.5%) and both locations (15.5%). Adenomas were categorized as small (1–5 mm) medium (6–10 mm) and large (>10 mm) with 69.7% classified as small, 24.2% as medium and 6.1% as large. The average number of adenomas in case subjects was 1.6 (range 1–9).

Our initial analyses looked at global signatures of the entire microbial community. At the phylum, genus and OTU (cluster of sequences in which the average percent identity of all of the sequences within a cluster is ⩾97%) levels we found significant differences in richness (that is, the number of taxa present in a sample), but no differences in evenness (that is, how evenly distributed taxa are within a sample), between cases and controls (Figure 1; Supplementary Figures 1 and 2). In order to see whether case samples cluster separately from control samples, we performed principal component analysis (PCA) of the log-normalized abundance of the 371 OTUs that occur in at least 25% of our samples (Figure 2; Supplemental File 1). Results from this unsupervised clustering showed imperfect but statistically significant clustering based on disease status at the global level; a Wilcoxon test performed on the first principle component from this PCA rejected the null hypothesis that case and control had the same distribution with a _P_-value of 0.0007.

Richness (left panel) and evenness (right panel) for the OTUs observed in our study for cases (n=33) vs controls (n=38). The x axis is proportional to the number of subjects in each category. By the Wilcoxon test, cases had a significantly higher richness (P=0.0061) than controls, but there was no significant difference in evenness (P=0.36).

Figure 1

Richness (left panel) and evenness (right panel) for the OTUs observed in our study for cases (_n_=33) vs controls (_n_=38). The x axis is proportional to the number of subjects in each category. By the Wilcoxon test, cases had a significantly higher richness (_P_=0.0061) than controls, but there was no significant difference in evenness (_P_=0.36).

PCA on the normalized log abundances of taxa at the OTU level. Only the 371 OTUs that are present in at least 25% of all samples were included in the analysis. Component 1 explains 18.81% of all variance while component 2 explains 6.76% (red=case; green=control).

Figure 2

We next tested which individual bacterial taxa were different between cases and controls. By examining the results of the RDP classification algorithm (Wang et al., 2007) at the phylum level, we observed at a 10% FDR threshold that cases had higher relative abundance of TM7, Cyanobacteria and Verrucomicrobia compared with controls (Supplementary Table 1). At the genus level at a 10% FDR threshold, the relative abundance levels of 30 genera including Acidovorax, Aquabacterium, Cloacibacterium, Helicobacter, Lactococcus, Lactobacillus and Pseudomonas were higher in cases vs controls (Supplementary Table 2). Remarkably, only one genus, Streptococcus, had a higher relative abundance in the control group. In order to validate these pyrosequencing results, we developed qPCR assays for a subset of observed genera that were significantly different in their relative abundances between cases and controls (that is, Helicobacter spp., Acidovorax spp. and Cloacibacteria spp.). We observed the expected correlations between the two methods (Supplementary Figure 3), validating the results of our pyrosequencing approach.

We next performed an analysis of OTUs, which are clusters of sequences in which the average percent identity of all of the sequences within a cluster is ⩾97%. Our analysis at the OTU level at a 10% FDR threshold found 87 OTUs with significantly higher relative abundance in cases vs controls and only five OTUs higher in controls (Supplementary Table 3). When we used the RDP classification algorithm to classify the consensus sequence for each of the 92 significantly different OTUs, bacteria with higher relative abundance in cases were mostly members of the phyla Firmicutes (42.6%), Bacteroidetes (25.5%) and Proteobacteria (24.5%) (Figure 3; Supplementary Figure 4). A rank-abundance curve demonstrates that the OTU differences between cases and controls (significant at 10% FDR) are entirely in low-abundance taxa (Supplementary Figure 5). This observation explains why there are differences between case and control in richness (Figure 1), which depends on the total number of taxa observed, but not evenness, which is more sensitive to changes in high-abundance taxa.

Figure 3

Maximum likelihood tree generated from the 371 OTUs in which the OTU was observed in at least 25% of our patients. The tree was generated using the RaxXML EPA server (http://i12k-exelixis3.informatik.tu-muenchen.de/raxml) (see Materials and methods). Branches are colored based on RDP Phylum level assignments. Red-colored branches represent OTUs significantly different between cases and controls within each Phylum (at 10% FDR).

To determine if the microbial differences seen between the cases and controls correlate with clinical metadata associated with the samples, we performed either regressions (for the continuous variable; Table 3a) or _t_-tests (for binary categorical variables; Table 3b) between each metadata category and the first principle component from our sequencing data PC1, generated by collapsing the 371 OTUs that are present in at least 25% of our samples (Figure 2). As shown in Tables 3a and b, PC1 does not show any significant correlation with any of the clinical categories at a 10% FDR except for the disease status (case–control) category, which would be significant even a FDR threshold of 1% (Table 3b). In addition, we performed correlations (for the continuous variables; Table 4a) and Wilcoxon/Kruskal–Wallis tests (for binary categorical variables; Table 4b) between the microbial richness of each sample and the metadata associated with that sample; the results from these tests show that there was no difference in richness with any of the metadata categories associated with our subjects apart from disease (case/control) status (Tables 4a and b).

Table 3a

Regressions between continuous clinical parameters associated with the samples and the first principal component (PC1)

Metadata category	R2	P_-value_	RANK	(n × P_)/R_	Slope
Total number of veg servings	0.056346	0.0569	1	0.68280	0.843878
Total dietary fiber (g)	0.028880	0.1759	2	1.05540	0.127379
Age	0.012339	0.3599	3	1.43960	0.144368
Calcium (mg)	0.011747	0.3901	4	1.17030	0.002297
Waist-to-hip ratio	0.010132	0.4036	5	0.96864	10.321238
Body mass index	0.009079	0.4327	6	0.86540	0.166502
Total number of fruit servings	0.004159	0.6098	7	1.04537	0.275328
NSAIDS use (per month)	0.003517	0.6443	8	0.96645	0.022870
Red meat	0.001908	0.7297	9	0.97293	0.005771
Total fat (g)	0.001291	0.7763	10	0.93156	0.012833
Alcohol (g)	0.000052	0.9545	11	1.04127	0.000902
Caloric intake (kcal)	0.000015	0.9753	12	0.97530	0.000029

Metadata category	R2	P_-value_	RANK	(n × P_)/R_	Slope
Total number of veg servings	0.056346	0.0569	1	0.68280	0.843878
Total dietary fiber (g)	0.028880	0.1759	2	1.05540	0.127379
Age	0.012339	0.3599	3	1.43960	0.144368
Calcium (mg)	0.011747	0.3901	4	1.17030	0.002297
Waist-to-hip ratio	0.010132	0.4036	5	0.96864	10.321238
Body mass index	0.009079	0.4327	6	0.86540	0.166502
Total number of fruit servings	0.004159	0.6098	7	1.04537	0.275328
NSAIDS use (per month)	0.003517	0.6443	8	0.96645	0.022870
Red meat	0.001908	0.7297	9	0.97293	0.005771
Total fat (g)	0.001291	0.7763	10	0.93156	0.012833
Alcohol (g)	0.000052	0.9545	11	1.04127	0.000902
Caloric intake (kcal)	0.000015	0.9753	12	0.97530	0.000029

Abbreviations: NSAID, non-steroidal anti-inflammatory drug; OTU, Operational Taxonomic Unit; PCA, principal component analysis.

PC1 was generated by PCA for the 371 OTUs that have at least one sequence assigned to them in at least 25% of the samples (Figure 5). _P_-values are from linear regression. The last column shows correction for multiple hypothesis testing (Abbolito et al., 1992) using (n × P)/R, where _n_=total number of continuous metadata variables tested, _P_=raw _P_-value and _R_=sorted rank of the metadata variable.

Table 3a

Regressions between continuous clinical parameters associated with the samples and the first principal component (PC1)

Metadata category	R2	P_-value_	RANK	(n × P_)/R_	Slope
Total number of veg servings	0.056346	0.0569	1	0.68280	0.843878
Total dietary fiber (g)	0.028880	0.1759	2	1.05540	0.127379
Age	0.012339	0.3599	3	1.43960	0.144368
Calcium (mg)	0.011747	0.3901	4	1.17030	0.002297
Waist-to-hip ratio	0.010132	0.4036	5	0.96864	10.321238
Body mass index	0.009079	0.4327	6	0.86540	0.166502
Total number of fruit servings	0.004159	0.6098	7	1.04537	0.275328
NSAIDS use (per month)	0.003517	0.6443	8	0.96645	0.022870
Red meat	0.001908	0.7297	9	0.97293	0.005771
Total fat (g)	0.001291	0.7763	10	0.93156	0.012833
Alcohol (g)	0.000052	0.9545	11	1.04127	0.000902
Caloric intake (kcal)	0.000015	0.9753	12	0.97530	0.000029

Metadata category	R2	P_-value_	RANK	(n × P_)/R_	Slope
Total number of veg servings	0.056346	0.0569	1	0.68280	0.843878
Total dietary fiber (g)	0.028880	0.1759	2	1.05540	0.127379
Age	0.012339	0.3599	3	1.43960	0.144368
Calcium (mg)	0.011747	0.3901	4	1.17030	0.002297
Waist-to-hip ratio	0.010132	0.4036	5	0.96864	10.321238
Body mass index	0.009079	0.4327	6	0.86540	0.166502
Total number of fruit servings	0.004159	0.6098	7	1.04537	0.275328
NSAIDS use (per month)	0.003517	0.6443	8	0.96645	0.022870
Red meat	0.001908	0.7297	9	0.97293	0.005771
Total fat (g)	0.001291	0.7763	10	0.93156	0.012833
Alcohol (g)	0.000052	0.9545	11	1.04127	0.000902
Caloric intake (kcal)	0.000015	0.9753	12	0.97530	0.000029

Abbreviations: NSAID, non-steroidal anti-inflammatory drug; OTU, Operational Taxonomic Unit; PCA, principal component analysis.

Table 3b

Wilcoxon/Kruskal–Wallis tests on the first principal component (PC1) based on the subgroupings of the categorical clinical parameters associated with the samples

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00070	1	0.00490
Body mass index category (normal, overweight, obese)	0.21980	2	0.76930
Family history of colorectal cancer	0.23030	3	0.53737
Diabetes history	0.46360	4	0.81130
Ever smoked	0.74140	5	1.03796
Sex	0.86240	6	1.00613
Antibiotics use	0.93990	7	0.93990

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00070	1	0.00490
Body mass index category (normal, overweight, obese)	0.21980	2	0.76930
Family history of colorectal cancer	0.23030	3	0.53737
Diabetes history	0.46360	4	0.81130
Ever smoked	0.74140	5	1.03796
Sex	0.86240	6	1.00613
Antibiotics use	0.93990	7	0.93990

Abbreviations: OTU, Operational Taxonomic Unit; PCA, principal component analysis.

PC1 was generated by PCA for the 371 OTUs that have at least one sequence assigned to them in at least 25% of the samples (Figure 5). _P_-values are from the Wilcoxon/Kruskal–Wallis on the first principle component. The last column shows correction for multiple hypothesis testing (Abbolito et al., 1992) using (n × P)/R, where _n_=total number of categorical metadata variables tested, _P_=raw _P_-value and _R_=sorted rank of the metadata variable.

Table 3b

Wilcoxon/Kruskal–Wallis tests on the first principal component (PC1) based on the subgroupings of the categorical clinical parameters associated with the samples

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00070	1	0.00490
Body mass index category (normal, overweight, obese)	0.21980	2	0.76930
Family history of colorectal cancer	0.23030	3	0.53737
Diabetes history	0.46360	4	0.81130
Ever smoked	0.74140	5	1.03796
Sex	0.86240	6	1.00613
Antibiotics use	0.93990	7	0.93990

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00070	1	0.00490
Body mass index category (normal, overweight, obese)	0.21980	2	0.76930
Family history of colorectal cancer	0.23030	3	0.53737
Diabetes history	0.46360	4	0.81130
Ever smoked	0.74140	5	1.03796
Sex	0.86240	6	1.00613
Antibiotics use	0.93990	7	0.93990

Abbreviations: OTU, Operational Taxonomic Unit; PCA, principal component analysis.

Table 4a

Regressions between continuous clinical parameters associated with the samples and the richness index of each sample

Metadata category	R_2_	P_-value_	RANK	(n × P_)/R_	Slope
Total_number_of_veg_servings	0.059915	0.0556	1	0.66720	6.12603
Total_dietary_fiber_(g)	0.022133	0.2369	2	1.42140	0.805456
Waist-to-hip ratio	0.013745	0.3295	3	1.31800	86.42771
Body mass index	0.000900	0.4319	4	1.29570	1.191711
Red meat	0.007795	0.4843	5	1.16232	4.827834
Calcium (mg)	0.007269	0.4995	6	0.99900	0.013053
NSAIDs use (per month)	0.007103	0.5114	7	0.87669	0.237458
Total_number_of_fruit_servings	0.006869	0.5116	8	0.76740	2.555904
Total_fat (g)	0.005034	0.5744	9	0.76587	0.119564
Age	0.003732	0.6154	10	0.73848	0.569015
Alcohol (g)	0.000157	0.7541	11	0.82265	0.035803
Caloric intake (kcal)	0.001225	0.782	12	0.78200	0.001853

Metadata category	R_2_	P_-value_	RANK	(n × P_)/R_	Slope
Total_number_of_veg_servings	0.059915	0.0556	1	0.66720	6.12603
Total_dietary_fiber_(g)	0.022133	0.2369	2	1.42140	0.805456
Waist-to-hip ratio	0.013745	0.3295	3	1.31800	86.42771
Body mass index	0.000900	0.4319	4	1.29570	1.191711
Red meat	0.007795	0.4843	5	1.16232	4.827834
Calcium (mg)	0.007269	0.4995	6	0.99900	0.013053
NSAIDs use (per month)	0.007103	0.5114	7	0.87669	0.237458
Total_number_of_fruit_servings	0.006869	0.5116	8	0.76740	2.555904
Total_fat (g)	0.005034	0.5744	9	0.76587	0.119564
Age	0.003732	0.6154	10	0.73848	0.569015
Alcohol (g)	0.000157	0.7541	11	0.82265	0.035803
Caloric intake (kcal)	0.001225	0.782	12	0.78200	0.001853

Abbreviation: NSAID, non-steroidal anti-inflammatory drug.

The richness index was calculated as described in the methods section. _P_-values are from linear regression. The last column shows correction for multiple hypothesis testing (Abbolito et al., 1992) using (n × P)/R, where _n_=total number of continuous metadata variables tested, _P_=raw _P_-value and _R_=sorted rank of the metadata variable.

Table 4a

Regressions between continuous clinical parameters associated with the samples and the richness index of each sample

Metadata category	R_2_	P_-value_	RANK	(n × P_)/R_	Slope
Total_number_of_veg_servings	0.059915	0.0556	1	0.66720	6.12603
Total_dietary_fiber_(g)	0.022133	0.2369	2	1.42140	0.805456
Waist-to-hip ratio	0.013745	0.3295	3	1.31800	86.42771
Body mass index	0.000900	0.4319	4	1.29570	1.191711
Red meat	0.007795	0.4843	5	1.16232	4.827834
Calcium (mg)	0.007269	0.4995	6	0.99900	0.013053
NSAIDs use (per month)	0.007103	0.5114	7	0.87669	0.237458
Total_number_of_fruit_servings	0.006869	0.5116	8	0.76740	2.555904
Total_fat (g)	0.005034	0.5744	9	0.76587	0.119564
Age	0.003732	0.6154	10	0.73848	0.569015
Alcohol (g)	0.000157	0.7541	11	0.82265	0.035803
Caloric intake (kcal)	0.001225	0.782	12	0.78200	0.001853

Metadata category	R_2_	P_-value_	RANK	(n × P_)/R_	Slope
Total_number_of_veg_servings	0.059915	0.0556	1	0.66720	6.12603
Total_dietary_fiber_(g)	0.022133	0.2369	2	1.42140	0.805456
Waist-to-hip ratio	0.013745	0.3295	3	1.31800	86.42771
Body mass index	0.000900	0.4319	4	1.29570	1.191711
Red meat	0.007795	0.4843	5	1.16232	4.827834
Calcium (mg)	0.007269	0.4995	6	0.99900	0.013053
NSAIDs use (per month)	0.007103	0.5114	7	0.87669	0.237458
Total_number_of_fruit_servings	0.006869	0.5116	8	0.76740	2.555904
Total_fat (g)	0.005034	0.5744	9	0.76587	0.119564
Age	0.003732	0.6154	10	0.73848	0.569015
Alcohol (g)	0.000157	0.7541	11	0.82265	0.035803
Caloric intake (kcal)	0.001225	0.782	12	0.78200	0.001853

Abbreviation: NSAID, non-steroidal anti-inflammatory drug.

Table 4b

Wilcoxon/Kruskal–Wallis tests on the richness indices based on the subgroupings of the categorical clinical parameters associated with the samples

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00610	1	0.04270
Family history of CRC	0.20230	2	0.70805
Body mass index_category (normal,overweight, obese)	0.21140	3	0.49327
Diabetes history	0.55390	4	0.96933
Ever smoked	0.81520	5	1.14128
Sex	0.81720	6	0.95340
Antibiotics use	0.88010	7	0.88010

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00610	1	0.04270
Family history of CRC	0.20230	2	0.70805
Body mass index_category (normal,overweight, obese)	0.21140	3	0.49327
Diabetes history	0.55390	4	0.96933
Ever smoked	0.81520	5	1.14128
Sex	0.81720	6	0.95340
Antibiotics use	0.88010	7	0.88010

Abbreviation: CRC, colorectal cancer.

The last column shows correction for multiple hypothesis testing (Abbolito et al., 1992) using (n × P)/R, where _n_=total number of categorical metadata variables tested, _P_=raw _P_-value and _R_=sorted rank of the metadata variable.

Table 4b

Wilcoxon/Kruskal–Wallis tests on the richness indices based on the subgroupings of the categorical clinical parameters associated with the samples

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00610	1	0.04270
Family history of CRC	0.20230	2	0.70805
Body mass index_category (normal,overweight, obese)	0.21140	3	0.49327
Diabetes history	0.55390	4	0.96933
Ever smoked	0.81520	5	1.14128
Sex	0.81720	6	0.95340
Antibiotics use	0.88010	7	0.88010

Metadata category	Wilcoxon/Kruskal–Wallis P_-value_	RANK	(n × P_)/R_
Disease status (case, control)	0.00610	1	0.04270
Family history of CRC	0.20230	2	0.70805
Body mass index_category (normal,overweight, obese)	0.21140	3	0.49327
Diabetes history	0.55390	4	0.96933
Ever smoked	0.81520	5	1.14128
Sex	0.81720	6	0.95340
Antibiotics use	0.88010	7	0.88010

Abbreviation: CRC, colorectal cancer.

Since obesity is a risk factor for development of CRC, and changes in the human microbiome have previously been associated with obesity (Turnbaugh et al., 2009; Zhang et al., 2009) we further evaluated the relationship between the relative abundance levels of the individual taxa and BMI and WHR. We classified subjects into one of the three BMI categories; normal (BMI <25), overweight (BMI=25–29) and obese (BMI ⩾30) and three WHR levels; low, medium and high based on accepted thresholds (http://www.bmi-calculator.net/waist-to-hip-ratio-calculator/waist-to-hip-ratio-chart.php). For each OTU, the non-parametric Kruskal–Wallis test was performed between the three groups for BMI and WHR. There were no OTUs that showed significant differences between the various BMI and WHR risk factor categories even if we were to set a FDR threshold as high as <200% (Supplementary Tables 4 and 5). Likewise, there were no significant differences in the diversity measures, richness and evenness, between the various risk factor categories (Figures 4 and 5). Finally, regressions between BMI values and WHR values against each taxa at the OTU level also showed no significant association between the OTUs with either BMI (Supplementary Figure 6) or WHR (Supplementary Figure 7) at an FDR threshold of <10% (Supplementary Tables 6 and 7).

Richness (left panel) and evenness (right panel) at the OTU level, in normal (n=27) vs overweight (n=25) vs obese (n=18) BMI categories. No significant difference was seen by the Kruskal–Wallis test in richness (P=0.21) or evenness (P=0.42) between the three categories.

Figure 4

Richness (left panel) and evenness (right panel) at the OTU level, in normal (_n_=27) vs overweight (_n_=25) vs obese (_n_=18) BMI categories. No significant difference was seen by the Kruskal–Wallis test in richness (_P_=0.21) or evenness (_P_=0.42) between the three categories.

Richness (left panel) and evenness (right panel) at the OTU level, in low-risk (n=25) vs medium-risk (n=16) vs high-risk (n=30) WHR categories. No significant difference was seen by the Kruskal–Wallis test in richness (P=0.26) or evenness (P=0.76) between the three categories.

Figure 5

Richness (left panel) and evenness (right panel) at the OTU level, in low-risk (_n_=25) vs medium-risk (_n_=16) vs high-risk (_n_=30) WHR categories. No significant difference was seen by the Kruskal–Wallis test in richness (_P_=0.26) or evenness (_P_=0.76) between the three categories.

Discussion

In a recent study, in which we used the more limited T-RFLP fingerprinting and traditional clone sequencing methods, we reported Firmicutes, Bacteroidetes and Proteobacteria as the predominant phyla in colonic mucosa (Shen et al., 2010). These findings are compatible with our current observations where we used a more advanced sequencing technology to gain better insights into the relationship between mucosal adherent bacteria and adenomas. The depth of sequencing and the larger sample size in this study provides better coverage and a better understanding of the overall structure and composition of the microbiota. For instance, the phylum Actinobacteria was not detected in our previous study (Shen et al., 2010) but was detected in this study. The current case–control study found a large number of differences within the microbial community between adenoma case and non-control subjects with higher microbial richness in cases than controls. In particular, we found increased relative abundance of potential pathogens such as Pseudomonas, Helicobacter and, Acinetobacter (Supplementary Table 2; Supplementary Table 3) and other genera belonging to the phylum Proteobacteria (Figure 2). The presence of these potential pathogens may directly increase the risk of adenoma development by changing the gut environment. For example, Helicobacter has a much higher relative abundance in cases vs controls (Supplementary Tables 2 and 3) consistent with previous studies, which implicate this bacterium in colorectal adenomas (Zumkeller et al., 2006; Burnett-Hartman et al., 2008; Takakura et al., 2008); a possible explanation for this association is that this microbe alters the pH of the gastrointestinal tract (Abbolito et al., 1992; Chen et al., 1997). Acidovorax spp., one of the bacterial signatures identified as significantly different between case and control in this study, is a flagellated, gram negative acid degrading member of the phylum Proteobacteria. Although, not much is known about its clinical epidemiology and pathogenicity in humans, it has been reported to also degrade nitro-aromatic compounds (Malkan et al., 2009). A potential mechanism for the Acidovorax–adenoma association could relate to the induction of local inflammation by increased flagellar proteins resulting from the higher abundance of Acidovorax (Tanaka et al., 2003; Takakura et al., 2008).

Another potential mechanism could be related to changes in the local gut environment that favors increased abundance of specific taxa. For example, acid producing bacteria such as Lactobacillus, are known to lower the gut pH and regulate the growth of other bacteria. It is possible that the increased abundance of Lactobacillus could influence levels of Acidovorax that have the ability to degrade acid produced by Lactobacillus as a carbon source. Interestingly, we observed that Lactobacillus was more abundant in cases than controls. Lactobacillus is considered a beneficial microbe (Gibson and Roberfroid, 1995; Duncan et al., 2004) with relatively low abundance in the gut (Eckburg et al., 2005). We propose that Lactobacillus spp. may induce changes in the adherent ecosystem that could alter the pH to create favorable conditions for bacterial dysbiosis. This is consistent with suggestions by Duncan et al. (2004) that bacteria that grow in acidic pH create an environment that can be exploited by more low pH-tolerant microbes. Cloacibacterium, another bacterium that differed significantly between case and control at the genus level, is a gram negative anaerobe. It is a Flavobacterium that belongs to the phylum Bacteroidetes and plays an important role in breaking down complex organic matter (Bernardet et al., 2002). Disruption of homeostasis in the adherent ecosystem may also account for the higher abundance of Cloacibacterium. Although we found that several bacterial genera were associated with colorectal adenomas and the exact mechanisms for bacterial dysbiosis–adenoma relationship are not well defined, we have attempted to suggest potential mechanisms and highlight a few genera. Other factors that change the colonic environment such as diet and host factors could also contribute to the bacterial dysbiosis and adenoma association. We recognize that there is limited information about the function of most of these bacteria as such our findings will need to be verified in a future studies. While these findings provide important clues to the relationship between microbial diversity and colorectal adenomas, the case–control design limits our ability to assess causality. However, our findings have the potential to inform future studies in animal models to evaluate mechanisms.

Taken together, these findings demonstrate that the presence of adenomas is associated with changes in the relative abundance of various taxa, including potential pathogens, present in the gut mucosa and that these changes are not significantly correlated with other clinical parameters such as obesity levels (WHR and BMI), age, NSIAD use and antibiotic use (Tables 3 and 4). Previous metagenomic studies have implicated the composition of the microbial community as contributing to obesity (Ley et al., 2005, 2006; Turnbaugh et al., 2006; Turnbaugh and Gordon, 2009; Turnbaugh et al., 2009) although this has been controversial (Larsen et al., 2010). In our study, we did not see any statistically significant relationship between BMI or WHR and any individual taxa within the microbial community (Figures 4 and 5; Supplementary Tables 4–7). Likewise, when we examined our data for a relationship between BMI and the Firmicutes to Bacteroidetes ratio, we observed no such correlation (data not shown). These results stand in contrast to previous studies that have observed that BMI is associated with the increased presence of Firmicutes (Turnbaugh et al., 2006; Turnbaugh and Gordon, 2009). It is unclear whether regional differences in subjects (all of our subjects were from North Carolina) or differences in sampling (stool sampling vs biopsies) can explain some of the differences between our work and previous results. We also cannot rule out the possibility that BMI or other factors might have achieved a statistically significant influence over the composition of the microbial community with a larger sample size than we had in our study. What seems inarguable from our data set, however, is that case–control status is asserting a larger degree of influence on the microbial community than any of the other clinical parameters that we collected (Tables 3 and 4). This observation suggests that conditions associated with adenoma formation are more strongly linked with the microbial community membership and structure, across patients, than any of the other factors that we evaluated. These observations are consistent with the idea that while healthy individuals have a great deal of inter-personal variation in their microbiome (Costello et al., 2009), disease states have a specific microbial signature.

Our observation that the microbial signature associated with adenomas is largely distinct from that associated with obesity suggests that next-generation sequencing of microbial communities may have considerable value in predicting the actual presence of adenomas. A strength of this study is the use of high-throughput 454 pyrosequencing for an in-depth evaluation of the adherent colonic mucosal bacteria in relation to adenomas. Two recent papers (Castellarin et al., 2012; Kostic et al., 2012), using similar high-throughput methods have independently found that Fusobacterium is associated with tissue from colorectal carcinomas. In addition, recent studies (Marchesi et al., 2011; Sobhani et al., 2011) compared the tissues of colonic tumors with non-malignant mucosa and found substantial differences in the associated microbiota. Together with our results, which describe differences in the microbiota from normal rectal mucosa of adenoma cases and controls, a picture is beginning to develop that link microbial changes to colorectal adenomas and cancer.

Competing interests

Conflict of interest

The authors declare no conflict of interest.

Acknowledgements

This research was supported, in part, by grants from the National Institutes of Health NIH P30DK034987, R01 CA44684, P50 CA106991, R01 CA136887, K01 DK 073695. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Nucleotide sequence accession numbers: All 454 pyrosequence data from this study have been submitted to Genbank database under the accession # SRS 166138.1-172960.2.

Data availability

Accession codes

Accessions

GenBank/EMBL/DDBJ

SRS 166138.1-172960.2

References

Abbolito

Ameglio

Guerrera

Citarda

Grassi

Sciarretta

et al. (

1992

The association of Helicobacter pylori infection with low levels of urea and pH in the gastric juices

Ital J Gastroenterol

389

–

392

Backhed

Ding

Wang

Hooper

Koh

Nagy

et al. (

2004

The gut microbiota as an environmental factor that regulates fat storage

Proc Natl Acad Sci USA

101

15718

–

15723

Benjamini

Hochberg

. (

1995

A Practical and Powerful Approach to Multiple Testing

J Roy Stat Soc B

Bernardet

Nakagawa

Holmes

. (

2002

Proposed minimal standards for describing new taxa of the family Flavobacteriaceae and emended description of the family

Int J Syst Evol Microbiol

1049

–

1070

Bingham

. (

2000

Diet and colorectal cancer prevention

Biochem Soc Trans

–

Burcelin

Luche

Serino

Amar

. (

2009

The gut microbiota ecology: a new opportunity for the treatment of metabolic diseases?

Front Biosci

5107

–

5117

Burnett-Hartman

Newcomb

Potter

. (

2008

Infectious agents and colorectal cancer: a review of Helicobacter pylori, Streptococcus bovis, JC virus, and human papillomavirus

Cancer Epidemiol Biomarkers Prev

2970

–

2979

Cani

Delzenne

. (

2009

Interplay between obesity and associated metabolic disorders: new insights into the gut microbiota

Curr Opin Pharmacol

737

–

743

Castellarin

Warren

Freeman

Dreolini

Krzywinski

Strauss

et al. (

2012

Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma

Genome Res

299

–

306

Chen

Fournier

Varanasi

Mahama-Relue

. (

1997

Helicobacter pylori survival in gastric mucosa by generation of a pH gradient

Biophys J

1081

–

1088

Chou

Holmes

. (

2001

DNA sequence quality trimming and vector removal

Bioinformatics

1093

–

1104

Costello

Lauber

Hamady

Fierer

Gordon

Knight

. (

2009

Bacterial community variation in human body habitats across space and time

Science

326

1694

–

1697

Duncan

Louis

Flint

. (

2004

Lactate-utilizing bacteria, isolated from human feces, that produce butyrate as a major fermentation product

Appl Environ Microbiol

5810

–

5817

Eckburg

Bik

Bernstein

Purdom

Dethlefsen

Sargent

et al. (

2005

Diversity of the human intestinal microbial flora

Science

308

1635

–

1638

Gibson

Roberfroid

. (

1995

Dietary modulation of the human colonic microbiota: introducing the concept of prebiotics

J Nutr

125

1401

–

1412

Gill

Pop

Deboy

Eckburg

Turnbaugh

Samuel

et al. (

2006

Metagenomic analysis of the human distal gut microbiome

Science

312

1355

–

1359

Grice

Kong

Renaud

Young

Bouffard

Blakesley

et al. (

2008

A diversity profile of the human skin microbiota

Genome Res

1043

–

1050

Haas

Gevers

Earl

Feldgarden

Ward

Giannoukos

et al. (

2011

Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons

Genome Res

494

–

504

Hope

Hold

Kain

El-Omar

. (

2005

Sporadic colorectal cancer--role of the commensal microbiota

FEMS Microbiol Lett

244

–

Huycke

Gaskins

. (

2004

Commensal bacteria, redox stress, and colorectal cancer: mechanisms and models

Exp Biol Med (Maywood)

229

586

–

597

Jemal

Siegel

Ward

. (

2010

Cancer statistics, 2010

CA Cancer J Clin

277

–

300

Keku

Lund

Galanko

Simmons

Woosley

Sandler

. (

2005

Insulin resistance, apoptosis, and colorectal adenoma risk

Cancer Epidemiol Biomarkers Prev

2076

–

2081

Kostic

Gevers

Pedamallu

Michaud

Duke

Earl

et al. (

2012

Genomic analysis identifies association of Fusobacterium with colorectal carcinoma

Genome Res

292

–

298

Kunin

Engelbrektson

Ochman

Hugenholtz

. (

2010

Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates

Environ Microbiol

118

–

123

Lance

. (

1997

Recent developments in colorectal cancer

J R Coll Physicians Lond

483

–

487

Larsen

Vogensen

van den Berg

Nielsen

Andreasen

Pedersen

et al. (

2010

Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults

PLoS One

e9085

Leek

Scharpf

Bravo

Simcha

Langmead

Johnson

et al. (

2010

Tackling the widespread and critical impact of batch effects in high-throughput data

Nat Rev Genet

733

–

739

Ley

Backhed

Turnbaugh

Lozupone

Knight

Gordon

. (

2005

Obesity alters gut microbial ecology

Proc Natl Acad Sci USA

102

11070

–

11075

Ley

Turnbaugh

Klein

Gordon

. (

2006

Microbial ecology: human gut microbes associated with obesity

Nature

444

1022

–

1023

Livak

Schmittgen

. (

2001

Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method

Methods

402

–

408

Malkan

Strollo

Scholand

Dudrick

. (

2009

Implanted-port-catheter-related sepsis caused by Acidovorax avenae and methicillin-sensitive Staphylococcus aureus

J Clin Microbiol

3358

–

3361

Marchesi

Dutilh

Hall

Peters

Roelofs

Boleij

et al. (

2011

Towards the human colorectal cancer microbiome

PLoS One

e20447

Margulies

Egholm

Altman

Attiya

Bader

Bemben

et al. (

2005

Genome sequencing in microfabricated high-density picolitre reactors

Nature

437

376

–

380

Moore

. (

1995

Intestinal floras of populations that have a high risk of colon cancer

Appl Environ Microbiol

3202

–

3207

Petrosino

Highlander

Luna

Gibbs

Versalovic

. (

2009

Metagenomic pyrosequencing and microbial identification

Clin Chem

856

–

866

Powrie

Uhlig

. (

2004

Animal models of intestinal inflammation: clues to the pathogenesis of inflammatory bowel disease

Novartis Found Symp

263

164

–

174

Rakoff-Nahoum

Medzhitov

. (

2006

Role of the innate immune system and host-commensal mutualism

Curr Top Microbiol Immunol

308

–

Rakoff-Nahoum

Paglino

Eslami-Varzaneh

Edberg

Medzhitov

. (

2004

Recognition of commensal microflora by toll-like receptors is required for intestinal homeostasis

Cell

118

229

–

241

Savage

. (

1977

Microbial ecology of the gastrointestinal tract

Annu Rev Microbiol

107

–

133

Scanlan

Shanahan

Clune

Collins

O'Sullivan

O'Riordan

et al. (

2008

Culture-independent analysis of the gut microbiota in colorectal cancer and polyposis

Environ Microbiol

789

–

798

Shen

Rawls

Randall

Burcal

Mpande

Jenkins

et al. (

2010

Molecular characterization of mucosal adherent bacteria and associations with colorectal adenomas

Gut Microbes

138

–

147

Sobhani

Tap

Roudot-Thoraval

Roperch

Letulle

Langella

et al. (

2011

Microbial dysbiosis in colorectal cancer (CRC) patients

PLoS One

e16393

Takakura

Che

Ishida

Tsutsumi

Kurotani

Usami

et al. (

2008

Expression of a bacterial flagellin gene triggers plant immune responses and confers disease resistance in transgenic rice plants

Mol Plant Pathol

525

–

529

Tanaka

Che

Watanabe

Fujiwara

Takayama

Isogai

. (

2003

Flagellin from an incompatible strain of Acidovorax avenae mediates H2O2 generation accompanying hypersensitive cell death and expression of PAL, Cht-1, and PBZ1, but not of Lox in rice

Mol Plant Microbe Interact

422

–

428

Tsukumo

Carvalho

Carvalho-Filho

Saad

. (

2009

Translational research into gut microbiota: new horizons in obesity treatment

Arq Bras Endocrinol Metabol

139

–

144

Turnbaugh

Gordon

. (

2009

The core gut microbiome, energy balance and obesity

J Physiol

587

4153

–

4158

Turnbaugh

Hamady

Yatsunenko

Cantarel

Duncan

Ley

et al. (

2009

A core gut microbiome in obese and lean twins

Nature

457

480

–

484

Turnbaugh

Ley

Mahowald

Magrini

Mardis

Gordon

. (

2006

An obesity-associated gut microbiome with increased capacity for energy harvest

Nature

444

1027

–

1031

Wang

Garrity

Tiedje

Cole

. (

2007

Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy

Appl Environ Microbiol

5261

–

5267

Willing

Halfvarson

Dicksved

Rosenquist

Jarnerot

Engstrand

et al. (

2009

Twin studies reveal specific imbalances in the mucosa-associated microbiota of patients with ileal Crohn's disease

Inflamm Bowel Dis

653

–

660

Zhang

DiBaise

Zuccolo

Kudrna

Braidotti

et al. (

2009

Human gut microbiota in obesity and after gastric bypass

Proc Natl Acad Sci USA

106

2365

–

2370

Zumkeller

Brenner

Zwahlen

Rothenbacher

. (

2006

Helicobacter pylori infection and colorectal cancer risk: a meta-analysis

Helicobacter

–

Supplementary Information accompanies the paper on The ISME Journal website

Supplementary information The online version of this article (doi:10.1038/ismej.2012.43) contains supplementary material, which is available to authorized users.

Increased rectal microbial richness is associated with the presence of colorectal adenomas in humans (original) (raw)

Cite

Abstract

Introduction

Materials and methods

Study participants, colonoscopy and biopsy procedures

DNA extraction and sequencing

Data filtering

Sample filtering

Sequence filtering

Bacterial identification

Richness and evenness

Data preprocessing

Removal of rare taxa

Tree generation

Data validation

Real-time quantitative PCR validation

Statistical analyses

Results

Discussion

Competing interests

Conflict of interest

Acknowledgements

Data availability

Accession codes

Accessions

GenBank/EMBL/DDBJ

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Cited

Increased rectal microbial richness is associated with the presence of colorectal adenomas in humans (original) (raw)

Cite

Abstract

Introduction

Materials and methods

Study participants, colonoscopy and biopsy procedures

DNA extraction and sequencing

Data filtering

Sample filtering

Sequence filtering

Bacterial identification

Richness and evenness

Data preprocessing

Removal of rare taxa

Tree generation

Data validation

Real-time quantitative PCR validation

Statistical analyses

Results

Discussion

Competing interests

Conflict of interest

Acknowledgements

Data availability

Accession codes

Accessions

GenBank/EMBL/DDBJ

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Read

Most Cited