Mark Stoeckle - Profile on Academia.edu
Papers by Mark Stoeckle
Frontiers in Zoology, 2010
Background: The identification of vast numbers of unknown organisms using DNA sequences becomes more and more important in ecological and biodiversity studies. In this context, a fragment of the mitochondrial cytochrome c oxidase I (COI) gene has been proposed as a standard DNA barcoding marker for the identification of organisms. Limitations of the COI barcoding approach can arise from its single-locus identification system, the effect of introgression events, incomplete lineage sorting, numts, heteroplasmy and maternal inheritance of intracellular endosymbionts. Consequently, the analysis of a supplementary nuclear marker system could be advantageous. Results: We tested the effectiveness of the COI barcoding region and of three nuclear ribosomal expansion segments in discriminating ground beetles of Central Europe, a diverse and well-studied invertebrate taxon. As nuclear markers we determined the 18S rDNA: V4, 18S rDNA: V7 and 28S rDNA: D3 expansion segments for 344 specimens of 75 species. Seventy-three (97%) of the analysed species could be accurately identified using COI, while the combined approach of all three nuclear markers provided resolution among 71 (95%) of the studied Carabidae. Our results confirm that the analysed nuclear ribosomal expansion segments in combination constitute a valuable and efficient supplement for classical DNA barcoding to avoid potential pitfalls when only mitochondrial data are being used. We also demonstrate the high potential of COI barcodes for the identification of even closely related carabid species.
bioRxiv (Cold Spring Harbor Laboratory), Apr 22, 2022
Relating environmental DNA (eDNA) signal strength to organism abundance requires a fundamental understanding of eDNA production. A number of studies have demonstrated that eDNA production may scale allometrically; that is, larger organisms tend to exhibit lower mass-specific eDNA production rates, likely due to allometric scaling in key processes related to eDNA production (e.g. surface area, excretion/egestion). While most previous studies have examined intra-specific allometry, physiological rates and organism surface area also scale allometrically across species. We therefore hypothesize that eDNA production will similarly exhibit inter-specific allometric scaling. To evaluate this hypothesis, we reanalyzed previously published eDNA data from Stoeckle et al., which compared metabarcoding read count to organism count and biomass data obtained from trawl surveys. Using a Bayesian model we empirically estimated the value of the allometric scaling coefficient ('b') for bony fishes to be 0.67 (credible interval = 0.58-0.77), although our model failed to converge for chondrichthyan species. We found that integrating allometry significantly improved correlations between organism abundance and metabarcoding read count relative to traditional metrics of abundance (density and biomass) for bony fishes. Although substantial unexplained variation remains in the relationship between read count and organism abundance, our study provides evidence that eDNA production tends to scale allometrically across species. Future studies investigating the relationship between eDNA signal strength and metrics of fish abundance could potentially be improved by accounting for allometry: a scaling coefficient value of ~2/3 appears to be both theoretically and empirically justified.
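The allometric adjustment in this abstract can be illustrated with a small sketch. It assumes the scaled abundance index for a species is the number of individuals multiplied by mean individual mass raised to the power b. That form, the function name, and the catch figures are illustrative assumptions rather than details taken from the preprint; only b = 0.67 comes from the abstract.

```python
def allometric_index(count: int, mean_mass_g: float, b: float = 0.67) -> float:
    """Allometrically scaled abundance: count * mean_mass ** b.

    With b = 0 the index reduces to density (count);
    with b = 1 it reduces to biomass (count * mean mass).
    """
    return count * mean_mass_g ** b


# Hypothetical trawl catch: (label, individuals, mean mass in grams).
catch = [("small fish", 1000, 10.0), ("large fish", 10, 1000.0)]

for label, n, mass in catch:
    density = n
    biomass = n * mass
    scaled = allometric_index(n, mass)
    print(f"{label}: density={density}, biomass={biomass:.0f} g, scaled={scaled:.1f}")
```

In this made-up catch the two groups have equal biomass, yet the many small fish receive the larger scaled index, which is the qualitative behaviour expected when mass-specific eDNA production falls with body size.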
Infection and Immunity, 1996
We investigated the role of the pef operon, containing the genes for plasmid-encoded (PE) fimbriae of Salmonella typhimurium, in adhesion to the murine small intestine. In an organ culture model, a mutant of S. typhimurium carrying a tetracycline resistance cassette inserted in pefC was found to be associated in lower numbers with murine small intestine than the wild type. Similarly, heterologous expression of PE fimbriae in Escherichia coli increased the bacterial numbers recovered from the intestine in the organ culture model. Adhesion mediated by PE fimbriae was further demonstrated by binding of an E. coli strain expressing PE fimbriae to thin sections of mouse small intestine. The contribution of pef-mediated adhesion to fluid accumulation was investigated in infant mice. Intragastric injection of S. typhimurium 14028 and SR-11 caused fluid accumulation in infant mice. In contrast, pefC mutants of S. typhimurium 14028 and SR-11 were negative in the infant mouse assay. Introduction of a plasmid containing pefBACD and orf5, the first five genes of the pef operon, into the pefC mutant complemented for fluid accumulation in the infant mouse assay. However, heterologous expression of PE fimbriae in E. coli did not result in fluid accumulation in the infant mouse, suggesting that factors other than fimbriae are involved in causing fluid accumulation.
Salmonella typhimurium is the most common cause of acute gastroenteritis in humans in the United States. However, the mechanism by which S. typhimurium causes diarrhea in humans is not well defined. Although at least three different toxic activities of S. typhimurium have been found in several animal and cell culture models, their contribution to the generation of diarrhea in humans has never been conclusively demonstrated. In fact, salmonellosis appears to be a complex, multifactorial process (43), and the ability of S. typhimurium to multiply in the lamina propria and cause inflammation may contribute significantly to diarrheal disease. Bacterial adhesins are known to support colonization of the host's alimentary tract, thereby increasing the bacterial load in proximity to the epithelial lining. As a consequence, fimbriae of enterotoxigenic Escherichia coli and Vibrio cholerae are necessary for diarrhea. Although several fimbrial adhesins have been found in S. typhimurium (1), fimbriae have so far not been implicated in fluid accumulation in animal models. In this report, we present evidence that plasmid-encoded (PE) fimbriae of S. typhimurium mediate adhesion to mouse small intestine and are necessary for fluid accumulation in the infant mouse assay.
Bacterial strains, cell lines, and growth conditions. Bacterial strains used in this study are listed in Table . All bacteria were cultured in Luria-Bertani broth (LB; 5 g of yeast extract, 10 g of tryptone, and 10 g of NaCl per liter) or on plates (LB broth containing 15 g of agar per liter) at 37°C. Antibiotics, when required, were included in the culture medium or plates at the following concentrations: carbenicillin, 100 mg/liter; kanamycin, 60 mg/liter; nalidixic acid, 50 mg/liter; chloramphenicol, 30 mg/liter; and tetracycline, 10 mg/liter. HeLa and T84 cells were cultivated in Dulbecco's modified Eagle's medium (GIBCO) supplemented with 10% heat-inactivated fetal calf serum (GIBCO), 1% nonessential amino acids, and 1 mM glutamine (DMEMsup). For adhesion assays, 24-well microtiter plates were seeded with HeLa or T84 cells at a concentration of 5 × 10^5 cells per well in 0.5 ml of DMEMsup and incubated overnight at 37°C in 5% CO2. Analytical-grade chemicals were purchased from Sigma. All enzymes were purchased from Boehringer Mannheim.
Recombinant DNA and genetic techniques. Plasmid DNA was isolated by using ion-exchange columns from Qiagen. Standard methods were used for restriction endonuclease analyses, ligation and transformation of plasmid DNA, transfer of plasmid DNA by conjugation, and isolation of chromosomal DNA from bacteria. Plasmids were constructed by using the vector pBluescript SK+ (40) or the suicide vector pEP185.2 (21). Southern transfer of DNA onto a nylon membrane was performed as previously described (27). Labeling of DNA probes, hybridization, and immunological detection were done by using the DNA labeling and detection kit (nonradioactive) from Boehringer Mannheim. The DNA was labeled by random-primed incorporation of digoxigenin-labeled dUTP. Hybridization was performed at 65°C in solutions without formamide. Hybrids were detected by an enzyme-linked immunoassay, using an anti-digoxigenin-alkaline phosphatase conjugate and the substrate AMPPD [3-(2′-spiroadamantane)-4-methoxy-4-(3″-phosphoryloxy)phenyl-1,2-dioxetane; Boehringer Mannheim]. The light emitted by the dephosphorylated AMPPD was detected by X-ray film.
Production of rabbit anti-PefA serum. The nucleotide sequence of a DNA region encoding PE fimbriae which has been reported recently (7) was used to design primers for PCR amplification of pefA. A DNA fragment encoding the C-terminal 167 amino acids of PefA was amplified by using the primers 5′-GGGAATTCTTGCTTCCATTATTGCACTGGG-3′ and 5′-TCTGTCGACGGGGGATTATTTGTAAGCCACT-3′. The 520-bp PCR product was digested with EcoRI and SalI and cloned into the expression vector pGEX-4T-1 to create an in-frame translational fusion with the N terminus of glutathione S-transferase and amino acids 6 to 172 of PefA. Purification of the glutathione S-transferase-PefA fusion protein from sonic lysates was performed by using a glutathione-Sepharose affinity matrix (Pharmacia). The purified fusion protein was used to produce antiserum by injecting a rabbit subcutaneously at six dif-
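The cloning step in the anti-PefA serum paragraph can be sanity-checked with a short sketch. It confirms that each quoted primer carries the recognition site of the enzyme used for digestion, and that amino acids 6 to 172 account for 167 residues. The recognition sequences GAATTC (EcoRI) and GTCGAC (SalI) are standard; the helper name is hypothetical, and the second primer is assumed to be one contiguous sequence, with the space in the printed text treated as an extraction artifact.

```python
# Primers as quoted in the excerpt, written 5' to 3'.
FWD = "GGGAATTCTTGCTTCCATTATTGCACTGGG"
REV = "TCTGTCGACGGGGGATTATTTGTAAGCCACT"  # assumed contiguous (space removed)

# Standard recognition sequences of the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "SalI": "GTCGAC"}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based start index} for each site present in the primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


# Amino acids 6 to 172 inclusive, three nucleotides per codon.
residues = 172 - 6 + 1
coding_bp = residues * 3

print("forward primer sites:", find_sites(FWD))
print("reverse primer sites:", find_sites(REV))
print("residues:", residues, "coding bp:", coding_bp)
```

The 501 coding nucleotides plus primer-added flanking sequence are in line with the reported 520-bp product, although the exact breakdown of the remaining bases is not given in the excerpt.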
PLOS ONE, Feb 24, 2010
Large, recently-available genomic databases cover a wide range of life forms, suggesting opportunity for insights into genetic structure of biodiversity. In this study we refine our recently-described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely-separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly-studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information-density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise the prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity.
Vaccine, Feb 1, 1999
Enteropathogenic Escherichia coli (EPEC) is a major cause of childhood diarrhea in developing countries and is a leading cause of severe diarrheal illness among Brazilian infants. As one approach to constructing a vaccine candidate against diarrhea caused by EPEC, we evaluated whether the pilin subunit (BfpA) of the bundle-forming pilus (BFP) could be expressed by a live Salmonella vaccine strain. Several copies of the coding region of BfpA (bfpA) were amplified by PCR from a preparation of the EAF plasmid of EPEC strain B171 and cloned into plasmid vectors. An intact copy of bfpA was subcloned into the heat-inducible prokaryotic expression vector pCYTEXP1, and the resulting pBfpA was used to transform the aroA S. typhimurium strain SL3261, generating SL3261(pBfpA). The recombinant vaccine strain was able to express, but not to process, rBfpA as evidenced by a prominent 21 kDa protein that cross-reacted with anti-BFP antiserum found only in extracts of heat-treated SL3261(pBfpA), but not in strains of untreated SL3261(pBfpA) or SL3261 not carrying the plasmid. Furthermore, rBfpA accumulation was not toxic to the Salmonella host, as evidenced by similar plating efficiencies between induced and uninduced strains of SL3261(pBfpA). Finally, SL3261(pBfpA) orally administered to BALB/c mice was capable of eliciting a sustained and vigorous humoral immune response to BfpA, achievable even with a single oral dose of approximately 10^9 organisms. Therefore, this pilin product may serve as a potential immunogen as part of a live combined vaccine strategy to prevent two of the major public health problems in Brazil: salmonellosis and EPEC childhood diarrhea.
PLOS ONE, Oct 2, 2009
Background: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable. Methodology/Principal Findings: Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. Conclusions/Significance: The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. 
This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.
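The train-then-correlate workflow in this abstract can be sketched in a deliberately simplified form. The version below one-hot encodes aligned sequences, averages the encodings of each training group, and assigns a test sequence to the group whose averaged vector it correlates with most strongly. This group-mean construction is a stand-in chosen for illustration; the published indicator vectors are derived by the authors' own procedure, which the abstract does not spell out. All sequences, group names, and function names here are hypothetical.

```python
from statistics import mean

BASES = "ACGT"


def one_hot(seq: str) -> list:
    """Encode an aligned sequence as a flat 4-per-site 0/1 vector."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def group_vector(seqs: list) -> list:
    """Average the one-hot encodings of a group's training sequences."""
    encoded = [one_hot(s) for s in seqs]
    return [mean(column) for column in zip(*encoded)]


def correlation(u: list, v: list) -> float:
    """Pearson correlation between two equal-length vectors."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def classify(seq: str, groups: dict) -> str:
    """Assign a test sequence to the group with the highest correlation."""
    x = one_hot(seq)
    return max(groups, key=lambda name: correlation(x, groups[name]))


# Toy training set of aligned 8-nt sequences (not real COI data).
training = {
    "group_a": ["ACGTACGT", "ACGTACGA"],
    "group_b": ["TTGCATGC", "TTGCATGG"],
}
groups = {name: group_vector(seqs) for name, seqs in training.items()}

print(classify("ACGTACGC", groups))  # test sequence not in the training set
```

Computing the same correlation between every pair of group vectors and colouring the resulting matrix would give a display in the spirit of the false-color matrix mentioned above.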
Proceedings of the National Academy of Sciences of the United States of America, May 1, 1987
The nucleotide sequences of the 3' noncoding regions of all eight segments of influenza B virus RNA and the sequences of the 5' noncoding regions of segments 4-8 were determined in virus strains isolated over a period of 40 years. Nearly complete conservation of the noncoding sequences was found. Nine nucleotides at the 3' termini and 11 nucleotides at the 5' termini were common to all segments examined. In the region immediately adjacent to the common 3' terminal region, the nucleotides were specific for each segment and these segment-specific sequences were conserved in all strains examined. In each of the five segments in which both termini were examined, the segment-specific 3' sequences exhibited perfect inverted complementarity to a segment-specific sequence adjacent to the common 5' terminus. In addition, in the 3' noncoding region of RNA segments 1-3, which encode proteins involved in RNA synthesis, a single nucleotide substitution at position 10 was found that distinguishes these segments from segments 4-8. Comparison of these data with published reports has revealed that some of the features found in the noncoding regions of influenza B virus are also present in influenza A and C virus RNAs. In the RNAs of all three virus types, there is a segment-specific sequence of nucleotides near the 3' terminus that shows inverted complementarity to a sequence near the 5' terminus. This segment-specific sequence may play a role in the transcription of individual segments or in sorting of segments during virion assembly.
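Inverted complementarity, as used in this abstract, means that one region read in reverse pairs base-for-base with the other. A minimal check, assuming both regions are written 5' to 3' and only Watson-Crick pairs count (no G-U wobble), might look like the following. The sequences are invented for illustration and are not influenza B sequences.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_inverted_complement(region_a: str, region_b: str) -> bool:
    """True if region_b is exactly the reverse complement of region_a."""
    return reverse_complement(region_a) == region_b


# Hypothetical segment-specific regions near the two termini.
near_3_prime = "GCAUUC"
near_5_prime = "GAAUGC"

print(reverse_complement(near_3_prime))
print(is_inverted_complement(near_3_prime, near_5_prime))
```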
Scientific Reports, Sep 11, 2013
Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has examined only mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To further explore, we developed TreeParser, a freely-available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate that TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering which appears widespread but not well understood.
Comparing nucleotide sequences from different organisms helps understand evolution. Applications range from reconstructing the earliest branches on the Tree of Life to mapping the routes and timing of human expansion out of Africa (1-3). Standard approaches evaluate homologous nucleotide or amino acid positions across a sequence alignment to infer the probable order of divergences, and display results in a tree diagram of evolutionary history. Phylogenetic methods generally emphasize branching order (the sequence of events along each branch) and less so timing across divisions. As a result, coincident divergences involving multiple boughs may be overlooked. Specific methods designed to detect clustering have been applied to species delimitation and viral evolution. 
This relatively limited focus to date likely reflects the commonly-held view that higher taxa are arbitrary demarcations of the taxonomic hierarchy rather than indicators of evolutionary processes. Matrix heat maps help visualize clustering in complex datasets and can compress hundreds of thousands of data points into single-page displays. Applications range from evaluating social networks to identifying diagnostic gene expression profiles in tumors and brain scan patterns associated with schizophrenia. Matrix rows and columns are sorted, typically by hierarchical clustering, and the rearranged matrix is colorized as a heat map. Clusters of correlated inputs show up as "hot blocks" along the diagonal. Matrices may be asymmetric, e.g., a gene expression profile with genes sorted along one axis and cell types along the other, or symmetric, with identical inputs along both axes. A symmetric matrix heat map approach to comparative nucleotide sequence analysis using indicator vector correlations was recently described. Indicator vectors are digital transformations of nucleotide sequences in vector space; correlations are roughly inversely proportional to p-distances. Unlike simple p-distance methods, scaling of correlations is relative rather than absolute and vectors can represent multiple sequences. Indicator vector analysis generates a Klee diagram, a colorized heat map of the correlation matrix. Taxonomy-ordered Klee diagrams may offer new insights into evolution. However, to date this approach has only been applied to mitochondrial COI barcode sequences and is limited by the need for an accurate taxonomic list, which is not readily available for most groups. Here we describe TreeParser, web-based software that sorts a nucleotide sequence alignment according to a phylogenetic tree generated from the same dataset, facilitating an otherwise time-consuming step in this analytic pipeline. 
To assess potential utility, we apply TreeParser-indicator vector analysis to mitochondrial and nuclear gene datasets and examine clustering in the resulting Klee diagrams.
Single-species PCR assays accurately measure eDNA concentration. Here we test whether multi-species PCR, i.e., metabarcoding, with an internal standard can quantify eDNA of marine bony fish. Replicate amplifications with Riaz 12S gene primers were spiked with known amounts of a non-fish vertebrate DNA standard, indexed separately, and sequenced on an Illumina MiSeq. Fish eDNA copies were calculated by comparing fish and standard reads. Relative reads were directly proportional to relative DNA copies, with average and maximum variance between replicates of about 1.3- and 2.0-fold, respectively. There was an apparent threshold for consistent amplification of about 10 eDNA copies per PCR reaction. The internal DNA standard corrected for distortion of read counts due to non-fish vertebrate DNA. To assess potential amplification bias among species, we compared reads obtained with Riaz 12S primers to those with modified MiFish primers. Our results provide evidence that Riaz 12S ge...
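The internal-standard calculation described in this abstract can be sketched as a simple proportion, assuming reads scale linearly with template copies as the abstract reports: copies of a taxon = (taxon reads / standard reads) × standard copies added. The function name, read counts, and spike-in amount are hypothetical; the cutoff of about 10 copies per reaction is the threshold quoted in the abstract.

```python
def estimate_copies(reads: dict, standard_reads: int, standard_copies: float) -> dict:
    """Estimate eDNA copies per PCR reaction for each taxon.

    Assumes read counts are directly proportional to template copies,
    so each taxon is scaled by the copies-per-read of the spiked standard.
    """
    if standard_reads <= 0:
        raise ValueError("the internal standard must have at least one read")
    copies_per_read = standard_copies / standard_reads
    return {taxon: n * copies_per_read for taxon, n in reads.items()}


# Hypothetical replicate: read counts per taxon and for the spiked standard.
fish_reads = {"species_a": 5000, "species_b": 500, "species_c": 40}
estimates = estimate_copies(fish_reads, standard_reads=1000, standard_copies=100)

# Flag taxa under the approximate threshold for consistent amplification.
THRESHOLD_COPIES = 10
for taxon, copies in estimates.items():
    note = " (below ~10-copy threshold)" if copies < THRESHOLD_COPIES else ""
    print(f"{taxon}: {copies:.1f} copies{note}")
```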
Delivery of Human Interferon-γ via Gene Transfer In Vitro: Prolonged Expression and Induction of Macrophage Antimicrobial Activity
Journal of Interferon & Cytokine Research, 1996
Daily parenteral administration of exogenous interferon-gamma (IFN-gamma) induces or accelerates recovery in experimental and human infections. To develop an alternative delivery system, a replication-defective recombinant adenovirus expressing human IFN-gamma was constructed. The complete coding region of IFN-gamma was amplified by RT-PCR and inserted into an adenovirus cloning vector under the control of a human cytomegalovirus promoter. Recombinant adenovirus containing the IFN-gamma minigene (dAv-IFN-gamma) was isolated from 293 cells co-transfected with the linearized plasmid and an E1 region-deleted fragment of adenovirus genome. Following in vitro infection with dAv-IFN-gamma, dose-dependent and time-dependent expression of IFN-gamma mRNA and production of soluble protein were demonstrated in human diploid fibroblast and HeLa cell cultures by Northern blot and ELISA, respectively. Extracellular protein secretion persisted for ≥4 weeks following initial transfection, and secreted IFN-gamma induced both antiviral activity (8000-25,000 U/ml) and macrophage activation with killing of intracellular Toxoplasma gondii and Leishmania donovani. These results establish that dAv-IFN-gamma generates long-term secretion of biologically active IFN-gamma in vitro and suggest that this vector may be a useful delivery system for cytokine therapy.
protocols.io, 2018
We are still developing and optimizing these protocols! The aquatic vertebrate eDNA protocols are designed for persons familiar with basic molecular biology techniques and with access to essential molecular biology laboratory equipment. To facilitate use, we utilize commercial kits, open source software, and standardized PCR and sequencing protocols.
Infectious diseases
JAMA: The Journal of the American Medical Association, 1993
Infectious diseases
JAMA: The Journal of the American Medical Association, 1992
Infectious diseases
JAMA: The Journal of the American Medical Association, 1995
Infectious Diseases
JAMA: The Journal of the American Medical Association, 1995
Infectious Diseases
JAMA: The Journal of the American Medical Association, 1996
Tuberculosis Among Urban Health Care Workers: A Study Using Restriction Fragment Length Polymorphism Typing
Clinical Infectious Diseases, 1995
Cases of tuberculosis identified during 1992-1994 through an active tuberculosis surveillance network among six hospitals that serve New York City (the TBNetwork) were analyzed according to the occupational status of the patients. Clinical data were obtained by review of medical records, and restriction fragment length polymorphism (RFLP) typing of Mycobacterium tuberculosis isolates was performed. No known nosocomial outbreaks of tuberculosis occurred at these hospitals in the study period. Occupational status was known for 142 of 201 patients whose isolates were available for strain typing. Patients infected by organisms with a clustered strain typing pattern, as determined by RFLP analysis, were presumed to have recently acquired disease. RFLP typing revealed that isolates from 13 (65%) of 20 health care workers and 50 (41%) of 122 non-health care workers had a clustered RFLP pattern. The strains infecting eight (89%) of nine health care workers seropositive for human immunodeficiency virus (HIV) had a clustered RFLP pattern. Multivariate analysis of 75 patients with known HIV and occupational status revealed that HIV status (P = .03) and health care worker status (P = .02; RR = 2.77) were independent risk factors for a clustered RFLP strain. These findings suggest that many of the apparently sporadic cases of tuberculosis among health care workers may be due to unrecognized occupational transmission.
The Spectrum of Human Herpesvirus 6 Infection: From Roseola Infantum to Adult Disease
Annual Review of Medicine, 2000
Human herpesvirus 6 is the causative agent of roseola infantum, a generally benign rash illness of infants. Most persons acquire HHV-6 infection by age 2 years, and HHV-6 infection is a common cause of fever and febrile seizures in infants. In adults, primary infection with HHV-6 can produce a mononucleosis-like illness and, more rarely, severe disease, including encephalitis. In addition to primary infections, HHV-6 can cause clinical illness during reactivation, particularly in immunocompromised persons.
PLOS Biology, Sep 28, 2004
Short DNA sequences from a standardized region of the genome provide a DNA barcode for identifying species. Compiling a public library of DNA barcodes linked to named specimens could provide a new master key for identifying species, one whose power will rise with increased taxon coverage and with faster, cheaper sequencing. Recent work suggests that sequence diversity in a 648-bp region of the mitochondrial gene, cytochrome c oxidase I (COI), might serve as a DNA barcode for the identification of animal species. This study tested the effectiveness of a COI barcode in discriminating bird species, one of the largest and best-studied vertebrate groups. We determined COI barcodes for 260 species of North American birds and found that distinguishing species was generally straightforward. All species had a different COI barcode(s), and the differences between closely related species were, on average, 18 times higher than the differences within species. Our results identified four probable new species of North American birds, suggesting that a global survey will lead to the recognition of many additional bird species. The finding of large COI sequence differences between, as compared to small differences within, species confirms the effectiveness of COI barcodes for the identification of bird species. This result plus those from other groups of animals imply that a standard screening threshold of sequence difference (10× average intraspecific difference) could speed the discovery of new animal species. The growing evidence for the effectiveness of DNA barcodes as a basis for species identification supports an international exercise that has recently begun to assemble a comprehensive library of COI sequences linked to named specimens.
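The screening threshold proposed in this abstract, ten times the average intraspecific difference, lends itself to a short sketch. The function below averages a set of within-species distances and flags any lineage whose divergence from its nearest named species exceeds ten times that average. The distance values, lineage labels, and function name are hypothetical, and the abstract does not say how the distances themselves should be computed.

```python
from statistics import mean


def flag_candidates(intraspecific: list, divergences: dict, factor: float = 10.0) -> list:
    """Return lineages whose divergence exceeds factor * mean intraspecific distance."""
    threshold = factor * mean(intraspecific)
    return [name for name, d in divergences.items() if d > threshold]


# Hypothetical within-species COI distances (percent sequence difference).
within_species = [0.2, 0.3, 0.4]

# Hypothetical divergence of each lineage from its nearest named species (percent).
lineage_divergence = {"lineage_x": 5.1, "lineage_y": 1.2}

# Mean within-species distance is about 0.3%, so the cutoff is about 3%.
print(flag_candidates(within_species, lineage_divergence))
```

Lineages flagged this way would be candidates for closer taxonomic study, not confirmed new species.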
PLOS ONE, Aug 27, 2012
The accuracy of DNA barcode databases is critical for research and practical applications. Here we apply a frequency matrix to assess sequencing errors in a very large set of avian BARCODEs. Using 11,000 sequences from 2,700 bird species, we show most avian cytochrome c oxidase I (COI) nucleotide and amino acid sequences vary within a narrow range. Except for third codon positions, nearly all (96%) sites were highly conserved or limited to two nucleotides or two amino acids. A large number of positions had very low frequency variants present in single individuals of a species; these were strongly concentrated at the ends of the barcode segment, consistent with sequencing error. In addition, a small fraction (0.1%) of BARCODEs had multiple very low frequency variants shared among individuals of a species; these were found to represent overlooked cryptic pseudogenes lacking stop codons. The calculated upper limit of sequencing error was 8 × 10^-5 errors/nucleotide, which was relatively high for direct Sanger sequencing of amplified DNA, but unlikely to compromise species identification. Our results confirm the high quality of the avian BARCODE database and demonstrate significant quality improvement in avian COI records deposited in GenBank over the past decade. This approach has potential application for genetic database quality control, discovery of cryptic pseudogenes, and studies of low-level genetic variation.
Infection and Immunity, 1996
We investigated the role of the pef operon, containing the genes for plasmid-encoded (PE) fimbriae of Salmonella typhimurium, in adhesion to the murine small intestine. In an organ culture model, a mutant of S. typhimurium carrying a tetracycline resistance cassette inserted in pefC was found to be associated in lower numbers with murine small intestine than the wild type. Similarly, heterologous expression of PE fimbriae in Escherichia coli increased the bacterial numbers recovered from the intestine in the organ culture model. Adhesion of PE fimbriae was further demonstrated by binding of an E. coli strain expressing PE fimbriae to thin sections of mouse small intestine. The contribution of pef-mediated adhesion to fluid accumulation was investigated in infant mice. Intragastric injection of S. typhimurium 14028 and SR-11 caused fluid accumulation in infant mice. In contrast, pefC mutants of S. typhimurium 14028 and SR-11 were negative in the infant mouse assay. Introduction of a plasmid containing pefBACD and orf5, the first five genes of the pef operon, into the pefC mutant complemented for fluid accumulation in the infant mouse assay. However, heterologous expression of PE fimbriae in E. coli did not result in fluid accumulation in the infant mouse, suggesting that factors other than fimbriae are involved in causing fluid accumulation.

Salmonella typhimurium is the most common cause of acute gastroenteritis in humans in the United States. However, the mechanism by which S. typhimurium causes diarrhea in humans is not well defined. Although at least three different toxic activities of S. typhimurium have been found in several animal and cell culture models, their contribution to the generation of diarrhea in humans has never been conclusively demonstrated. In fact, salmonellosis appears to be a complex, multifactorial process (43), and the ability of S. typhimurium to multiply in the lamina propria and cause inflammation may contribute significantly to diarrheal disease. Bacterial adhesins are known to support colonization of the host's alimentary tract, thereby increasing the bacterial load in proximity to the epithelial lining. As a consequence, fimbriae of enterotoxigenic Escherichia coli and Vibrio cholerae are necessary for diarrhea. Although several fimbrial adhesins have been found in S. typhimurium (1), fimbriae have so far not been implicated in fluid accumulation in animal models. In this report, we present evidence that plasmid-encoded (PE) fimbriae of S. typhimurium mediate adhesion to mouse small intestine and are necessary for fluid accumulation in the infant mouse assay.

Bacterial strains, cell lines, and growth conditions. Bacterial strains used in this study are listed in Table . All bacteria were cultured in Luria-Bertani broth (LB; 5 g of yeast extract, 10 g of tryptone, and 10 g of NaCl per liter) or on plates (LB broth containing 15 g of agar per liter) at 37°C. Antibiotics, when required, were included in the culture medium or plates at the following concentrations: carbenicillin, 100 mg/liter; kanamycin, 60 mg/liter; nalidixic acid, 50 mg/liter; chloramphenicol, 30 mg/liter; and tetracycline, 10 mg/liter. HeLa and T84 cells were cultivated in Dulbecco's modified Eagle's medium (GIBCO) supplemented with 10% heat-inactivated fetal calf serum (GIBCO), 1% nonessential amino acids, and 1 mM glutamine (DMEMsup). For adhesion assays, 24-well microtiter plates were seeded with HeLa or T84 cells at a concentration of 5 × 10^5 cells per well in 0.5 ml of DMEMsup and incubated overnight at 37°C in 5% CO2. Analytical-grade chemicals were purchased from Sigma. All enzymes were purchased from Boehringer Mannheim.

Recombinant DNA and genetic techniques. Plasmid DNA was isolated by using ion-exchange columns from Qiagen. Standard methods were used for restriction endonuclease analyses, ligation and transformation of plasmid DNA, transfer of plasmid DNA by conjugation, and isolation of chromosomal DNA from bacteria. Plasmids were constructed by using the vector pBluescript SK+ (40) or the suicide vector pEP185.2 (21). Southern transfer of DNA onto a nylon membrane was performed as previously described (27). Labeling of DNA probes, hybridization, and immunological detection were done by using the DNA labeling and detection kit (nonradioactive) from Boehringer Mannheim. The DNA was labeled by random-primed incorporation of digoxigenin-labeled dUTP. Hybridization was performed at 65°C in solutions without formamide. Hybrids were detected by an enzyme-linked immunoassay, using an antidigoxigenin-alkaline phosphatase conjugate and the substrate AMPPD [3-(2′-spiroadamantane)-4-methoxy-4-(3″-phosphoryloxy)phenyl-1,2-dioxetane; Boehringer Mannheim]. The light emitted by the dephosphorylated AMPPD was detected by X-ray film.

Production of rabbit anti-PefA serum. The nucleotide sequence of a DNA region encoding PE fimbriae which has been reported recently (7) was used to design primers for PCR amplification of pefA. A DNA fragment encoding the C-terminal 167 amino acids of PefA was amplified by using the primers 5′-GGGAATTCTTGCTTCCATTATTGCACTGGG-3′ and 5′-TCTGTCGACGGGGGATTATTTGTAAGCCACT-3′. The 520-bp PCR product was digested with EcoRI and SalI and cloned into the expression vector pGEX-4T-1 to create an in-frame translational fusion with the N terminus of glutathione S-transferase and amino acids 6 to 172 of PefA. Purification of the glutathione S-transferase-PefA fusion protein from sonic lysates was performed by using a glutathione-Sepharose affinity matrix (Pharmacia). The purified fusion protein was used to produce antiserum by injecting a rabbit subcutaneously at six dif-
PLOS ONE, Feb 24, 2010
Large, recently available genomic databases cover a wide range of life forms, suggesting opportunity for insights into the genetic structure of biodiversity. In this study we refine our recently described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17,000 DNA barcode sequences covering 12 widely separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11,000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information-density single-page displays. These results support application of indicator vectors for comparative analysis of large nucleotide data sets and raise the prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity.
Vaccine, Feb 1, 1999
Enteropathogenic Escherichia coli (EPEC) is a major cause of childhood diarrhea in developing countries and is a leading cause of severe diarrheal illness among Brazilian infants. As one approach to constructing a vaccine candidate against diarrhea caused by EPEC, we evaluated whether the pilin subunit (BfpA) of the bundle-forming pilus (BFP) could be expressed by a live Salmonella vaccine strain. Several copies of the coding region of BfpA (bfpA) were amplified by PCR from a preparation of the EAF plasmid of EPEC strain B171 and cloned into plasmid vectors. An intact copy of bfpA was subcloned into the heat-inducible prokaryotic expression vector pCYTEXP1, and the resulting pBfpA was used to transform the aroA S. typhimurium strain SL3261, generating SL3261(pBfpA). The recombinant vaccine strain was able to express, but not to process, rBfpA as evidenced by a prominent 21 kDa protein that cross-reacted with anti-BFP antiserum found only in extracts of heat-treated SL3261(pBfpA), but not in strains of untreated SL3261(pBfpA) or SL3261 not carrying the plasmid. Furthermore, rBfpA accumulation was not toxic to the Salmonella host, as evidenced by similar plating efficiencies between induced and uninduced strains of SL3261(pBfpA). Finally, SL3261(pBfpA) orally administered to BALB/c mice was capable of eliciting a sustained and vigorous humoral immune response to BfpA, achievable even with a single oral dose of approximately 10^9 organisms. Therefore, this pilin product may serve as a potential immunogen as part of a live combined vaccine strategy to prevent two of the major public health problems in Brazil: salmonellosis and EPEC childhood diarrhea.
PLOS ONE, Oct 2, 2009
Background: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques, and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable. Methodology/Principal Findings: Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. Conclusions/Significance: The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.
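The train-then-correlate workflow described in this abstract can be sketched in a few lines. The sketch below is a simplified stand-in, not the published procedure: it assumes each group's vector is simply the mean of one-hot encoded training sequences and that assignment uses Pearson correlation. All function names and the toy sequences are hypothetical.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector, four slots per site."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def group_vector(seqs):
    """Average the one-hot vectors of a group's training sequences (assumed stand-in)."""
    vecs = [one_hot(s) for s in seqs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def correlation(u, v):
    """Pearson correlation between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den


def assign(test_seq, training):
    """Return the group whose averaged vector correlates best with test_seq."""
    vec = one_hot(test_seq)
    return max(training, key=lambda g: correlation(vec, group_vector(training[g])))


# Toy training set; real COI barcodes are roughly 650 bp long.
training = {
    "birds": ["ACGTAC", "ACGTAA"],
    "fish": ["TTGCAC", "TTGCAT"],
}
print(assign("ACGTAT", training))  # birds
```

Computing `correlation` between every pair of group vectors would give the symmetric matrix that the abstract describes displaying in false color.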
Proceedings of the National Academy of Sciences of the United States of America, May 1, 1987
The nucleotide sequences of the 3' noncoding regions of all eight segments of influenza B virus RNA and the sequences of the 5' noncoding regions of segments 4-8 were determined in virus strains isolated over a period of 40 years. Nearly complete conservation of the noncoding sequences was found. Nine nucleotides at the 3' termini and 11 nucleotides at the 5' termini were common to all segments examined. In the region immediately adjacent to the common 3' terminal region, the nucleotides were specific for each segment and these segment-specific sequences were conserved in all strains examined. In each of the five segments in which both termini were examined, the segment-specific 3' sequences exhibited perfect inverted complementarity to a segment-specific sequence adjacent to the common 5' terminus. In addition, in the 3' noncoding region of RNA segments 1-3, which encode proteins involved in RNA synthesis, a single nucleotide substitution at position 10 was found that distinguishes these segments from segments 4-8. Comparison of these data with published reports has revealed that some of the features found in the noncoding regions of influenza B virus are also present in influenza A and C virus RNAs. In the RNAs of all three virus types, there is a segment-specific sequence of nucleotides near the 3' terminus that shows inverted complementarity to a sequence near the 5' terminus. This segment-specific sequence may play a role in the transcription of individual segments or in sorting of segments during virion assembly.
Scientific Reports, Sep 11, 2013
Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has examined only mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To further explore, we developed TreeParser, a freely available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate that TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering, which appears widespread but is not well understood.

Comparing nucleotide sequences from different organisms helps understand evolution. Applications range from reconstructing the earliest branches on the Tree of Life to mapping the routes and timing of human expansion out of Africa (1-3). Standard approaches evaluate homologous nucleotide or amino acid positions across a sequence alignment to infer the probable order of divergences, and display results in a tree diagram of evolutionary history. Phylogenetic methods generally emphasize branching order (the sequence of events along each branch) and less so timing across divisions. As a result, coincident divergences involving multiple boughs may be overlooked. Specific methods designed to detect clustering have been applied to species delimitation and viral evolution. This relatively limited focus to date likely reflects the commonly held view that higher taxa are arbitrary demarcations of the taxonomic hierarchy rather than indicators of evolutionary processes.

Matrix heat maps help visualize clustering in complex datasets and can compress hundreds of thousands of data points into single-page displays. Applications range from evaluating social networks to identifying diagnostic gene expression profiles in tumors and brain scan patterns associated with schizophrenia. Matrix rows and columns are sorted, typically by hierarchical clustering, and the rearranged matrix is colorized as a heat map. Clusters of correlated inputs show up as "hot blocks" along the diagonal. Matrices may be asymmetric, e.g., a gene expression profile with genes sorted along one axis and cell types along the other, or symmetric, with identical inputs along both axes.

A symmetric matrix heat map approach to comparative nucleotide sequence analysis using indicator vector correlations was recently described. Indicator vectors are digital transformations of nucleotide sequences in vector space; correlations are roughly inversely proportional to p-distances. Unlike simple p-distance methods, scaling of correlations is relative rather than absolute and vectors can represent multiple sequences. Indicator vector analysis generates a Klee diagram, a colorized heat map of the correlation matrix. Taxonomy-ordered Klee diagrams may offer new insights into evolution. However, to date this approach has only been applied to mitochondrial COI barcode sequences and is limited by the need for an accurate taxonomic list, which is not readily available for most groups. Here we describe TreeParser, a web-based program that sorts a nucleotide sequence alignment according to a phylogenetic tree generated from the same dataset, facilitating an otherwise time-consuming step in this analytic pipeline. To assess potential utility, we apply TreeParser-indicator vector analysis to mitochondrial and nuclear gene datasets and examine clustering in the resulting Klee diagrams.
Single-species PCR assays accurately measure eDNA concentration. Here we test whether multi-species PCR, i.e., metabarcoding, with an internal standard can quantify eDNA of marine bony fish. Replicate amplifications with Riaz 12S gene primers were spiked with known amounts of a non-fish vertebrate DNA standard, indexed separately, and sequenced on an Illumina MiSeq. Fish eDNA copies were calculated by comparing fish and standard reads. Relative reads were directly proportional to relative DNA copies, with average and maximum variance between replicates of about 1.3- and 2.0-fold, respectively. There was an apparent threshold for consistent amplification of about 10 eDNA copies per PCR reaction. The internal DNA standard corrected for distortion of read counts due to non-fish vertebrate DNA. To assess potential amplification bias among species, we compared reads obtained with Riaz 12S primers to those with modified MiFish primers. Our results provide evidence that Riaz 12S ge...
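The read-to-copy conversion mentioned in this abstract ("comparing fish and standard reads") reduces to a proportion. The sketch below assumes the simplest form of that calculation, copies = reads × standard copies / standard reads; the function name, species labels, and all numbers are hypothetical and are not data from the study.

```python
def copies_from_reads(species_reads, standard_reads, standard_copies):
    """Estimate eDNA copies per reaction by scaling reads against a spiked-in standard."""
    if standard_reads <= 0:
        raise ValueError("the internal standard must have at least one read")
    return {
        species: reads * standard_copies / standard_reads
        for species, reads in species_reads.items()
    }


# Hypothetical reaction: 100 copies of standard DNA yielded 1,000 reads.
reads = {"species_a": 5000, "species_b": 250, "species_c": 50}
copies = copies_from_reads(reads, standard_reads=1000, standard_copies=100)
print(copies)  # {'species_a': 500.0, 'species_b': 25.0, 'species_c': 5.0}

# The abstract reports an apparent threshold of about 10 copies per reaction
# for consistent amplification; estimates below it deserve extra caution.
print([s for s, c in copies.items() if c < 10])  # ['species_c']
```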
Delivery of Human Interferon-γ via Gene Transfer In Vitro: Prolonged Expression and Induction of Macrophage Antimicrobial Activity
Journal of Interferon & Cytokine Research, 1996
Daily parenteral administration of exogenous interferon-gamma (IFN-gamma) induces or accelerates recovery in experimental and human infections. To develop an alternative delivery system, a replication-defective recombinant adenovirus expressing human IFN-gamma was constructed. The complete coding region of IFN-gamma was amplified by RT-PCR and inserted into an adenovirus cloning vector under the control of a human cytomegalovirus promoter. Recombinant adenovirus containing the IFN-gamma minigene (dAv-IFN-gamma) was isolated from 293 cells co-transfected with the linearized plasmid and an E1 region-deleted fragment of adenovirus genome. Following in vitro infection with dAv-IFN-gamma, dose-dependent and time-dependent expression of IFN-gamma mRNA and production of soluble protein were demonstrated in human diploid fibroblast and HeLa cell cultures by Northern blot and ELISA, respectively. Extracellular protein secretion persisted for ≥4 weeks following initial transfection, and secreted IFN-gamma induced both antiviral activity (8000-25,000 U/ml) and macrophage activation with killing of intracellular Toxoplasma gondii and Leishmania donovani. These results establish that dAv-IFN-gamma generates long-term secretion of biologically active IFN-gamma in vitro and suggest that this vector may be a useful delivery system for cytokine therapy.
protocols.io, 2018
We are still developing and optimizing these protocols! The aquatic vertebrate eDNA protocols are designed for persons familiar with basic molecular biology techniques and with access to essential molecular biology laboratory equipment. To facilitate use, we utilize commercial kits, open source software, and standardized PCR and sequencing protocols.
Infectious diseases
JAMA: The Journal of the American Medical Association, 1993
Infectious diseases
JAMA: The Journal of the American Medical Association, 1992
Infectious diseases
JAMA: The Journal of the American Medical Association, 1995
Infectious Diseases
JAMA: The Journal of the American Medical Association, 1995
Infectious Diseases
JAMA: The Journal of the American Medical Association, 1996
Tuberculosis Among Urban Health Care Workers: A Study Using Restriction Fragment Length Polymorphism Typing
Clinical Infectious Diseases, 1995
Cases of tuberculosis identified during 1992-1994 through an active tuberculosis surveillance network among six hospitals that serve New York City (the TBNetwork) were analyzed according to the occupational status of the patients. Clinical data were obtained by review of medical records, and restriction fragment length polymorphism (RFLP) typing of Mycobacterium tuberculosis isolates was performed. No known nosocomial outbreaks of tuberculosis occurred at these hospitals in the study period. Occupational status was known for 142 of 201 patients whose isolates were available for strain typing. Patients infected by organisms with a clustered strain typing pattern, as determined by RFLP analysis, were presumed to have recently acquired disease. RFLP typing revealed that isolates from 13 (65%) of 20 health care workers and 50 (41%) of 122 non-health care workers had a clustered RFLP pattern. The strains infecting eight (89%) of nine health care workers seropositive for human immunodeficiency virus (HIV) had a clustered RFLP pattern. Multivariate analysis of 75 patients with known HIV and occupational status revealed that HIV status (P = .03) and health care worker status (P = .02; RR = 2.77) were independent risk factors for a clustered RFLP strain. These findings suggest that many of the apparently sporadic cases of tuberculosis among health care workers may be due to unrecognized occupational transmission.
The Spectrum of Human Herpesvirus 6 Infection: From Roseola Infantum to Adult Disease
Annual Review of Medicine, 2000
Human herpesvirus 6 (HHV-6) is the causative agent of roseola infantum, a generally benign rash illness of infants. Most persons acquire HHV-6 infection by age 2 years, and HHV-6 infection is a common cause of fever and febrile seizures in infants. In adults, primary infection with HHV-6 can produce a mononucleosis-like illness and, more rarely, severe disease, including encephalitis. In addition to primary infections, HHV-6 can cause clinical illness during reactivation, particularly in immunocompromised persons.
PLOS Biology, Sep 28, 2004
Short DNA sequences from a standardized region of the genome provide a DNA barcode for identifying species. Compiling a public library of DNA barcodes linked to named specimens could provide a new master key for identifying species, one whose power will rise with increased taxon coverage and with faster, cheaper sequencing. Recent work suggests that sequence diversity in a 648-bp region of the mitochondrial gene, cytochrome c oxidase I (COI), might serve as a DNA barcode for the identification of animal species. This study tested the effectiveness of a COI barcode in discriminating bird species, one of the largest and best-studied vertebrate groups. We determined COI barcodes for 260 species of North American birds and found that distinguishing species was generally straightforward. All species had a different COI barcode(s), and the differences between closely related species were, on average, 18 times higher than the differences within species. Our results identified four probable new species of North American birds, suggesting that a global survey will lead to the recognition of many additional bird species. The finding of large COI sequence differences between, as compared to small differences within, species confirms the effectiveness of COI barcodes for the identification of bird species. This result plus those from other groups of animals imply that a standard screening threshold of sequence difference (10× average intraspecific difference) could speed the discovery of new animal species. The growing evidence for the effectiveness of DNA barcodes as a basis for species identification supports an international exercise that has recently begun to assemble a comprehensive library of COI sequences linked to named specimens.
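The screening threshold proposed in this abstract (ten times the average intraspecific difference) can be expressed as a short check. The sketch below assumes uncorrected p-distance as the measure of sequence difference, which is a simplification; the function names and example values are hypothetical and are not results from the study.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def exceeds_threshold(pair_distance, mean_intraspecific, factor=10):
    """True when a pairwise distance exceeds factor x the mean within-species distance."""
    return pair_distance > factor * mean_intraspecific


# Toy example: one difference across ten aligned sites.
d = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(d)  # 0.1

# With a hypothetical mean intraspecific distance of 0.3%, the cutoff is 3%.
print(exceeds_threshold(0.05, mean_intraspecific=0.003))  # True
print(exceeds_threshold(0.02, mean_intraspecific=0.003))  # False
```

A pair that exceeds the cutoff is only a candidate for further study; the abstract presents the threshold as a screening aid, not as a species definition.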