CpGDB: A Comprehensive Database of Chloroplast Genomes

Current trends in chloroplast genome research

2010

The chloroplast is an important cellular organelle of autotrophs that has an independent, circular, double-stranded DNA molecule termed the chloroplast genome. The chloroplast DNA (cpDNA) contains essential genes for its maintenance and operation. Several components of the photosystems and proteins involved in biosynthetic pathways are also encoded by the chloroplast genome. Exploring the genetic repository of this organelle is vital due to its conserved nature, small size, persistent gene organization and promising ability for transgenic expression. Accordingly, cpDNA sequence information has been instrumental in phylogenetic studies and molecular taxonomy of plants. Chloroplast genome sequencing efforts were initiated with conventional cloning and chain-termination sequencing technologies. Dedicated databases such as CGDB and GOBASE, among others, have been established as more and more complete cpDNA sequences are reported. Presently, elegant molecular biology techniques including shotgun sequencing, rolling circle amplification (RCA), Amplification, Sequencing and Annotation of Plasteome (ASAP) and next-generation sequencing are being used to accelerate data output. Owing to a manyfold increase in the submission of cpDNA sequences to nucleotide databases, the challenges of in-depth data analysis stimulated the emergence of dedicated annotation, assembly and phylogenetic software. Recently reported bioinformatics software for chloroplast genome studies includes DOGMA for annotation, tRNAscan-SE, ARAGORN and the PREP suite for RNA analyses, and CGView for circular map construction and comparative analysis. Faster algorithms for gene-order-based phylogenetic reconstruction and bootstrap analysis have attracted the attention of the research community. Current trends in sequencing strategies and bioinformatics with reference to chloroplast genomes hold great potential to illuminate more hidden corners of this ancient cell organelle.

Chloroplast genomes: diversity, evolution, and applications in genetic engineering

Chloroplasts play a crucial role in sustaining life on earth. The availability of over 800 sequenced chloroplast genomes from a variety of land plants has enhanced our understanding of chloroplast biology, intracellular gene transfer, conservation, diversity, and the genetic basis by which chloroplast transgenes can be engineered to enhance plant agronomic traits or to produce high-value agricultural or biomedical products. In this review, we discuss the impact of chloroplast genome sequences on understanding the origins of economically important cultivated species and changes that have taken place during domestication. We also discuss the potential biotechnological applications of chloroplast genomes.

Distribution and Nomenclature of Protein-coding Genes in 12 Sequenced Chloroplast Genomes

1998

Abbreviations: bind., binding; Chl, Chlorella vulgaris; Cpa, Cyanophora paradoxa; Epi, Epifagus virginiana; Eug, Euglena gracilis; hom., homologue; Mar, Marchantia polymorpha; Nic, Nicotiana tabacum; Odo, Odontella sinensis; Ory, Oryza sativa; Pin, Pinus thunbergii; Pla, Plasmodium falciparum; Por, Porphyra purpurea; prot., protein; RT, reverse transcriptase; sim., similar to; SU, subunit; Syn, Synechocystis sp. PCC6803; Zea, Zea mays

Comparative assessment shows the reliability of chloroplast genome assembly using RNA-seq

Scientific reports, 2018

Chloroplast genomes (cp genomes) are widely used in comparative genomics, population genetics, and phylogenetic studies. Obtaining chloroplast genomes from RNA-Seq data seems feasible because cpDNA is almost fully transcribed. However, the reliability of chloroplast genomes assembled from RNA-Seq instead of genomic DNA libraries remains to be thoroughly verified. In this study, we assembled chloroplast genomes for three Erysimum (Brassicaceae) species from three RNA-Seq replicates and from one genomic library of each species, using a streamlined bioinformatics protocol. We compared these assembled genomes, confirming that cp genomes assembled from RNA-Seq data were highly similar to each other and to those from genomic libraries in terms of overall structure, size, and composition. Although post-transcriptional modifications, such as RNA editing, may introduce variations in the RNA-Seq data, the assembly of cp genomes from RNA-Seq appeared to be reliable. Moreover, RNA-Seq assembl...

Towards the Well-Tempered Chloroplast DNA Sequences

2021

With the development of next-generation sequencing technology and bioinformatics tools, the process of assembling DNA sequences has become cheaper and easier, especially in the case of much shorter organelle genomes. The number of available DNA sequences of complete chloroplast genomes in public genetic databases is constantly increasing and the data are widely used in plant phylogenetic and biotechnological research. In this work, we investigated possible inconsistencies in the stored form of publicly available chloroplast genome sequence data. The impact of these inconsistencies on the results of the phylogenetic analysis was investigated and the bioinformatic solution to identify and correct inconsistencies was implemented. The whole procedure was demonstrated using five plant families (Apiaceae, Asteraceae, Campanulaceae, Lamiaceae and Rosaceae) as examples.

Finding the Core-Genes of Chloroplasts

2016

Abstract: Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, making it possible to perform large-scale genomic comparisons between sets of closely related species. An interesting question is which genes, and hence which functions, are common to a collection of species or, conversely, what is specific to a given species when compared with others belonging to the same genus, family, etc. Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it involves a large amount of computation and requires a rigorous methodology. Surprisingly, as far as we know, the methodology for finding core and pan genomes has not been investigated in depth. This research work tries to fill this gap by focusing only on chloroplast...
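The core/pan distinction in the abstract above can be illustrated with plain set operations. The following is a minimal sketch, not the authors' methodology: it assumes each genome has already been reduced to a set of comparable gene names, which is the hard part the abstract alludes to. The function name `core_and_pan`, the genome labels, and the toy gene lists are all hypothetical.

```python
def core_and_pan(gene_sets):
    """Return (core, pan) for a mapping of genome label -> iterable of gene names.

    core: genes present in every genome (set intersection).
    pan:  genes present in at least one genome (set union).
    Assumes gene names are already normalized across genomes.
    """
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return set(), set()
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


if __name__ == "__main__":
    # Toy input for illustration only; not real annotation data.
    toy_genomes = {
        "genome_1": ["rbcL", "psbA", "matK"],
        "genome_2": ["rbcL", "psbA"],
        "genome_3": ["rbcL", "psbA", "ndhF"],
    }
    core, pan = core_and_pan(toy_genomes)
    print("core:", sorted(core))
    print("pan:", sorted(pan))
```

In practice the outcome depends heavily on how genes are matched between genomes (annotation names vs. sequence similarity clusters), so the set arithmetic is only the final and easiest step.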

Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform

Background: Complete chloroplast genome sequences provide a valuable source of molecular markers for studies in molecular ecology and evolution of plants. To obtain complete genome sequences, recent studies have made use of the polymerase chain reaction to amplify overlapping fragments from conserved gene loci. However, this approach is time consuming and can be more difficult to implement where gene organisation differs among plants. An alternative approach is to first isolate chloroplasts and then use the capacity of high-throughput sequencing to obtain complete genome sequences. We report our findings from studies of the latter approach, which used a simple chloroplast isolation procedure, multiply-primed rolling circle amplification of chloroplast DNA, Illumina Genome Analyzer II sequencing, and de novo assembly of paired-end sequence reads. Results: A modified rapid chloroplast isolation protocol was used to obtain plant DNA that was enriched for chloroplast DNA, but nevertheless contained nuclear and mitochondrial DNA. Multiply-primed rolling circle amplification of this mixed template produced sufficient quantities of chloroplast DNA, even when the amount of starting material was small, and improved the template quality for Illumina Genome Analyzer II (hereafter Illumina GAII) sequencing. We demonstrate, using independent samples of karaka (Corynocarpus laevigatus), that there is high fidelity in the sequence obtained from this template. Although less than 20% of our sequenced reads could be mapped to the chloroplast genome, it was relatively easy to assemble complete chloroplast genome sequences from the mixture of nuclear, mitochondrial and chloroplast reads. Conclusions: We report successful whole genome sequencing of chloroplast DNA from karaka, obtained efficiently and with high fidelity.

Origin and Phylogeny of Chloroplasts Revealed by a Simple Correlation Analysis of Complete Genomes

Molecular Biology and Evolution, 2003

The completely sequenced chloroplast genomes have provided much information on the origin and evolution of this organelle. In this paper we attempt to use these sequences to test a novel approach for phylogenetic analysis of complete genomes based on correlation analysis of compositional vectors. All protein sequences from 21 complete chloroplast genomes are analyzed in comparison with selected archaea, eubacteria, and eukaryotes. The distance-based analysis shows that the chloroplast genomes are most closely related to cyanobacteria, consistent with the endosymbiotic origin of chloroplasts. The chloroplast genomes are separated into two major clades corresponding to chlorophytes (green plants) s.l. and rhodophytes (red algae) s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution. For instance, the analysis places the chloroplasts of two chromophytes (Guillardia and Odontella) within the rhodophyte lineage, supporting secondary endosymbiosis as the source of these chloroplasts. The relationships among the green algae and land plants in our tree also agree with results from traditional phylogenetic analyses. Thus, this study establishes the value of our simple correlation analysis in elucidating the evolutionary relationships among genomes. It is hoped that this approach will provide insights into comparative genome analysis.
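The idea of comparing genomes through correlated compositional vectors can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: it uses raw k-mer frequencies of protein sequences, takes the cosine of the angle between two vectors as the correlation C, and converts it to a distance as (1 - C) / 2. The abstract does not specify the string length, any background correction, or the exact distance formula, so `k=3`, the function names, and the toy sequences are all assumptions.

```python
from collections import Counter
from math import sqrt


def composition_vector(proteins, k=3):
    """Relative frequencies of all overlapping k-mers in a list of protein strings."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def correlation(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distance(u, v):
    """Assumed distance: 0 for identical composition, 0.5 for no shared k-mers."""
    return (1.0 - correlation(u, v)) / 2.0


if __name__ == "__main__":
    # Toy protein sets for illustration only; not real chloroplast proteomes.
    toy = {
        "taxon_a": ["MKTAYIAKQR", "MSTNPKPQRK"],
        "taxon_b": ["MKTAYIAKQK", "MSTNPKPQRR"],
        "taxon_c": ["GGGSGGGSGG"],
    }
    vectors = {name: composition_vector(seqs) for name, seqs in toy.items()}
    names = sorted(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, round(distance(vectors[a], vectors[b]), 4))
```

The resulting pairwise distance matrix could then be passed to any distance-based tree-building method, such as neighbor joining.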

The chloroplast genome hidden in plain sight, open access publishing and anti-fragile distributed data sources

Mitochondrial DNA, 2015

We sequenced several cannabis genomes in June of 2011, and the first and longest contigs to emerge were the chloroplast and mitochondrial genomes. Having been a contributor to the Human Genome Project and an eyewitness to the real benefits of immediate data release, I have first-hand experience of how the adopted global rapid data release policy narrowly averted the potential mal-investment of millions of dollars of taxpayer money. The policy was vital in reducing duplication of effort and economic waste. As a result, we felt obligated to publish the Cannabis genome data in a similar spirit and placed them immediately on a cloud-based Amazon server in August of 2011. While these rapid data release practices were heralded by many in the media, we still find that some authors fail to find or reference said work, and we hope to persuade the readership that this omission has more pervasive repercussions than bruised egos and is a regression for our community.