Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius

PLoS One. 2010 May 19;5(5):e10720.

doi: 10.1371/journal.pone.0010720.

Abdulaziz M Al-Swailem, Maher M Shehata, Faisel M Abu-Duhier, Essam J Al-Yamani, Khalid A Al-Busadah, Mohammed S Al-Arawi, Ali Y Al-Khider, Abdullah N Al-Muhaimeed, Fahad H Al-Qahtani, Manee M Manee, Badr M Al-Shomrani, Saad M Al-Qhtani, Amer S Al-Harthi, Kadir C Akdemir, Mehmet S Inan, Hasan H Otu

Abdulaziz M Al-Swailem et al. PLoS One. 2010.

Abstract

Despite its economic, cultural, and biological importance, there has been no large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Sequences with similarities in nucleotide and protein databases were functionally annotated using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show that more than 80% of the contigs have an ORF longer than 300 bp and that approximately 40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were performed separately for different organisms, including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the option to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of the camel and other comparative studies, and a starting point for whole-genome sequencing of the organism.
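The ORF criterion mentioned in the abstract (an ORF longer than 300 bp) can be illustrated with a minimal sketch. The abstract does not say how ORFs were defined or which tool was used, so the definition here (ATG to the first in-frame stop codon, searched in all six reading frames, stop codon included in the length) and the function names are assumptions made for illustration only.

```python
# Illustrative sketch only: the ORF definition (ATG .. first in-frame stop,
# six frames, stop codon counted) is an assumption, not the paper's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq: str) -> int:
    """Length in bp of the longest ATG..stop ORF over all six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # position of the first ATG of the current ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def has_long_orf(seq: str, min_bp: int = 300) -> bool:
    """True if the longest ORF exceeds min_bp (300 bp in the abstract)."""
    return longest_orf_length(seq) > min_bp
```

Applying `has_long_orf` to each contig and taking the fraction that returns `True` would give a figure comparable in spirit to the "more than 80%" reported, though the exact value depends on the ORF definition chosen.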

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Analysis Workflow:

Outline of the analysis steps performed on the camel EST data. External programs used for analysis are shown where appropriate.

Figure 2. Read Length and Base Call Quality Distribution:

Distribution of read length (a) and high quality base pairs per read (b) in trimmed and untrimmed EST data. Average ± standard deviation values for the measured parameters are overlaid on the graphs.

Figure 3. Sample Cluster:

A sample cluster of thirty high-quality, repeat-masked reads that are grouped and aligned to form a final consensus sequence, yielding a contig. Individual reads are shown as blue bars and the consensus sequence is shown at the top as a red bar. Labels to the left of the bars show sequence IDs used for internal analysis purposes. The base pair scale above the consensus sequence is marked at 40 bp intervals; the consensus sequence is slightly longer than 1,160 bp. Overlapping reads share at least 98% identity over at least 40 bp, and each overlap ends at most 20 bp from the sequence end.
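The three overlap thresholds quoted in this caption (98% identity, 40 bp, 20 bp from the sequence end) can be expressed as a simple acceptance test. This is a sketch of the criteria only, not the clustering program's code; the `Overlap` record and its field names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the Figure 3 caption.
MIN_IDENTITY_PERCENT = 98
MIN_OVERLAP_BP = 40
MAX_END_DISTANCE_BP = 20


@dataclass
class Overlap:
    """Hypothetical summary of a pairwise overlap between two reads."""
    matches: int       # identical aligned positions
    length: int        # aligned overlap length in bp
    end_distance: int  # bp between the overlap and the nearest sequence end


def accept_overlap(ov: Overlap) -> bool:
    """Return True if the overlap satisfies all three caption criteria."""
    if ov.length < MIN_OVERLAP_BP:
        return False
    # Integer comparison avoids floating-point rounding at exactly 98%.
    identity_ok = ov.matches * 100 >= MIN_IDENTITY_PERCENT * ov.length
    return identity_ok and ov.end_distance <= MAX_END_DISTANCE_BP
```

An overlap with 99 matches over 100 bp ending 5 bp from the read end would pass; the same overlap ending 21 bp from the end would be rejected.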

Figure 4. Sequence Length and ORF Length Distribution:

Sequence length distribution for contigs and singletons (a), distribution of longest ORF lengths found in contigs and singletons (b), sequence length distribution for contigs and singletons with hits and no hits (c), and distribution of longest ORF lengths found in contigs and singletons with hits and no hits (d). Average ± standard deviation values of sequence and ORF lengths are overlaid on the corresponding graphs. For display purposes, only sequence lengths up to 2,500 bp and ORF lengths up to 1,800 bp are shown. 2.3% of contig sequences and no singleton sequences are longer than 2,500 bp (a); 2% of contig sequences and no singleton sequences have an ORF longer than 1,800 bp (b); 1.8% of contigs with a hit, and no sequences in the remaining three groups, are longer than 2,500 bp (c); and 1.6% of contigs with a hit, and no sequences in the remaining three groups, have an ORF longer than 1,800 bp (d).

Figure 5. GO Categories:

The thirty most abundant Biological Process GO category terms among the Homo sapiens genes similar to camel sequences.

Figure 6. Shared Genes:

Comparison of genes found (and then matched in HomoloGene) in human, mouse, rat, and bovine. Regions not shown in the Venn diagram are genes shared by human and bovine only (403 genes) and mouse and rat only (536 genes).
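The regions of such a four-way comparison, including the two the caption says are not drawn ("human and bovine only" and "mouse and rat only"), can be computed with set operations. The sketch below uses made-up gene identifiers; the function name and the input layout (one set of matched gene IDs per organism) are assumptions for illustration.

```python
from typing import Dict, Iterable, Set


def exclusive_region(hits: Dict[str, Set[str]],
                     organisms: Iterable[str]) -> Set[str]:
    """Genes found in every named organism and in none of the others."""
    wanted = set(organisms)
    if not wanted:
        raise ValueError("at least one organism is required")
    inside = set.intersection(*(hits[o] for o in wanted))
    outside = set().union(*(hits[o] for o in hits if o not in wanted))
    return inside - outside


# Toy data with hypothetical gene IDs (not from the paper).
toy_hits = {
    "human": {"g1", "g2", "g3"},
    "mouse": {"g1", "g4"},
    "rat": {"g1", "g4"},
    "bovine": {"g1", "g2"},
}
human_bovine_only = exclusive_region(toy_hits, ["human", "bovine"])
mouse_rat_only = exclusive_region(toy_hits, ["mouse", "rat"])
```

With real data, the sizes of these two sets would correspond to the 403 and 536 genes reported in the caption.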

Figure 7. Gene Interaction Network:

The most significant network identified by IPA using the 8,405 genes found in camel ESTs and shared by human, mouse, rat, and bovine. Molecules involved in the two functions most highly associated with the network (“hair and skin development and function” and “renal and urological system development”) are shown in light green with related functional annotation.

Figure 8. Canonical Pathway:

The canonical pathway most significantly associated, using IPA, with the data set of 8,405 genes found in camel ESTs and shared by human, mouse, rat, and bovine: the NRF-2 mediated oxidative stress response pathway. Molecules present in the data set are shown in red.
