Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius
PLoS One. 2010 May 19;5(5):e10720.
doi: 10.1371/journal.pone.0010720.
Abdulaziz M Al-Swailem, Maher M Shehata, Faisel M Abu-Duhier, Essam J Al-Yamani, Khalid A Al-Busadah, Mohammed S Al-Arawi, Ali Y Al-Khider, Abdullah N Al-Muhaimeed, Fahad H Al-Qahtani, Manee M Manee, Badr M Al-Shomrani, Saad M Al-Qhtani, Amer S Al-Harthi, Kadir C Akdemir, Mehmet S Inan, Hasan H Otu
- PMID: 20502665
- PMCID: PMC2873428
Abdulaziz M Al-Swailem et al. PLoS One. 2010.
Abstract
Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.
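The ORF length criterion mentioned in the abstract (contigs with an ORF > 300 bp) can be illustrated with a minimal sketch. The abstract does not say which tool or ORF definition was used, so the definition here (ATG to the first in-frame stop codon, searched over all six reading frames, stop codon included in the length) and the function names are assumptions made for illustration only.

```python
# Illustrative sketch only: the ORF definition and all names below are
# assumptions, not the method used in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in bp of the longest ATG..stop ORF across six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the current open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def fraction_with_long_orf(sequences, min_len=300):
    """Fraction of sequences whose longest ORF exceeds min_len bp."""
    if not sequences:
        return 0.0
    hits = sum(1 for s in sequences if longest_orf_length(s) > min_len)
    return hits / len(sequences)
```

Applied to a list of contig sequences, `fraction_with_long_orf(contigs)` gives the kind of summary figure the abstract reports, under the assumed ORF definition.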
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures
Figure 1. Analysis Workflow:
Outlay of analysis steps performed for Camel EST data. External programs used for analysis are shown where appropriate.
Figure 2. Read Length and Base Call Quality Distribution:
Distribution of read length (a) and high quality base pairs per read (b) in trimmed and untrimmed EST data. Average ± standard deviation values for the measured parameters are overlaid on the graphs.
Figure 3. Sample Cluster:
A sample instance of a cluster showing thirty high quality reads masked for repeats that are grouped and aligned to form a final consensus sequence yielding a contig. Individual reads are shown as blue bars and the consensus sequence is shown at top as a red bar. Labels to the left of bars show sequence IDs used for internal analysis purposes. Base pair scale is shown above the consensus sequence with 40 bp intervals rendering a consensus sequence slightly above 1,160 bp. Overlapping reads share at least 98% identity over at least 40 bp, with the overlap at most 20 bp from a sequence end.
Figure 4. Sequence Length and ORF Length Distribution:
Sequence length distribution for contigs and singletons (a), distribution of longest ORF lengths found in contigs and singletons (b), sequence length distribution for contigs and singletons with hits and no hits (c), and distribution of longest ORF lengths found in contigs and singletons with hits and no hits (d). Average ± standard deviation values of sequence and ORF lengths are overlaid on corresponding graphs. Sequence lengths up to 2,500 bp and ORF lengths up to 1,800 bp are shown for display purposes. 2.3% of contig and no singleton sequences have length longer than 2,500 bp (a), 2% of contig and no singleton sequences have an ORF longer than 1,800 bp (b), 1.8% of contigs with a hit and no other sequences in the remaining three groups have length longer than 2,500 bp (c), and 1.6% of contigs with a hit and no other sequences in the remaining three groups have an ORF longer than 1,800 bp (d).
Figure 5. GO Categories:
Top thirty Biological Process GO category terms found most abundant among the Homo sapiens genes similar to camel sequences.
Figure 6. Shared Genes:
Comparison of genes found (and then matched in HomoloGene) in human, mouse, rat, and bovine. Regions not shown in the Venn diagram are genes shared by human and bovine only (403 genes) and mouse and rat only (536 genes).
Figure 7. Gene Interaction Network:
Most significant network identified by IPA using 8,405 genes found in camel ESTs shared by human, mouse, rat, and bovine. Molecules involved in the two most highly associated functions in the network (“hair and skin development and function” and “renal and urological system development”) are shown in light green with related functional annotation.
Figure 8. Canonical Pathway:
Most significantly associated pathway (NRF-2 mediated oxidative stress response pathway) by the data set of 8,405 genes found in camel ESTs shared by human, mouse, rat, and bovine using IPA. Molecules that exist in the data set are shown in red.
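The overlap thresholds quoted in the Figure 3 caption (at least 98% identity, at least 40 bp of overlap, at most 20 bp distance from a sequence end) can be read as a simple acceptance test on each pairwise read overlap. The sketch below is one such reading; the `Overlap` record, its field names, and `passes_thresholds` are hypothetical and do not correspond to any tool named in the source.

```python
# Illustrative sketch only: names and structure are assumptions; the
# numeric thresholds are the ones stated in the Figure 3 caption.
from dataclasses import dataclass

MIN_IDENTITY_PCT = 98.0   # minimum percent identity within the overlap
MIN_OVERLAP_BP = 40       # minimum overlap length in bp
MAX_END_DISTANCE_BP = 20  # maximum distance of the overlap from a read end


@dataclass(frozen=True)
class Overlap:
    identity_pct: float   # percent identity over the aligned region
    length_bp: int        # length of the aligned region
    end_distance_bp: int  # unaligned bases between overlap and read end


def passes_thresholds(ov):
    """True if the overlap meets all three caption thresholds."""
    return (
        ov.identity_pct >= MIN_IDENTITY_PCT
        and ov.length_bp >= MIN_OVERLAP_BP
        and ov.end_distance_bp <= MAX_END_DISTANCE_BP
    )
```

For example, `passes_thresholds(Overlap(99.1, 120, 5))` accepts the overlap, while an overlap with 97.9% identity is rejected regardless of its length.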
Similar articles
- Characterization of 954 bovine full-CDS cDNA sequences.
Harhay GP, Sonstegard TS, Keele JW, Heaton MP, Clawson ML, Snelling WM, Wiedmann RT, Van Tassell CP, Smith TP. BMC Genomics. 2005 Nov 23;6:166. doi: 10.1186/1471-2164-6-166. PMID: 16305752. Free PMC article.
- Characterization of open reading frame-expressed sequence tags generated from Bos indicus and B. taurus mammary gland cDNA libraries.
da Mota AF, Sonstegard TS, Van Tassell CP, Shade LL, Matukumalli LK, Wood DL, Capuco AV, Brito MA, Connor EE, Martinez ML, Coutinho LL. Anim Genet. 2004 Jun;35(3):213-9. doi: 10.1111/j.1365-2052.2004.01139.x. PMID: 15147393.
- Molecular cloning, characterization and predicted structure of a putative copper-zinc SOD from the camel, Camelus dromedarius.
Ataya FS, Fouad D, Al-Olayan E, Malik A. Int J Mol Sci. 2012;13(1):879-900. doi: 10.3390/ijms13010879. Epub 2012 Jan 16. PMID: 22312292. Free PMC article.
Cited by
- Scanning electron microscopy and morphometric analysis of the hair in dromedaries with SEM-EDX in relation to age.
Alsafy MAM, El-Gendy SAA, Derbalah A, Rashwan AM, Haddad SS. BMC Zool. 2024 Jul 15;9(1):17. doi: 10.1186/s40850-024-00204-0. PMID: 39010185. Free PMC article.
- Exploiting morphobiometric and genomic variability of African indigenous camel populations-A review.
Yakubu A, Okpeku M, Shoyombo AJ, Onasanya GO, Dahloum L, Çelik S, Oladepo A. Front Genet. 2022 Dec 12;13:1021685. doi: 10.3389/fgene.2022.1021685. eCollection 2022. PMID: 36579332. Free PMC article.
- Comparative analysis of transposable elements provides insights into genome evolution in the genus Camelus.
Ibrahim MA, Al-Shomrani BM, Simenc M, Alharbi SN, Alqahtani FH, Al-Fageeh MB, Manee MM. BMC Genomics. 2021 Nov 20;22(1):842. doi: 10.1186/s12864-021-08117-9. PMID: 34800971. Free PMC article.
- Multiomic analysis of the Arabian camel (Camelus dromedarius) kidney reveals a role for cholesterol in water conservation.
Alvira-Iraizoz F, Gillard BT, Lin P, Paterson A, Pauža AG, Ali MA, Alabsi AH, Burger PA, Hamadi N, Adem A, Murphy D, Greenwood MP. Commun Biol. 2021 Jun 23;4(1):779. doi: 10.1038/s42003-021-02327-3. PMID: 34163009. Free PMC article.
- Comparative analysis of camelid mitochondrial genomes.
Manee MM, Alshehri MA, Binghadir SA, Aldhafer SH, Alswailem RM, Algarni AT, Al-Shomrani BM, Al-Fageeh MB. J Genet. 2019 Sep;98:88. PMID: 31544791.