TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees
Thomas Sauvage et al. BMC Res Notes. 2018.
Abstract
Objective: The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets.
Results: We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotations of tree leaves in the popular Java tree-viewer FigTree to segregate groups of FASTA sequences of interest to separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.
Keywords: Barcoding; Biodiversity; Clone; Contaminant; Cryptic; Environmental; FigTree; Forensic; Metabarcoding; OTU; Phylogeny; Systematics.
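The segregation step described in the Results can be illustrated with a minimal sketch. It is written in Python for brevity and is not the TREE2FASTA Perl code. It assumes that FigTree stores a leaf color as a `[&!color=#rrggbb]` comment after the taxon name in the `taxlabels` list of the saved NEXUS file; the function names, the sample data, and the `achromatic` group label for uncolored leaves are hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical inputs: a tiny FASTA alignment and a FigTree-style taxa block.
FASTA = ">A\nACGT\n>B\nACGA\n>C\nTCGA\n"
NEXUS = """#NEXUS
begin taxa;
\tdimensions ntax=3;
\ttaxlabels
\tA[&!color=#ff0000]
\t'B'[&!color=#0000ff]
\tC
;
end;
"""

# Taxon name (optionally single-quoted), optionally followed by a [&...] comment.
LABEL = re.compile(r"^'?(?P<name>[^'\[]+?)'?(?:\[&(?P<attrs>[^\]]*)\])?$")
COLOR = re.compile(r"!color=(#[0-9A-Fa-f]+)")


def read_fasta(text):
    """Return {header: sequence} from FASTA text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def colors_by_taxon(nexus):
    """Return {taxon: color or None} read from the taxlabels list."""
    colors, in_labels = {}, False
    for line in nexus.splitlines():
        line = line.strip()
        if line.lower() == "taxlabels":
            in_labels = True
        elif in_labels and line.startswith(";"):
            break  # end of the taxlabels list
        elif in_labels and line:
            match = LABEL.match(line)
            if match:
                found = COLOR.search(match.group("attrs") or "")
                colors[match.group("name")] = found.group(1) if found else None
    return colors


def split_by_color(fasta, nexus):
    """Group FASTA records by leaf color; uncolored leaves go to 'achromatic'."""
    colors = colors_by_taxon(nexus)
    groups = defaultdict(dict)
    for name, seq in read_fasta(fasta).items():
        groups[colors.get(name) or "achromatic"][name] = seq
    return dict(groups)


groups = split_by_color(FASTA, NEXUS)
for color, records in groups.items():
    print(color, sorted(records))
```

In a real run each group would be written to its own FASTA file rather than printed. Annotation-based and nested designs would need additional parsing that this sketch does not attempt.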
Figures
Fig. 1
Simulated phylogeny displaying taxa named ‘A’ to ‘T’. (a) Basic workflow for FASTA sequence extraction with TREE2FASTA. An exploratory tree is built following multiple alignment of the FASTA data. The Newick tree string (NWK) is visualized and edited in the tree-viewer FigTree and saved as a NEXUS file (NEX). TREE2FASTA uses the FASTA alignment and the NEXUS file to produce subsetted FASTA files according to the user's selection scheme (here, color). (b) Examples of possible color and/or annotation selection schemes in FigTree for TREE2FASTA sequence extraction. The FASTA icon marked with an asterisk ‘*’ contains the FASTA sequences for taxa H and I, which lack color selection (i.e. achromatic) or lack annotation. For figure clarity, annotations ‘Group1’ to ‘Group4’ are abbreviated to G1 to G4 within FASTA file icons. FASTA files output to different folders are delimited by dashed boxes.
Fig. 2
Sorting GenBank 16S rDNA for red seaweeds with TREE2FASTA. (a) Successive edits made in FigTree to establish an annotated design nested by color for reference and environmental red seaweeds (Florideophytes). (b) Folders and subsetted FASTA files output by TREE2FASTA for downstream analyses (folder content separated by dashed lines). For figure clarity, the Florideophyceae annotation was abbreviated to ‘Flo’ within FASTA file icons. The tree was produced with the 500 closest matches to Taenioma perpusillum (MF101452) on GenBank®.