Xenome--a tool for classifying reads from xenograft samples

Thomas Conway et al. Bioinformatics. 2012.

Abstract

Motivation: Shotgun sequence read data derived from xenograft material contains a mixture of reads arising from the host and reads arising from the graft. Classifying the read mixture to separate the two allows more precise downstream analysis.

Results: We present a technique, with an associated tool Xenome, which performs fast, accurate and specific classification of xenograft-derived sequence read data. We have evaluated it on RNA-Seq data from human, mouse and human-in-mouse xenograft datasets.

Availability: Xenome is available for non-commercial use from http://www.nicta.com.au/bioinformatics.

Figures

Fig. 1.

A Venn diagram showing the different classes that a given _k_-mer may belong to. The marginal host (and marginal graft) partitions are for those host (and graft) _k_-mers that are Hamming distance 1 from a _k_-mer in the graft (and host) reference
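
The partitioning described in this caption can be illustrated with a minimal sketch. It is not Xenome's implementation: the function names, the class labels and the toy sequences are assumptions made for illustration, and details a real tool would need (reverse complements, compact _k_-mer storage, a realistic _k_) are left out.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbours(kmer):
    """Yield every k-mer at Hamming distance exactly 1 from kmer."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def classify_kmers(host_seq, graft_seq, k):
    """Assign each reference k-mer to one class of the Venn diagram.

    The labels ("host", "marginal host", "both", ...) are illustrative
    names chosen for this sketch.
    """
    host = kmers(host_seq, k)
    graft = kmers(graft_seq, k)
    classes = {}
    for km in host | graft:
        if km in host and km in graft:
            classes[km] = "both"
        elif km in host:
            # Host-only k-mer: marginal if one substitution away from a graft k-mer.
            near = any(n in graft for n in neighbours(km))
            classes[km] = "marginal host" if near else "host"
        else:
            # Graft-only k-mer: marginal if one substitution away from a host k-mer.
            near = any(n in host for n in neighbours(km))
            classes[km] = "marginal graft" if near else "graft"
    return classes


# Toy references with k = 4, chosen only to exercise each class.
example = classify_kmers("ACGTTTTT", "ACGAGGGG", 4)
```

In this toy example `ACGT` (host) and `ACGA` (graft) differ at a single position, so each falls into its marginal partition, while `TTTT` and `GGGG` have no near neighbour in the other reference.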

Fig. 2.

Summary of the results with Human cDNA. Each of the classes of reads is divided into those reads assigned to the class only by Xenome (Xenome), only by the Tophat analysis (Tophat) or by both Xenome and the Tophat analysis (Concordant)

Fig. 3.

Summary of the results with Murine cDNA

Fig. 4.

Summary of the results with BM18 xenograft cDNA

Fig. 5.

Validation of the in silico classification of xenograft RNA-Seq data with qRT-PCR. The horizontal axis shows log10 FPKM for the _Xenome_-derived gene expression of the 18 test genes. The vertical axis shows the _C_t value for each gene relative to the _C_t of actin. Two RNA-Seq samples (biological replicates) were processed, along with four replicates of the qRT-PCR. For each gene, an ellipse is shown centered on the mean log10 FPKM on the _x_-axis and on the mean relative _C_t on the _y_-axis. The horizontal and vertical radii show the variance in the samples

Fig. 6.

An in silico analysis showing the degree of ambiguity in HG19 refGene, according to the _k_-mer based analysis used by Xenome. In this analysis, _k_ = 25

Fig. 7.

A plot showing the distribution of human genes with respect to the proportion of xenograft reads classed as "both" by the _Tophat_-based analysis and by the Xenome analysis. Only reads mapped by Tophat are considered, since Xenome does not yield mappings and so cannot be used to assign reads to genes. Only genes to which at least 20 reads mapped were considered. The horizontal axis corresponds to the number of reads classified as "both" or "ambiguous" by Xenome, as a proportion of all the reads that might possibly be human (i.e. "both", "ambiguous" or "human"). The vertical axis corresponds to the number of reads classified as "both" by the _Tophat_-based analysis, again as a proportion of all the reads that might possibly be human
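
As a rough illustration of the horizontal-axis quantity in this caption, the sketch below computes the per-gene proportion from read counts. The input layout (a dictionary of per-class counts per gene), the function name and the example numbers are hypothetical, and applying the 20-read threshold to all mapped reads of a gene is an assumption; the source does not specify these details.

```python
def xenome_ambiguity_proportion(counts, min_reads=20):
    """Per-gene proportion of possibly-human reads classed "both" or "ambiguous".

    counts maps a gene name to a dict of read counts per class
    (hypothetical layout, e.g. {"human": 30, "both": 5, ...}).
    Genes with fewer than min_reads mapped reads are skipped.
    """
    proportions = {}
    for gene, per_class in counts.items():
        # Assumption: the threshold applies to all reads mapped to the gene.
        if sum(per_class.values()) < min_reads:
            continue
        both = per_class.get("both", 0)
        ambiguous = per_class.get("ambiguous", 0)
        human = per_class.get("human", 0)
        possibly_human = both + ambiguous + human
        if possibly_human == 0:
            continue  # no read could be human; the proportion is undefined
        proportions[gene] = (both + ambiguous) / possibly_human
    return proportions


# Made-up counts, for illustration only.
example_counts = {
    "GENE_A": {"human": 30, "both": 5, "ambiguous": 5, "mouse": 10},
    "GENE_B": {"human": 5, "both": 1},  # fewer than 20 reads, so skipped
}
example_result = xenome_ambiguity_proportion(example_counts)
```

The vertical-axis value would be computed the same way from the Tophat-based classes, with "both" alone in the numerator.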
