Identification of gene-oriented exon orthology between human and mouse


Gloria C-L Fu et al. BMC Genomics. 2012.

Abstract

Background: Gene orthology has been well studied in evolutionary biology and is considered important for functional genome annotation. With the accumulation of transcriptomic data, alternative splicing is now taken into account when assigning gene orthologs, and it has been suggested that orthology also be considered at the transcript level. Whether at the gene or the transcript level, exons are the basic units that make up the gene structure; however, no study has been reported on how to build exon-level orthology at the whole-genome scale. It is therefore essential to establish a gene-oriented exon orthology dataset.

Results: Using a customized pipeline, we first built exon orthologous relationships from assigned gene ortholog pairs in two well-annotated genomes, human and mouse. More than 92% of non-overlapping exons have at least one ortholog between human and mouse, and only a small portion of them have more than one ortholog. Exons located in the coding region are more conserved, in the sense that an orthologous counterpart is found more often. Within the untranslated regions, the 5' UTR appears more diverse than the 3' UTR according to the exon orthology assignments. Interestingly, most exons located in the coding region are also conserved in length, but this length conservation drops sharply in the untranslated regions. In addition, we allowed multiple assignments among exon orthologs, and a subset of exons with possible fusion/split events was defined after a thorough analysis procedure.

Conclusions: Identifying orthologs at the exon level provides a more detailed way to interrogate gene orthology and to analyze splicing. It could also be used to extend genome annotation. Besides examining one-to-one orthologous relationships, we handle one-to-many exon pairs to represent more complicated exon generation behavior. Our results can be applied in research on intron-exon structure and on alternative/constitutive exons in functional genomics.


Figures

Figure 1


Process of generating human and mouse exon orthologs. A. Flow chart of the construction of the orthologous exon database. B. Exons with N (N > 1) putative exon orthologs were further checked to determine whether these N exons belonged to the same united exon. C. Identification of fused and split exons from putative orthologous exons through the analysis procedure. D. Confirmation of the correct orthologous exon pairs via anchor mapping for the remaining putative orthologs yet to be verified.
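The abstract does not define how a united exon is constructed. As a minimal sketch, assuming that a united exon is the union of overlapping exons from different transcripts of the same gene (an assumption, not something the text above states), the grouping could look like the following. The function name `unite_exons` and the inclusive `(start, end)` coordinate convention are hypothetical.

```python
def unite_exons(exons):
    """Merge overlapping exons into united exons.

    Assumption: each exon is an inclusive (start, end) pair on the same
    strand of the same gene, and a "united exon" is the union of all
    exons that overlap one another.
    """
    united = []
    for start, end in sorted(exons):
        if united and start <= united[-1][1]:
            # Overlaps the previous united exon: extend its end if needed.
            prev_start, prev_end = united[-1]
            united[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap: start a new united exon.
            united.append((start, end))
    return united


# Two overlapping exons collapse into one united exon; the third stays apart.
example = unite_exons([(150, 250), (100, 200), (300, 400)])
```

Under this reading, the check in panel B amounts to asking whether all N putative orthologs fall inside a single merged interval.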

Figure 2


Distribution of united exons with and without orthologous pairs in different gene regions. The x-axis represents the gene regions in which united exons are located, and the y-axis is the percentage of united exons with or without orthologs to the total number of united exons in each category. SLR means a single long united exon extending from 5' UTR to 3' UTR.

Figure 3


Comparison of united exons with and without length conservation relative to their orthologous counterparts in human and mouse. Only one-to-one orthologous pairs are analyzed in this figure. The x-axis indicates the gene regions in which the united exons are located. The SLR tag means that the united exon spans the 5' UTR, the coding region, and the 3' UTR. The y-axis represents the number of united exons in each of two categories (equal or unequal in length to the orthologous exon), normalized to the total number of united exons in each gene region.

Figure 4


Analysis of length differences between orthologous pairs of unequal exon length. A. Distribution of length differences, modulo three, between orthologous exon pairs of unequal length. The MOD 3 = 0 category means that the remainder is zero when the length difference between exon orthologs is divided by three, and so on. The x-axis is the gene region in which the united exon is located, and the y-axis indicates the percentage of unequal-length orthologous pairs in each region that fall into each category. B. Counts of united exons whose length differs from that of their orthologous exon by 30 base pairs or fewer.
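The MOD 3 categories in panel A can be illustrated with a short sketch. It assumes exon lengths are given as integer base-pair counts for each one-to-one orthologous pair; the function names and the sample lengths are hypothetical and are not data from the study.

```python
from collections import Counter


def mod3_category(human_len, mouse_len):
    """Remainder of the absolute length difference divided by three."""
    return abs(human_len - mouse_len) % 3


def mod3_percentages(pairs):
    """Percentage of unequal-length pairs in each MOD 3 category."""
    unequal = [(h, m) for h, m in pairs if h != m]
    if not unequal:
        return {}
    counts = Counter(mod3_category(h, m) for h, m in unequal)
    return {r: 100.0 * counts.get(r, 0) / len(unequal) for r in (0, 1, 2)}


def count_small_differences(pairs, max_diff=30):
    """Count unequal-length pairs differing by at most max_diff base pairs."""
    return sum(1 for h, m in pairs if h != m and abs(h - m) <= max_diff)


# Hypothetical (human, mouse) exon lengths in base pairs.
pairs = [(120, 123), (90, 90), (150, 151), (200, 194), (75, 77)]
percentages = mod3_percentages(pairs)
small = count_small_differences(pairs)
```

A length difference that is a multiple of three leaves the reading frame intact in coding sequence, which is why the MOD 3 = 0 category is of particular interest for exons in the coding region.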


