MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences

Journal Article

Yujun Han, Susan R. Wessler

Department of Plant Biology, University of Georgia, Athens, GA 30602, USA

Revision received: 08 September 2010

Accepted: 13 September 2010

Published: 29 September 2010


Yujun Han, Susan R. Wessler, MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences, Nucleic Acids Research, Volume 38, Issue 22, 1 December 2010, Page e199, https://doi.org/10.1093/nar/gkq862

Abstract

Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that is abundant in the non-coding regions of the genes of many plant and animal species. Accurate identification of MITEs has been a challenge for existing programs because MITEs lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter consists of consensus TE sequences grouped into families, which can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%) and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets such as whole genome sequences. More importantly, MITE-Hunter is substantially more accurate than either FINDMITE or MUST, the vast majority of whose outputs are false-positives.

INTRODUCTION

Transposable elements (TEs) reside in all characterized eukaryotic genomes, where they are often the largest component. For example, sequences derived from TEs make up at least 31% of the genome of dog (Canis familiaris), 38% of mouse (Mus musculus), 46% of human (Homo sapiens) and 85% of maize (Zea mays ssp. mays L.) (1–4). TEs have structural features and classification systems that distinguish them from simpler repetitive sequences such as microsatellite repeats. TEs are divided into two classes based on the molecule involved in transposition: retrotransposons (Class 1) move via an RNA intermediate, whereas DNA transposons (Class 2) move via DNA. Within each class, TEs are further divided into superfamilies and families (5). In plants, six Class 2 superfamilies have been identified thus far: Tc1/Mariner, PIF/Harbinger, hAT, MULE, CACTA and Helitron (5,6). With the exception of Helitrons, TEs in these superfamilies have terminal inverted repeats (TIRs) and transpose through a cut-and-paste mechanism. TEs are also classified as autonomous or non-autonomous depending on whether they can produce a functional transposase.

Miniature inverted-repeat TEs (MITEs) are a special type of Class 2 non-autonomous element that is present in high copy numbers in many eukaryotic genomes. For example, ∼56 000 MITEs were identified in sorghum (Sorghum bicolor) (7), 73 500 in rice (Oryza sativa) (8) and 150 000 in human (9). Ever since their discovery almost 20 years ago (10,11), MITEs have been the subject of increasing interest in both plants and animals (12–15). Unlike the ‘traditional’ low-copy non-autonomous TEs (such as the Ds element of maize), MITEs are uniformly short (most <500 bp) and amplify rapidly from one or a few elements to very high copy numbers (16). The two largest MITE families, Stowaway and Tourist, were found to be members of the Tc1/Mariner and PIF/Harbinger superfamilies, respectively (12,17–19). MITEs have also been reported from the hAT and MULE superfamilies (13,20).

While the rapidly expanding databases of genomic sequence present an opportunity to expand the study of MITEs, they also pose a significant challenge to their correct and efficient annotation. Many TE annotation programs have been developed that use one or more of the following computational approaches: (i) homology-based, (ii) de novo, (iii) polymorphism-based and (iv) structure-based (21–23). Homology-based TE annotation is powerful at detecting TEs that share sequence similarity with known elements, but it is inadequate for identifying full-length or novel TEs. De novo approaches can discover any TE as long as it has multiple copies. However, their output is a mixture of TEs from all superfamilies and non-TE repeats, so the manual identification and classification of TEs from de novo output is often tedious and time-consuming. Polymorphism-based approaches can discover new TEs, but their output is also a mixture of different types of sequences. More importantly, their application is limited to the comparison of data sets from very closely related species. Compared with the other approaches, structure-based approaches are very effective at discovering certain TE types, such as LTR retrotransposons. However, currently available programs are less successful at identifying other TE types, such as non-autonomous Class 2 transposons (including MITEs), because these possess few distinguishing structural features.

To date three programs have been developed exclusively to find MITEs: TRANSPO (24), FINDMITE (15) and MUST (25). TRANSPO is a homology-based program that requires known MITE sequences. As such it is not effective at finding new MITEs (21). FINDMITE and MUST are structure-based TE discovery programs that can be used to discover new MITEs because they search for common MITE structural features rather than similar sequences. However, because MITEs have only two common structural features, TIRs and target site duplications (TSDs), many sequences that are not MITEs are in the outputs of FINDMITE and MUST. Thus, the false-positive rates of these programs are very high and extensive manual curation is required to filter false-positives from their output files.
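A back-of-envelope estimate shows why TIRs and TSDs alone admit so many false-positives. Assume, unrealistically, that the genome is a random sequence with independent, uniformly distributed bases, and let G be the number of start positions scanned, W the number of allowed element lengths per start, k the length of an exact TIR seed that must reappear as a reverse complement at the other end, and t the TSD length. The expected number of chance TIR-plus-TSD matches is then roughly the product below. G ≈ 3.728 × 10^8 is the size of the rice genome sequence analysed in this study; W = 800, k = 10 and t = 2 are illustrative values only, not parameters of FINDMITE, MUST or MITE-Hunter.

```latex
\mathbb{E}[N_{\mathrm{chance}}] \approx G \cdot W \cdot 4^{-(k+t)}
  = (3.728 \times 10^{8}) \times 800 \times 4^{-12}
  \approx \frac{2.98 \times 10^{11}}{1.68 \times 10^{7}}
  \approx 1.8 \times 10^{4}
```

Even this idealized model yields on the order of ten thousand spurious candidates; real genomes, with low-complexity and other repetitive sequence, can yield more, which is why structural signals need an additional filter.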

Here, we present MITE-Hunter, a program that accurately discovers MITEs as well as other short non-autonomous ‘cut-and-paste’ Class 2 TEs in genomic data sets, including whole genomes. To evaluate MITE-Hunter, we compared it with FINDMITE and MUST. We chose the rice genome for this evaluation because rice harbors abundant and well-annotated Class 2 TEs and MITEs (8,26,27). In the examples reported in this study, MITE-Hunter missed only two known rice MITEs and discovered 16 previously unknown elements. Compared with FINDMITE and MUST, MITE-Hunter has a much lower false-positive rate, and its output is easier to check and classify. MITE-Hunter and related programs can be freely downloaded at http://target.iplantcollaborative.org/.

MATERIALS AND METHODS

The MITE-Hunter pipeline

MITE-Hunter is a UNIX program pipeline composed mainly of Perl scripts. Given genomic sequences as input, MITE-Hunter identifies Class 2 non-autonomous TEs and outputs consensus sequences classified into families. MITE-Hunter can use multiple processors (default: 5 CPUs). The pipeline has five main steps, summarized in Figure 1: (i) identify TE candidates through a structure-based approach; (ii) identify and filter false-positives using an approach based on pairwise sequence alignment (PSA); (iii) generate exemplars; (iv) identify and filter false-positives using an approach based on multiple sequence alignment (MSA), generate consensus sequences and predict TSDs; and (v) group consensus sequences into families. Details of each step are presented in the Results section.
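Step I can be illustrated with a toy structure-based scan. The sketch below is not MITE-Hunter's Perl implementation: the function name find_tir_candidates and its parameters (seed_len, min_len, max_len, tsd_len) are hypothetical. It assumes an exact TIR seed of seed_len bases at each end and a fixed-length TSD, and it uses a k-mer index so that reverse-complement partners are found by lookup rather than by comparing every pair of positions.

```python
from collections import defaultdict
from typing import Iterator, Tuple

# Complement table for uppercase and lowercase DNA bases.
COMP = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]

def find_tir_candidates(genome: str, seed_len: int = 10, min_len: int = 50,
                        max_len: int = 800, tsd_len: int = 2) -> Iterator[Tuple[int, int]]:
    """Yield 0-based half-open (start, end) spans whose first and last seed_len
    bases are reverse complements (a TIR seed) and which are flanked by
    identical tsd_len-bp direct repeats (a putative TSD). Hypothetical sketch."""
    genome = genome.upper()
    # Index every k-mer position so reverse-complement partners are a lookup away.
    index = defaultdict(list)
    for j in range(len(genome) - seed_len + 1):
        index[genome[j:j + seed_len]].append(j)
    for i in range(tsd_len, len(genome) - seed_len + 1):
        left = genome[i:i + seed_len]
        if "N" in left:
            continue                          # skip assembly gaps
        for j in index.get(revcomp(left), ()):
            end = j + seed_len                # exclusive end of the candidate
            if not (min_len <= end - i <= max_len):
                continue
            if end + tsd_len > len(genome):
                continue                      # no room for the right-hand TSD
            # TSD check: the bases just before and just after the element must match.
            if genome[i - tsd_len:i] == genome[end:end + tsd_len]:
                yield i, end
```

Because the index stores every k-mer position in memory, this only suits small inputs; a whole genome would call for chunking or a more compact index. Exact seeds also mean that elements whose TIR ends carry mismatches are missed.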


Figure 1.

The five main steps of the MITE-Hunter pipeline. Gray bars are genomic sequences, black and red triangles are TSDs and TIRs, respectively, blue bars are predicted TEs, white bars are homologous sequences, dashed lines are gaps and yellow bars are sequences that are similar to each other but not to those represented by green bars (and vice versa). (A) Identification of candidate TEs. Three predicted candidate TEs are shown. (B) Filtering of false-positives based on the PSA. Four types of alignments are shown (a–d). Except for the candidates in (d), all the others are filtered as false-positives. (C) Selection of TE exemplars. (D) Filtering of false-positives based on the MSA, predicting TSDs and generating consensus sequences. (e) and (f) are two special types of MSA (see text for details). (E) Selecting new exemplars and grouping TEs into families.

Data sets and programs

Build 5 of the rice IRGSP/RAP genome sequence (28) was used, together with Repbase version 14.02 (29) and RepeatMasker 3.26 (Smit, A.F.A., Hubley, R. and Green, P., unpublished data; http://www.repeatmasker.org). TE copy number was calculated using a previously described method (4). Pairwise sequence alignment (PSA) was performed with BLAST (30) and multiple sequence alignment (MSA) with MUSCLE (31). All computation was done on a Linux cluster.

RESULTS

MITE discovery in rice

We applied MITE-Hunter to the rice genome with default parameters. MITE-Hunter completed the analysis in ∼44 h. Details of the algorithms and results of each step of MITE-Hunter are presented below.

Accuracy evaluation of MITE-Hunter

To test the authenticity of the MITE-Hunter output, we manually curated the 700 consensus TEs that MITE-Hunter reported for rice (Figure 2). Each MSA file was examined for the TIR and TSD structures characteristic of the Class 2 TE superfamilies found in plant genomes. A TE was validated if it had at least three full-length copies and its ends, characterized by TIRs and TSDs, could be recognized in the MSA file. TEs that did not meet these criteria were considered false-positives. Using these strict criteria, we identified 46 false-positives. In addition, eight solo LTRs and four short Helitrons were identified and classified as false-positives. These 12 elements were in the MITE-Hunter output because they coincidentally have TIR-like and TSD-like structures near their ends. After these elements were removed, 642 of the original 700 TEs remained, giving a false-positive rate of 8.3% [(46 + 8 + 4)/700].


Figure 2.

Flowchart of the manual curation of rice Class 2 non-autonomous TEs from MITE-Hunter output. The authentication process began with 700 consensus TEs and was reduced by the number shown for each step. The numbers on the right are the remaining consensus TEs after each step (see text for detail). Three different types of compound TEs are shown (a, b and c). Open and solid bars represent different TEs from different families. (a) One TE inserted into another. (b) Two different adjacent TEs. (c) Two adjacent copies from the same TE family.

Classification of TEs discovered by MITE-Hunter

In addition to 58 false-positives, we were unable to classify 15 TEs into superfamilies. Although these sequences appeared to be TEs (based on their MSA files), their TSDs and TIRs were ambiguous because they contained too many mismatches. As such, they were classified as unknowns.

The remaining 627 TEs were confirmed to be ‘cut-and-paste’ Class 2 TEs and were classified into previously described superfamilies. However, during the classification process we found that several families contained TEs belonging to more than one superfamily. By comparing their sequences, we discovered that this problem was caused by 14 compound TEs formed by the insertion of a member of one superfamily into a member of another (Figure 2a). Because TEs were grouped into families based on sequence similarity, these 14 compound TEs caused TEs from different superfamilies to be grouped together. In addition, we identified another 12 compound TEs formed by the fusion of two TEs from the same superfamily (Figure 2b and c). These 26 compound TEs have low full-length copy numbers in the genome and were excluded from subsequent analysis, leaving 601 TE consensus sequences.

Manual curation revealed that some TE consensus sequences in the MITE-Hunter output either lack sequence at their ends or carry extra sequence beyond them. This problem is caused by false TIR-like and TSD-like structures near the authentic ones. The missing or extra sequences are mostly short and can be identified manually once the real TIRs and TSDs are located in the MSA files. After the consensus sequences of the remaining 601 Class 2 TEs were corrected (by adding missing or trimming extra sequence), the similarity between some TE sequences satisfied the grouping criteria of Step III (Figure 1C). We therefore reran Steps III and V of MITE-Hunter and obtained a final data set of 551 TE consensus sequences grouped into 401 families: 97 Tc1/Mariner TEs in 86 families, 146 PIF/Harbingers in 104 families, 123 hATs in 95 families, 173 Mutators in 110 families and 12 CACTAs in 6 families.
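Grouping consensus sequences into families can be sketched as single-linkage clustering: any two sequences whose pairwise alignment passes a similarity threshold are linked, and families are the connected components of that graph. The thresholds (min_identity, min_coverage), the hit tuple format and the function names below are hypothetical; this section does not state MITE-Hunter's actual grouping criteria.

```python
from typing import Dict, Iterable, List, Tuple

# (query, subject, percent identity, fraction of the shorter sequence aligned); assumed format.
Hit = Tuple[str, str, float, float]

def similar_pairs(hits: Iterable[Hit], min_identity: float = 80.0,
                  min_coverage: float = 0.8) -> List[Tuple[str, str]]:
    """Keep pairs of distinct sequences whose alignment passes both thresholds."""
    return [(q, s) for q, s, ident, cov in hits
            if q != s and ident >= min_identity and cov >= min_coverage]

def group_into_families(names: List[str],
                        pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the similarity graph (single linkage) via union-find."""
    parent: Dict[str, str] = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                # merge the two families

    families: Dict[str, List[str]] = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    # Largest families first; ties broken alphabetically for stable output.
    return sorted(families.values(), key=lambda fam: (-len(fam), fam))
```

Single linkage also shows how the compound-TE problem arises: one compound element that is similar to members of two different superfamilies is enough to link them into a single family.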

Identification of MITEs from MITE-Hunter output

To identify and characterize MITEs in the MITE-Hunter output, we performed a RepeatMasker search of the rice genomic database using the 551 curated TE sequences as the library. From the RepeatMasker output, we counted the copy number of each TE (data not shown). To distinguish MITEs from lower-copy Class 2 non-autonomous TEs, we defined a MITE as a Class 2 non-autonomous TE <800 bp long with at least 100 full-length copies in the genome. Potential MITEs that have not experienced significant amplification were defined as having fewer copies (10–99) but high sequence identity (≥99%). Based on these criteria, we identified 132 rice MITEs in the MITE-Hunter output: 15 hAT-MITEs, 22 Mutator-MITEs, 50 Stowaways and 45 Tourists. No CACTA MITEs were found.
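The size and copy-number rule above translates directly into a small classifier. This is a sketch assuming per-family summaries have already been extracted from the RepeatMasker output; the TEFamilySummary class, its field names and the use of a single identity value for the 10–99-copy case are assumptions, since the text does not say how identity was summarized.

```python
from dataclasses import dataclass

@dataclass
class TEFamilySummary:
    name: str
    length_bp: int            # consensus length
    full_length_copies: int   # full-length copies found in the genome
    identity_pct: float       # identity among copies (summary metric is an assumption)

def classify_mite(te: TEFamilySummary) -> str:
    """Apply the thresholds stated in the text: <800 bp with >=100 full-length
    copies is a MITE; <800 bp with 10-99 copies at >=99% identity is a
    potential MITE; everything else is not a MITE."""
    if te.length_bp >= 800:
        return "not a MITE"
    if te.full_length_copies >= 100:
        return "MITE"
    if 10 <= te.full_length_copies <= 99 and te.identity_pct >= 99.0:
        return "potential MITE"
    return "not a MITE"
```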

Comparison of MITE-Hunter output to Repbase data

To estimate the false-negative rate of MITE-Hunter, we used the rice Class 2 non-autonomous elements in Repbase as the reference data set. Repbase was selected for this analysis because it is a comprehensive TE database containing most, if not all, previously reported rice Class 2 TEs (29). However, because Repbase contains both autonomous and non-autonomous TEs of Classes 1 and 2, the first step was to retrieve only the rice Class 2 non-autonomous elements. From these, we selected the 230 elements that were <1.7 kb, because the longest rice TE found by MITE-Hunter is 1676 bp. The 230 elements were manually checked using the same approach that was applied to the MITE-Hunter output. Thirty-two of the 230 elements were excluded because they lacked multiple full-length copies, and 13 more because their TIR and TSD structures could not be identified in the MSA files. The remaining 185 Repbase TEs were classified into Class 2 TE superfamilies. Using the same approach as for identifying MITEs in the MITE-Hunter output, we identified 101 MITE-like elements among the 185 Repbase TEs: 4 hAT-MITEs, 19 Mutator-MITEs, 40 Stowaways and 38 Tourists.

The false-negative rates of MITE-Hunter were calculated separately for Class 2 non-autonomous TEs and for MITEs, as follows. First, we used the 551 curated Class 2 non-autonomous TEs discovered by MITE-Hunter as the library to mask the Repbase data set with RepeatMasker. On average, 84.9% of the sequences in the Repbase data set were masked (Table 1, second column). Using a similar approach, 97.6% of the MITE sequences in Repbase were masked by the TEs in the MITE-Hunter output (Table 1, third column). Thus the false-negative rate of MITE-Hunter is 15.1% (100 − 84.9) for Class 2 non-autonomous TEs and 2.4% (100 − 97.6) for MITEs. MITE-Hunter failed to identify only two Tourist MITEs in Repbase (OSTE23 and ID-4). Conversely, when the Repbase elements were used as the library, 47.9% of the Class 2 non-autonomous TEs and 83.4% of the MITEs in the MITE-Hunter output were masked (Table 1, last two columns). Sixteen MITEs discovered by MITE-Hunter were not found in Repbase: 1 Tourist, 11 hAT-MITEs and 4 Mutator-MITEs.
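The masking-based false-negative estimate can be sketched as follows: for each reference sequence, merge the intervals masked by the other library so that no base is counted twice, average the per-sequence masked fraction, and report the unmasked complement. The function names and the 0-based half-open interval convention are assumptions, and the text does not say whether Table 1 averages per sequence or pools bases; this sketch averages per sequence.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)

def masked_fraction(length: int, intervals: Sequence[Interval]) -> float:
    """Fraction of a sequence covered by the union of masked intervals."""
    if length <= 0:
        raise ValueError("length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(length, end)  # clip to the sequence
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start         # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)                # overlapping or adjacent: extend
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length

def false_negative_rate(records: List[Tuple[int, Sequence[Interval]]]) -> float:
    """Percent of reference sequence left unmasked, averaged per sequence."""
    mean_masked = sum(masked_fraction(n, iv) for n, iv in records) / len(records)
    return 100.0 * (1.0 - mean_masked)
```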

Table 1.

Comparison between MITE-Hunter output and rice TEs in Repbase

Superfamily      Repbase data masked by        MITE-Hunter output masked
                 MITE-Hunter output (%)        by Repbase data (%)
                 All(a)    MITEs only(b)       All(c)    MITEs only(d)
Tc1/Mariner      93.3      100.0               72.5      99.9
PIF/Harbinger    83.8      94.6                53.1      93.0
hAT              85.8      100.0               25.6      28.4
Mutator          81.0      99.3                49.5      80.0
CACTA            88.2      –                   81.7      –
Together         84.9      97.6                47.9      83.4

(a) 185 rice Class 2 non-autonomous TEs <1.7 kb in Repbase.

(b) 101 MITEs identified and isolated from data set (a).

(c) 551 Class 2 non-autonomous TE consensus sequences curated from the MITE-Hunter output.

(d) 132 MITEs identified and isolated from data set (c).

(–) No CACTA MITEs were identified in either data set.


Evaluation of FINDMITE and MUST

We tested the ability of two previously published MITE finding programs, FINDMITE and MUST, to discover MITEs in the rice genomic data set using default parameters. Importantly, when we attempted to use the entire genomic sequence (∼372.8 Mb) as the input data, both FINDMITE and MUST reported errors and quit. As such we applied FINDMITE and MUST to a much smaller data set, rice chromosome 12 (∼28.2 Mb) (Table 2). MUST completed the task in ∼5 h and 30 min and generated 5485 putative TE sequences. Because FINDMITE requires users to define the TSD sequence and length, we chose ‘TA’, which is the TSD sequence of Stowaway MITEs. FINDMITE finished in <1 min and generated 10 864 putative Stowaways. To calculate the false-positive rate, we randomly sampled 100 TE sequences from the outputs of FINDMITE and MUST, respectively, and checked them using the same approach as was used for evaluating MITE-Hunter. With only 15 and 14 validated TEs for FINDMITE and MUST, respectively, both programs have a false-positive rate of over 80%. To perform an impartial comparison, we also applied MITE-Hunter to the rice chromosome 12 data set. Using default parameters, MITE-Hunter finished in 1 h and 40 min and generated 114 TE consensus sequences that were grouped into 88 families. Through manual curation, five TEs were identified as false-positives resulting in a false-positive rate of 4.4%. Because the input data is a small subset of the rice genome, we did not compare the results of FINDMITE and MUST to the Repbase data to calculate the false-negative rate.
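The sampling-based false-positive estimate used for FINDMITE and MUST amounts to validating a random subset of predictions and reporting the invalid fraction. In the sketch below, is_valid stands in for the manual MSA inspection described above (a hypothetical callback, not an automated test used by the authors), and the function name and seed parameter are likewise assumptions.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def estimate_false_positive_rate(predictions: Sequence[T],
                                 is_valid: Callable[[T], bool],
                                 sample_size: int = 100,
                                 seed: int = 0) -> float:
    """Validate a random sample of predictions and return the invalid fraction.
    If sample_size >= len(predictions), every prediction is checked."""
    if not predictions:
        raise ValueError("no predictions to evaluate")
    rng = random.Random(seed)                  # seeded for a reproducible sample
    k = min(sample_size, len(predictions))
    sample = rng.sample(list(predictions), k)  # sampling without replacement
    n_valid = sum(1 for p in sample if is_valid(p))
    return (k - n_valid) / k
```

With 15 of 100 sampled FINDMITE predictions validated, this gives (100 − 15)/100 = 0.85; the 114 MITE-Hunter consensus sequences for chromosome 12 were all checked, giving 5/114 ≈ 4.4%.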

Table 2.

Comparisons of MITE-Hunter with FINDMITE and MUST

Program          Running time(a)   Predicted TEs   False-positives (%)
MITE-Hunter      1.7 h             114             4.4
FINDMITE(b)      <1 min            10 864          85.0
MUST             5.5 h             5485            86.0

(a) Rice chromosome 12 was used as the input data (∼28.2 Mb).

(b) Parameters were set to find only Stowaway MITEs.


DISCUSSION

A necessary prerequisite for the comprehensive analysis of MITEs is their identification in newly sequenced genomes. Two programs were previously developed for this purpose, FINDMITE and MUST. However, as demonstrated in this study, both FINDMITE and MUST have very high false-positive rates (∼85%) and cannot efficiently utilize whole genomic data sets like that from rice. To remedy this situation, we developed MITE-Hunter, which is a structure-based program pipeline that can efficiently identify TEs that have TIR and TSD structures from whole genome data sets. Important features of MITE-Hunter are discussed below.

MITE-Hunter uses an efficient approach to reduce the high false-positive rate that is the main limitation of currently available MITE discovery programs. The vast majority of rice genomic sequences with TIR-like and TSD-like structures are not Class 2 TEs. MITE-Hunter has two modules to filter false-positives, both of which exploit the principle that homologous copies of a true TE share sequence similarity only within the element itself, bounded by its terminal structures, and not in the flanking sequence. The main difference between the two modules is that one detects sequence similarity through PSA while the other uses MSA. The MSA-based module is more powerful at identifying false-positives but is slower than the PSA-based module. To achieve both high speed and high sensitivity, the PSA-based module is run first, in Step II, to filter most of the false-positives, and the MSA-based module is run in Step IV to filter the remainder. Because MITE-Hunter has such a system to identify and filter spurious TE candidates, its false-positive rate (4.4–8.3%) is at least ten times lower than that of either FINDMITE (85%) or MUST (86%).
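The principle shared by both filtering modules can be made concrete with a simplified PSA-style check. Suppose each candidate is extracted together with flank bases on either side and aligned (for example with BLAST) against the genome: a genuine TE's homologs should start and stop near the element boundaries, whereas a chance TIR/TSD structure inside a longer repeat shows similarity running into the flanks. The function name and its thresholds (tolerance, min_homologs, max_extended_fraction) are hypothetical illustrations, not MITE-Hunter's actual Step II rules, and hit coordinates are assumed to have been converted to 0-based half-open positions on the query.

```python
from typing import Iterable, Tuple

def passes_flank_test(te_len: int, flank: int,
                      hits: Iterable[Tuple[int, int]],
                      tolerance: int = 10,
                      min_homologs: int = 3,
                      max_extended_fraction: float = 0.2) -> bool:
    """hits: (qstart, qend) of homolog alignments on a query laid out as
    [flank | TE | flank]. Returns True if enough homologs exist and few of
    them extend more than `tolerance` bp beyond either TE boundary."""
    te_start, te_end = flank, flank + te_len
    n_total = n_extended = 0
    for qstart, qend in hits:
        n_total += 1
        left_ext = max(0, te_start - qstart)   # similarity reaching into the left flank
        right_ext = max(0, qend - te_end)      # similarity reaching into the right flank
        if left_ext > tolerance or right_ext > tolerance:
            n_extended += 1
    if n_total < min_homologs:
        return False                           # too few copies to judge the boundaries
    return n_extended / n_total <= max_extended_fraction
```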

MITE-Hunter is effective at discovering Class 2 non-autonomous TEs, especially MITEs. In our test, MITE-Hunter rediscovered most of the known rice Class 2 non-autonomous TEs (84.9%) and almost all MITEs (97.6%) in Repbase (Table 1, second and third columns). Only two MITEs in Repbase (OSTE23 and ID-4) were missed. OSTE23 is a very old MITE family, and its TIR and TSD structures are difficult to detect even by manual examination of the MSA file. ID-4 has two mismatches in its TIRs, which prevented it from being identified in Step I of MITE-Hunter.
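The ID-4 case illustrates how mismatch tolerance in TIR detection trades sensitivity against specificity. The sketch below scores a candidate's terminal arms by counting mismatches between the left arm and the reverse complement of the right arm. The function names, the fixed tir_len and the max_mismatches threshold are assumptions, and the test sequence is invented rather than taken from ID-4.

```python
# Complement table for uppercase and lowercase DNA bases.
COMP = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]

def tir_mismatches(element: str, tir_len: int) -> int:
    """Mismatches between the left terminal arm and the reverse complement
    of the right terminal arm (0 means a perfect TIR of length tir_len)."""
    if tir_len <= 0 or 2 * tir_len > len(element):
        raise ValueError("tir_len must be positive and fit twice in the element")
    left = element[:tir_len].upper()
    right_rc = revcomp(element[-tir_len:]).upper()
    return sum(a != b for a, b in zip(left, right_rc))

def has_tir(element: str, tir_len: int = 10, max_mismatches: int = 0) -> bool:
    """Accept the candidate if its terminal arms differ by at most max_mismatches."""
    return tir_mismatches(element, tir_len) <= max_mismatches
```

Raising max_mismatches recovers elements with imperfect TIRs, but it also admits more chance TIR-like structures for the downstream filters to remove.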

Compared with other MITE discovery programs, the MITE-Hunter output is much easier to curate manually. First, the number of TEs in the MITE-Hunter output is very small, because MITE-Hunter generates consensus sequences that represent the whole TE data set of the genome being analyzed. As shown in the Results section, MITE-Hunter generated 700 consensus TEs from the entire rice genomic data set. In contrast, FINDMITE generated 10 864 putative Stowaway MITEs using only a single rice chromosome (chromosome 12) as the input data set, and MUST generated 5485 elements from the same data set. Second, for each TE sequence in its output, MITE-Hunter generates an MSA file and predicts TSDs, which are useful for both TE validation and classification. The validity of each TE discovered by MITE-Hunter can be determined by manually identifying TIRs and TSDs in the MSA file. Finally, TEs identified by MITE-Hunter are automatically grouped into families based on sequence similarity, which further aids manual curation. These features are of value to all users, especially those who need a TE data set that is 100% accurate and classified into superfamilies.
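Collapsing many aligned copies into one consensus sequence is what keeps the output small. A common way to do this is column-wise majority rule over the MSA; the sketch below assumes that rule, dropping gap-dominated columns and writing ambiguous columns as N. The function name and the min_fraction parameter are hypothetical, and MITE-Hunter's own consensus procedure is not described in this section.

```python
from collections import Counter
from typing import List

def majority_consensus(msa: List[str], min_fraction: float = 0.5) -> str:
    """Column-wise majority-rule consensus of aligned rows ('-' marks a gap).
    Columns where gaps outnumber the most common base are dropped; otherwise
    the top base is emitted if it makes up more than min_fraction of the
    non-gap residues in that column, and 'N' is emitted if not."""
    if not msa or len({len(row) for row in msa}) != 1:
        raise ValueError("need a non-empty alignment of equal-length rows")
    consensus = []
    for column in zip(*msa):
        counts = Counter(base.upper() for base in column)
        gaps = counts.pop("-", 0)
        if not counts or gaps > max(counts.values()):
            continue                          # gap-dominated column: omit it
        base, n = counts.most_common(1)[0]
        residues = sum(counts.values())
        consensus.append(base if n / residues > min_fraction else "N")
    return "".join(consensus)
```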

In summary, MITE-Hunter is the first program to efficiently and accurately identify MITEs from whole genome sequence. Whereas the rice Class 2 non-autonomous TEs in Repbase were the products of many studies, MITE-Hunter was able to find virtually all the MITEs in a relatively short time frame and to do so accurately. Finally, the MITE-Hunter output is easy to curate as it contains highly condensed TE consensus sequences that are grouped into families. The validity of a TE discovered by MITE-Hunter can be quickly judged from the automatically generated MSA file, which is, to our knowledge, a unique feature of MITE-Hunter.

FUNDING

National Science Foundation (NSF) plant genome grant (0607123 to S.R.W.). Funding for open access charge: NSF plant genome grant 0607123.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Yaowu Yuan for valuable discussions of both of the programs and the article. We thank Hao Wang for installing and running MUST.

REFERENCES

1. The dog genome: survey sequencing and comparative analysis. Science, 2003, vol. 301, pp. 1898-1903.

2. Initial sequencing and comparative analysis of the mouse genome. Nature, 2002, vol. 420, pp. 520-562.

3. Initial sequencing and analysis of the human genome. Nature, 2001, vol. 409, pp. 860-921.

4. The B73 maize genome: complexity, diversity, and dynamics. Science, 2009, vol. 326, pp. 1112-1115.

5. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet., 2007, vol. 8, pp. 973-982.

6. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet., 2007, vol. 41, pp. 331-368.

7. The Sorghum bicolor genome and the diversification of grasses. Nature, 2009, vol. 457, pp. 551-556.

8. A genome-wide view of miniature inverted-repeat transposable elements (MITEs) in rice, Oryza sativa ssp. japonica. Genes Genet. Syst., 2008, vol. 83, pp. 321-329.

9. Tiggers and DNA transposon fossils in the human genome. Proc. Natl Acad. Sci. USA, 1996, vol. 93, pp. 1443-1448.

10. Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell, 1994, vol. 6, pp. 907-916.

11. Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell, 1992, vol. 4, pp. 1283-1294.

12. Tuned for transposition: molecular determinants underlying the hyperactivity of a Stowaway MITE. Science, 2009, vol. 325, pp. 1391-1394.

13. Identification of miniature inverted-repeat transposable elements (MITEs) and biogenesis of their siRNAs in the Solanaceae: new functional implications for MITEs. Genome Res., 2009, vol. 19, pp. 42-56.

14. Identification and characterisation of five novel miniature inverted-repeat transposable elements (MITEs) in amphioxus (Branchiostoma floridae). Int. J. Biol. Sci., 2006, vol. 2, pp. 54-60.

15. Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc. Natl Acad. Sci. USA, 2001, vol. 98, pp. 1699-1704.

16. Dramatic amplification of a rice transposable element during recent domestication. Proc. Natl Acad. Sci. USA, 2006, vol. 103, pp. 17620-17625.

17. An active DNA transposon family in rice. Nature, 2003, vol. 421, pp. 163-167.

18. Genome-wide analysis of mariner-like transposable elements in rice reveals complex relationships with stowaway miniature inverted repeat transposable elements (MITEs). Genetics, 2003, vol. 163, pp. 747-758.

19. P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc. Natl Acad. Sci. USA, 2001, vol. 98, pp. 12572-12577.

20. hATpin, a family of MITE-like hAT mobile elements conserved in diverse plant species that forms highly stable secondary structures. Plant Mol. Biol., 2005, vol. 58, pp. 869-886.

21. Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity, 2009, vol. 104, pp. 520-533.

22. Computational approaches and tools used in identification of dispersed repetitive DNA sequences. Tropical Plant Biol., 2008, vol. 1, pp. 85-96.

23. Discovering and detecting transposable elements in genome sequences. Brief Bioinform., 2007, vol. 8, pp. 382-392.

24. Genome-wide analysis of the Emigrant family of MITEs of Arabidopsis thaliana. Mol. Biol. Evol., 2002, vol. 19, pp. 2285-2293.

25. MUST: a system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene, 2009, vol. 436, pp. 1-7.

26. Using rice to understand the origin and amplification of miniature inverted repeat transposable elements (MITEs). Curr. Opin. Plant Biol., 2004, vol. 7, pp. 115-119.

27. A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc. Natl Acad. Sci. USA, 1996, vol. 93, pp. 8524-8529.

28. The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res., 2008, vol. 36, pp. D1028-1033.

29. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res., 2005, vol. 110, pp. 462-467.

30. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, vol. 25, pp. 3389-3402.

31. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 2004, vol. 32, pp. 1792-1797.

32. TARGeT: a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences. Nucleic Acids Res., 2009, vol. 37, p. e78.

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
