TREP is a curated database of transposable elements (TEs).
Transposable elements are mobile genomic DNA sequences found in nearly all organisms. TEs have the ability to replicate in a host genome using various transposition mechanisms and they are divided into two classes based on their replication mechanism. Retrotransposons (class I) use an RNA intermediate for transposition ("copy and paste") while DNA transposons (class II) use a DNA intermediate for transposition ("cut and paste").
Mobile genetic elements were first discovered in maize plants by the geneticist Barbara McClintock in the mid-1940s. Today, we know that transposable elements constitute large fractions of the genomic DNA in many higher eukaryotes. Although TEs were once considered genomic parasites ("selfish DNA", "junk DNA"), there is increasing evidence that they play a far-reaching role in genome shaping and evolution.
A well-defined, consistent TE classification system is a prerequisite for identifying, classifying and characterizing the hundreds to thousands of different TE families that form the majority of genomic DNA in many organisms. TREP incorporates the unified classification system for eukaryotic TEs proposed by Wicker et al., 2007.
The TREP database was originally set up to compile transposable elements identified in Triticeae genomic or cDNA sequences, to ease work with these highly repetitive genomes (> 80% TEs in wheat, barley, maize genomes; Wicker et al., 2002). Over the years, TEs from various other species were included, making use of the ever-increasing avalanche of sequencing data.
The TREP database is divided into a complete and a non-redundant nucleotide database. Additionally, a database of hypothetical proteins is deduced from the non-redundant nucleotide database.
The complete nucleotide database ("total_TREP") comprises all TE entries and facilitates in-depth studies of the different TE classes.
The non-redundant nucleotide database ("nrTREP") contains consensus sequences of the different types of TEs to make BLAST searches more efficient.
The database of hypothetical proteins ("PTREP") contains deduced amino acid sequences. When the hypothetical proteins were deduced, frameshifts were removed in many cases. PTREP is useful for identifying divergent TEs that have no significant similarity at the DNA level.
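To illustrate why a protein-level comparison can still find divergent elements, the following minimal Python sketch translates a DNA sequence in all six reading frames using the standard genetic code. It is not the procedure used to build PTREP (which involved frameshift removal); the function names translate and six_frame_translations are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA from its first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

Each translated frame could then be compared against a protein database; a stop codon is shown as "*".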
Well-classified and annotated TE libraries facilitate the rapid identification of TEs in genomic and cDNA sequences. TE libraries allow masking repeats in sequences for further analysis of the non-repetitive sequence space.
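As a rough illustration of repeat masking, the Python sketch below replaces regions of a sequence with "N" once their coordinates are known, for example from similarity search hits against a TE library. The function name mask_repeats and the 0-based, half-open interval convention are assumptions made for this example, not part of TREP.

```python
def mask_repeats(sequence, intervals, mask_char="N"):
    """Return the sequence with every [start, end) interval masked.

    Intervals are 0-based and half-open (an assumption of this sketch);
    overlapping intervals are handled naturally.
    """
    masked = list(sequence)
    for start, end in intervals:
        if start < 0 or end > len(sequence) or start > end:
            raise ValueError(f"interval out of range: ({start}, {end})")
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


example = "ACGTACGTACGTACGT"
hits = [(2, 6), (4, 9), (12, 14)]  # hypothetical TE hit coordinates
masked_example = mask_repeats(example, hits)
# masked_example == "ACNNNNNNNCGTNNGT"
```

The unmasked remainder is the non-repetitive sequence space that can be passed on to further analysis.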
The repetitive nature of TEs poses great challenges for the sequencing and analysis of large genomes. The assembly of large genomic DNA sequences may benefit from the TREP database, which can help to reduce misalignments, close gaps and arrange subcontigs into a contiguous sequence.
TE libraries ease the discovery of new families of TEs and enable systematic comparisons of TEs from varying sources. They may contribute to a better understanding of the evolution and possible functions of these elements.
BLAST searches can be carried out against all three TREP databases ("nrTREP", "total_TREP", "PTREP"). As a unique feature, TE sequences corresponding to significant BLAST search hits can be selected and downloaded in FASTA format, either as single sequences or in groups of sequences.
Annotated TREP database files can be downloaded in FASTA format for local use.
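For local use, a downloaded FASTA file can be read with a few lines of Python. The sketch below is a generic FASTA reader and makes no assumption about the layout of TREP header lines; the sample records and the name read_fasta are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records standing in for a downloaded file.
sample = io.StringIO(
    ">element_A hypothetical description\nACGT\nACGT\n>element_B\nTTGA\n"
)
records = list(read_fasta(sample))
```

With a real file, the same function can be called as read_fasta(open(path)), where path points to the downloaded database file.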