Similar literature
 20 similar articles found (search time: 15 ms)
1.
Combined evidence annotation of transposable elements in genome sequences   Total citations: 1 (self-citations: 0; citations by others: 1)
Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila.
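To illustrate the nesting statistic reported in this abstract (518 of 6,013 annotated copies, 8.6%), here is a minimal sketch of how nested copies might be counted from a list of annotated intervals. It is not the authors' pipeline: the `TE` record, the `nested_copies` function and the example coordinates are hypothetical, and nesting is simplified to mean one interval lying inside a strictly longer interval on the same chromosome arm.

```python
from collections import namedtuple

# Hypothetical record for one annotated TE copy (not the pipeline's format).
TE = namedtuple("TE", "name chrom start end")


def nested_copies(annotations):
    """Return the copies that lie inside a longer copy on the same arm.

    Simplifying assumption: a copy counts as nested when its interval is
    contained in the interval of another, strictly longer copy.
    """
    nested = []
    for inner in annotations:
        for outer in annotations:
            if inner is outer or inner.chrom != outer.chrom:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            longer = (outer.end - outer.start) > (inner.end - inner.start)
            if contained and longer:
                nested.append(inner)
                break  # one host is enough to call the copy nested
    return nested


# Made-up coordinates, for illustration only.
annotations = [
    TE("roo_1", "2L", 1000, 9000),
    TE("INE-1_7", "2L", 3000, 3400),   # sits inside roo_1
    TE("Doc_2", "2L", 12000, 16000),
    TE("412_3", "3R", 3000, 3300),     # same coordinates, different arm
]
nested = nested_copies(annotations)
print([te.name for te in nested])       # ['INE-1_7']

# The percentage quoted in the abstract follows from its two counts.
print(round(100 * 518 / 6013, 1))       # 8.6
```

The quadratic scan keeps the sketch short; a real annotation set of several thousand copies would more likely be sorted by coordinate first.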

2.
Lerat E, Burlet N, Biémont C, Vieira C. Gene, 2011, 473(2):100-109
Transposable elements (TEs) are indwelling components of genomes, and their dynamics have been a driving force in genome evolution. Although we now have more information concerning their amounts and characteristics in various organisms, we still have little data from overall comparisons of their sequences in very closely related species. While the Drosophila melanogaster genome has been extensively studied, we have only limited knowledge regarding the precise TE sequences in the genomes of the related species Drosophila simulans, Drosophila sechellia and Drosophila yakuba. In this study we analyzed the number and structure of TE copies in the sequenced genomes of these four species. Our findings show that, unexpectedly, the number of TE insertions in D. simulans is greater than that in D. melanogaster, but that most of the copies in D. simulans are degraded and in small fragments, as in D. sechellia and D. yakuba. This suggests that all three species were invaded by numerous TEs a long time ago, but have since regulated their activity, as the present TE copies are degraded, with very few full-length elements. In contrast, in D. melanogaster, a recent activation of TEs has resulted in a large number of almost-identical TE copies. We have detected variants of some TEs in D. simulans and D. sechellia that are almost identical to the reference TE sequences in D. melanogaster, suggesting that D. melanogaster has recently been invaded by active TE variants from the other species. Our results indicate that the three species D. simulans, D. sechellia, and D. yakuba seem to be at a different stage of their TE life cycle when compared to D. melanogaster. Moreover, we show that D. melanogaster has been invaded by active TE variants for several TE families likely to come from D. simulans or the ancestor of D. simulans and D. sechellia. The numerous horizontal transfer events implied to explain these results could indicate introgression events between these species.

3.
The Drosophila melanogaster genome contains approximately 100 distinct families of transposable elements (TEs). In the euchromatic part of the genome, each family is present in a small number of copies (5-150 copies), with individual copies of TEs often present at very low frequencies in populations. This pattern is likely to reflect a balance between the inflow of TEs by transposition and the removal of TEs by natural selection. The nature of natural selection acting against TEs remains controversial. We provide evidence that selection against chromosome abnormalities caused by ectopic recombination limits the spread of some TEs. We also demonstrate for the first time that some TE families in the Drosophila euchromatin appear to be only marginally affected by purifying selection and contain many copies at high population frequencies. We argue that TEs in these families attain high population frequencies and even reach fixation as a result of low family-wide transposition rates leading to low TE copy numbers and consequently reduced strength of selection acting on individual TE copies. Fixation of TEs in these families should provide an upward pressure on the size of intergenic sequences counterbalancing rapid DNA loss through small deletions. Copy-number-dependent selection on TE families caused by ectopic recombination may also promote diversity among TEs in the Drosophila genome.

4.
5.
Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.

6.
N. Harden, M. Ashburner. Genetics, 1990, 126(2):387-400
FB-NOF is a composite transposable element of Drosophila melanogaster. It is composed of foldback sequences, of variable length, which flank a 4-kb NOF sequence with 308-bp inverted repeat termini. The NOF sequence could potentially code for a 120-kD polypeptide. The FB-NOF element is responsible for unstable mutations of the white gene (wc and wDZL) and is associated with the large TEs of G. Ising. Although most strains of D. melanogaster have 20-30 sites of FB insertion, FB-NOF elements are usually rare; many strains lack this composite element or have only one copy of it. A few strains, including wDZL and Basc, have many (8-21) copies of FB-NOF, and these show a tendency to insert at "hot-spots." These strains also have an increased number of FB elements. The DNA sequence of the NOF region associated with TE146(Z) has been determined.

7.
Buisine N, Quesneville H, Colot V. Genomics, 2008, 91(5):467-475
Transposable elements (TEs) are ubiquitous components of eukaryotic genomes that impact many aspects of genome function. TE detection in genomic sequences is typically performed using similarity searches against a set of reference sequences built from previously identified TEs. Here, we demonstrate that this process can be improved by designing reference sets that incorporate key aspects of the structure and evolution of TEs and by combining these sets with Repbase Update (RU), which is composed mainly of consensus sequences. Using the Arabidopsis genome as a test case, our approach leads to the detection of an extra 12.4% of TE sequences. These correspond to novel TE fragments as well as to the extension of TE fragments already detected by RU. Significantly, we find that TE detection could be readily optimized using only two reference sets, one containing true consensus sequences and the other mosaic sequences that capture the structural diversity of TE copies within a family.

8.
Transposable elements (TEs) are the primary contributors to the genome bulk in many organisms and are major players in genome evolution. A clear and thorough understanding of the population dynamics of TEs is therefore essential for full comprehension of eukaryotic genome evolution and function. Although TEs in Drosophila melanogaster have received much attention, the population dynamics of most TE families in this species remain entirely unexplored. It is not clear whether the same population processes can account for the population behaviors of all TEs in Drosophila or whether, as has been suggested previously, different orders behave according to very different rules. In this work, we analyzed population frequencies for a large number of individual TEs (755 TEs) in five North American and one sub-Saharan African D. melanogaster populations (75 strains in total). These TEs have been annotated in the reference D. melanogaster euchromatic genome and have been sampled from all three major orders (non-LTR, LTR, and TIR) and from all families with more than 20 TE copies (55 families in total). We find strong evidence that TEs in Drosophila across all orders and families are subject to purifying selection at the level of ectopic recombination. We showed that the strength of this selection varies predictably with recombination rate, length of individual TEs, and copy number and length of other TEs in the same family. Importantly, these rules do not appear to vary across orders. Finally, we built a statistical model that considered only individual TE-level (such as the TE length) and family-level properties (such as the copy number) and were able to explain more than 40% of the variation in TE frequencies in D. melanogaster.

9.
Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. The sequencing cost reduction and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex detecting presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably but that the rate of 'no data' calls increases as the coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome provided the availability of the reference genome, individual TE copy annotation and NGS data.
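For readers unfamiliar with the two accuracy figures quoted in this abstract, the sketch below shows the standard definitions of sensitivity and specificity for presence/absence calls. It is not part of T-lex; the function name and the counts in the example are hypothetical and were chosen only so that the result matches the quoted 100% and 97%.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: insertions truly present and called present
    fn: insertions truly present but called absent
    tn: insertions truly absent and called absent
    fp: insertions truly absent but called present
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true presence and one true absence")
    sensitivity = tp / (tp + fn)   # share of true presences recovered
    specificity = tn / (tn + fp)   # share of true absences recovered
    return sensitivity, specificity


# Hypothetical validation counts, for illustration only.
sens, spec = sensitivity_specificity(tp=50, fn=0, tn=97, fp=3)
print(sens, spec)   # 1.0 0.97
```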

10.
Triticeae species (including wheat, barley and rye) have huge and complex genomes due to polyploidization and a high content of transposable elements (TEs). TEs are known to play a major role in the structure and evolutionary dynamics of Triticeae genomes. During the last 5 years, substantial stretches of contiguous genomic sequence from various species of Triticeae have been generated, making it necessary to update and standardize TE annotations and nomenclature. In this study we propose standard procedures for these tasks, based on structure, nucleic acid and protein sequence homologies. We report statistical analyses of TE composition and distribution in large blocks of genomic sequences from wheat and barley. Altogether, 3.8 Mb of wheat sequence available in the databases was analyzed or re-analyzed, and compared with 1.3 Mb of re-annotated genomic sequences from barley. The wheat sequences were relatively gene-rich (one gene per 23.9 kb), although wheat gene-derived sequences represented only 7.8% (159 elements) of the total, while the remainder mainly comprised coding sequences found in TEs (54.7%, 751 elements). Class I elements [mainly long terminal repeat (LTR) retrotransposons] accounted for the major proportion of TEs, in terms of sequence length as well as element number (83.6% and 498, respectively). In addition, we show that the gene-rich sequences of wheat genome A seem to have a higher TE content than those of genomes B and D, or of barley gene-rich sequences. Moreover, among the various TE groups, MITEs were most often associated with genes: 43.1% of MITEs fell into this category. Finally, the TRIM and copia elements were shown to be the most active TEs in the wheat genome. The implications of these results for the evolution of diploid and polyploid wheat species are discussed.

11.
12.
Transposable elements (TEs) are mobile genetic elements that parasitize genomes by semi-autonomously increasing their own copy number within the host genome. While TEs are important for genome evolution, appropriate methods for performing unbiased genome-wide surveys of TE variation in natural populations have been lacking. Here, we describe a novel and cost-effective approach for estimating population frequencies of TE insertions using paired-end Illumina reads from a pooled population sample. Importantly, the method treats insertions present in and absent from the reference genome identically, allowing unbiased TE population frequency estimates. We apply this method to data from a natural Drosophila melanogaster population from Portugal. Consistent with previous reports, we show that low recombining genomic regions harbor more TE insertions and maintain insertions at higher frequencies than do high recombining regions. We conservatively estimate that there are almost twice as many "novel" TE insertion sites as sites known from the reference sequence in our population sample (6,824 novel versus 3,639 reference sites, with on average a 31-fold coverage per insertion site). Different families of transposable elements show large differences in their insertion densities and population frequencies. Our analyses suggest that the history of TE activity significantly contributes to this pattern, with recently active families segregating at lower frequencies than those active in the more distant past. Finally, using our high-resolution TE abundance measurements, we identified 13 candidate positively selected TE insertions based on their high population frequencies and on low Tajima's D values in their neighborhoods.

13.
Transposable elements (TEs) are a major source of genetic variability in genomes, creating genetic novelty and driving genome evolution. Analysis of sequenced genomes has revealed considerable diversity in TE families, copy number, and localization between different, closely related species. For instance, although the twin species Drosophila melanogaster and D. simulans share the same TE families, they display different amounts of TEs. Furthermore, previous analyses of wild type derived strains of D. simulans have revealed high polymorphism regarding TE copy number within this species. Several factors may influence the diversity and abundance of TEs in a genome, including molecular mechanisms such as epigenetic factors, which could be a source of variation in TE success. In this paper, we present the first analysis of the epigenetic status of four TE families (roo, tirant, 412 and F) in seven wild type strains of D. melanogaster and D. simulans. Our data show intra- and inter-specific variations in the histone marks that adorn TE copies. Our results demonstrate that the chromatin state of common TEs varies among TE families, between closely related species and also between wild type strains.

14.
We present a global analysis of the distribution of 43 transposable elements (TEs) in 228 species of the Drosophila genus from our data and data from the literature. Data on chromosome localization come from in situ hybridization and presence/absence of the elements from Southern analyses. This analysis shows great differences between TE distributions, even among closely related species. Some TEs are distributed according to the phylogeny of their host species; others do not entirely follow the phylogeny, suggesting horizontal transfers. A higher number of insertion sites for most TEs in the genome of D. melanogaster is observed when compared with that in D. simulans. This suggests either intrinsic differences in genomic characteristics between the two species, or the influence of differing effective population sizes, although biases due to the use of TE probes coming mostly from D. melanogaster and to the way TEs are initially detected in species cannot be ruled out. Data on TEs more specific to the species under consideration are necessary for a better understanding of their distribution in organisms and populations. This revised version was published online in July 2006 with corrections to the Cover Date.

15.
16.
The stable coexistence of transposable elements (TEs) with their host genome over long periods of time suggests TEs have to impose some deleterious effect upon their host fitness. Three mechanisms have been proposed to account for the deleterious effect caused by TEs: host gene interruptions by TE insertions, chromosomal rearrangements by TE-induced ectopic recombination, and costly TE expression. However, the relative importance of these mechanisms remains controversial. Here, we test specifically if TE expression accounts for the host fitness cost imposed by TE insertions. In the retrotransposon Doc, expression requires binding of the host RNA polymerase to the internal promoter. If expression of Doc elements is deleterious to their host, Doc copies with promoters would be more strongly selected against and would persist in the population for shorter periods of time compared with Docs lacking promoters. We tested this prediction using sequence-specific amplified polymorphism (SSAP) analyses. We compared the populations of these two types of Doc elements in two sets of lines of Drosophila melanogaster: selection-free isogenic lines accumulating new Doc insertions and isogenized isofemale lines sampled from a natural population. We found that (1) there is no difference in the proportion of promoter-bearing and promoter-lacking copies between sets of lines, and (2) the site occupancy distribution of promoter-bearing copies does not skew toward lower frequency compared with that of promoter-lacking copies. Thus, selection against promoter-bearing copies does not appear to be stronger than that of promoter-lacking copies. Our results show that expression is not playing a major role in stabilizing Doc copy numbers.

17.
T. E. Kijima, Hideki Innan. Genetics, 2013, 195(3):957-967
A population genetic simulation framework is developed to understand the behavior and molecular evolution of DNA sequences of transposable elements. Our model incorporates random transposition and excision of transposable element (TE) copies, two modes of selection against TEs, and degeneration of transpositional activity by point mutations. We first investigated the relationships between the behavior of the copy number of TEs and these parameters. Our results show that when selection is weak, the genome can maintain a relatively large number of TEs, but most of them are less active. In contrast, with strong selection, the genome can maintain only a limited number of TEs but the proportion of active copies is large. In such a case, there could be substantial fluctuations of the copy number over generations. We also explored how DNA sequences of TEs evolve through the simulations. In general, active copies form clusters around the original sequence, while less active copies have long branches specific to themselves, exhibiting a star-shaped phylogeny. It is demonstrated that the phylogeny of TE sequences could be informative to understand the dynamics of TE evolution.

18.
High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5–18.5 Kbp with an extremely low error rate (0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.
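The N50 contig size cited in this abstract is a standard assembly statistic: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly of eight contigs (total 32): the two contigs of
# length 8 already cover half of it, so the N50 is 8.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))   # 8
```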

19.
Maisonhaute C, Ogereau D, Hua-Van A, Capy P. Gene, 2007, 393(1-2):116-126
Transposable elements (TEs) represent a large fraction of the eukaryotic genome. In Drosophila melanogaster, about 20% of the genome corresponds to such middle repetitive DNA dispersed sequences. A fraction of TEs is composed of elements showing a retrovirus-like structure, the LTR-retrotransposons, the first TEs to be described in the Drosophila genome. Interestingly, in D. melanogaster embryonic immortal cell culture genomes the copy number of these LTR-retrotransposons was revealed to be higher than the copy number in the Drosophila genome, presumably as the result of transposition of some copies to new genomic locations [Potter, S.S., Brorein Jr., W.J., Dunsmuir, P., Rubin, G.M., 1979. Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17, 415-427; Junakovic, N., Di Franco, C., Best-Belpomme, M., Echalier, G., 1988. On the transposition of copia-like nomadic elements in cultured Drosophila cells. Chromosoma 97, 212-218]. This suggests that so many transpositions modified the genome organisation and consequently the expression of targeted genes. To understand what has directed the transposition of TEs in Drosophila cell culture genomes, a search to identify the newly transposed copies was undertaken using 1731, an LTR-retrotransposon. A comparison was made between 1731 full-length elements found in the fly sequenced genome (y(1); cn(1)bw(1), sp(1) stock) and 1731 full-length elements amplified by PCR in the two cell lines. The resulting data provide evidence that all 1731 neocopies were derived from a single copy slightly active in the Drosophila genome and subsequently strongly activated in cultured cells; and that this active copy is related to a newly evolved genomic variant (Kalmykova, A.I., et al., 2004. Selective expansion of the newly evolved genomic variants of retrotransposon 1731 in the Drosophila genomes. Mol. Biol. Evol. 21, 2281-2289). Moreover, neocopies are shown to be inserted in different sets of genes in the two cell lines, suggesting they might be involved in the biological and physiological differences observed between Kc and S2 cell lines.

20.
C Gao, M Xiao, X Ren, A Hayward, J Yin, L Wu, D Fu, J Li. Genomics, 2012, 100(4):222-230
The movement of transposable elements (TEs) in eukaryotic genomes can often result in the occurrence of nested TEs (the insertion of TEs into pre-existing TEs). We performed a general TE assessment using available databases to detect nested TEs and analyze their characteristics and putative functions in eukaryote genomes. A total of 802 TEs were found to be inserted into 690 host TEs from a total number of 11,329 TEs. We reveal that repetitive sequences are associated with an increased occurrence of nested TEs and with sequence bias of TE insertion. A high proportion of the genes which were associated with nested TEs are predicted to localize to organelles and participate in nucleic acid and protein binding. Many of these function in metabolic processes, and encode important enzymes for transposition and integration. Therefore, nested TEs in eukaryotic genomes may negatively influence genome expansion, and enrich the diversity of gene expression or regulation.
