Similar Articles
20 similar articles found.
1.
Combined evidence annotation of transposable elements in genome sequences   Total citations: 1, self-citations: 0, citations by others: 1
Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To raise the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, that integrates results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimate that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or of other species in the genus Drosophila.
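The core idea of combining evidence can be sketched as interval merging: hits from several programs that overlap and agree on family are collapsed into a single "TE model" that records which programs support it. This is a minimal sketch under simplifying assumptions (each hit already carries a family; no handling of conflicting family calls, strand or nesting); the `Hit` and `TEModel` classes, the `combine_evidence` function and the example coordinates are hypothetical and are not the pipeline's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One TE hit reported by one program (hypothetical record type)."""
    start: int    # 0-based start, inclusive
    end: int      # end, exclusive
    family: str   # TE family assigned by the program
    source: str   # program name, e.g. "RepeatMasker"


@dataclass
class TEModel:
    """A merged TE model plus the set of programs that support it."""
    start: int
    end: int
    family: str
    sources: set = field(default_factory=set)


def combine_evidence(hits):
    """Merge overlapping hits of the same family into TE models."""
    by_family = {}
    for hit in hits:
        by_family.setdefault(hit.family, []).append(hit)

    models = []
    for family, family_hits in by_family.items():
        family_hits.sort(key=lambda h: h.start)
        current = None
        for hit in family_hits:
            if current is not None and hit.start < current.end:
                # Overlaps the model being built: extend it and add support.
                current.end = max(current.end, hit.end)
                current.sources.add(hit.source)
            else:
                # No overlap: start a new model.
                current = TEModel(hit.start, hit.end, family, {hit.source})
                models.append(current)
    return sorted(models, key=lambda m: m.start)


hits = [
    Hit(100, 400, "INE-1", "RepeatMasker"),
    Hit(350, 520, "INE-1", "BLASTER"),
    Hit(2000, 2300, "roo", "RECON"),
]
for model in combine_evidence(hits):
    print(model.family, model.start, model.end, sorted(model.sources))
```

Here the two overlapping INE-1 hits become one model (100-520) supported by two programs, while the isolated roo hit stays a single-source model.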

2.
3.
4.
The distribution of interspersed repetitive DNA sequences in the human genome   Total citations: 25, self-citations: 0, citations by others: 25
The distribution of interspersed repetitive DNA sequences in the human genome has been investigated using a combination of biochemical, cytological, computational, and recombinant DNA approaches. "Low-resolution" biochemical experiments indicate that the general distribution of repetitive sequences in human DNA can be adequately described by models that assume random spacing, with an average distance of 3 kb. A detailed "high-resolution" map of repetitive sequence organization along 400 kb of cloned human DNA, including 150 kb of DNA fragments isolated for this study, is consistent with this general distribution pattern. However, a higher frequency of spacing distances greater than 9.5 kb was observed in this genomic DNA sample. Although the overall repetitive sequence distribution is best described by models that assume a random distribution, an analysis of Alu repetitive sequences in the GenBank sequence database indicates that there are local domains with varying Alu placement densities. In situ hybridization to human metaphase chromosomes indicates that these local density domains for Alu placement can be observed cytologically. Centric heterochromatin regions, in particular, are underrepresented in Alu sequences by at least 50-fold. The observed distribution of repetitive sequences in human DNA is the expected result for sequences that transpose throughout the genome, with local regions of "preference" or "exclusion" for integration.
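Under the random-spacing model, repeat positions behave approximately like a Poisson process, so spacings are roughly exponential with mean 3 kb and the expected share of spacings longer than d is exp(-d / 3 kb). The sketch below computes that baseline for d = 9.5 kb (about 4%) and checks it by simulating uniformly placed repeats; it is our illustration of the model, not a calculation reported by the authors, and the function names are hypothetical.

```python
import math
import random

MEAN_SPACING_KB = 3.0  # average distance between repeats given in the abstract


def expected_fraction_longer_than(d_kb, mean_kb=MEAN_SPACING_KB):
    """P(spacing > d) when spacings are exponential (Poisson placement)."""
    return math.exp(-d_kb / mean_kb)


def simulated_fraction_longer_than(d_kb, region_kb=400.0,
                                   mean_kb=MEAN_SPACING_KB, seed=0):
    """Place repeats uniformly at random and measure the share of long gaps."""
    rng = random.Random(seed)
    n_repeats = round(region_kb / mean_kb)
    positions = sorted(rng.uniform(0, region_kb) for _ in range(n_repeats))
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gap > d_kb for gap in gaps) / len(gaps)


print(f"{expected_fraction_longer_than(9.5):.3f}")
```

With only 400 kb (about 130 spacings) the simulated share fluctuates considerably from seed to seed, which is worth keeping in mind when judging an excess of long spacings in a single sample.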

5.
Two types of short interspersed repeated DNA sequences (SINEs), MIR and Alu, were used to analyze genetic relationships among higher primates and to detect polymorphism in human genomic DNA. The DNA regions located between neighboring copies of these SINEs were amplified by polymerase chain reaction with primers complementary to the MIR and Alu consensus sequences (inter-SINE PCR). Comparing the sets of amplified DNA fragments from different species or individuals allows the relationships among them to be evaluated. Using inter-MIR PCR, the relationships among the higher primates of the infraorder Catarrhini reported elsewhere were confirmed, demonstrating the effectiveness of the method for phylogenetic studies. Inter-MIR PCR revealed no polymorphism in human DNA, whereas inter-Alu PCR did; this polymorphism is probably associated with the continuing amplification of Alu elements in the human genome.
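Band patterns from inter-SINE PCR can be compared with a band-sharing (Dice) coefficient, S = 2 × shared / (bands in A + bands in B), with 1 - S usable as a distance for tree building. The abstract does not state which measure was used, so treat this as one common choice; the `band_sharing` function, the sizing tolerance and the fragment sizes in the example are hypothetical.

```python
def band_sharing(bands_a, bands_b, tolerance_bp=0):
    """Dice band-sharing coefficient between two sets of fragment sizes (bp).

    Two fragments count as the same band if their sizes differ by at most
    tolerance_bp; each band can be matched at most once.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    if not a and not b:
        return 1.0
    i = j = shared = 0
    # Two-pointer sweep over the sorted size lists.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance_bp:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 2 * shared / (len(a) + len(b))


# Hypothetical inter-MIR band sizes for two individuals or species.
sample_1 = [350, 420, 610, 900]
sample_2 = [350, 425, 610, 1200]
similarity = band_sharing(sample_1, sample_2, tolerance_bp=10)
print(similarity, 1 - similarity)
```

Three of four bands match within 10 bp here, giving S = 0.75 and a distance of 0.25.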

7.
Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22-Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs), which are mostly nested within each other and most of whose families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many of them captured by TEs, are prevalent within this region. In this region, most disruptions of synteny with sorghum and rice result from the elimination of gene redundancy inherited from a tetraploid maize ancestor that originated a few million years ago. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, obtained on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will promote future functional genomic and phylogenomic research in maize and other grasses.

8.
Throughout evolution, eukaryotic genomes have been invaded by transposable elements (TEs). Little is known about the factors leading to genomic proliferation of TEs, their preferred integration sites and the molecular mechanisms underlying their insertion. We analyzed hundreds of thousands of nested TEs in the human genome, i.e., insertions of TEs into existing ones. We first found that most TEs insert within specific 'hotspots' along the targeted TE. In particular, retrotransposed Alu elements contain a non-canonical single-nucleotide hotspot for insertion of other Alu sequences. We next devised a method for identifying integration sequence motifs of inserted TEs that are conserved within the targeted TEs. This method revealed novel sequence motifs characterizing insertions of several important TE families: Alu, hAT, ERV1 and MaLR. Finally, we performed a global assessment of the extent to which young TEs tend to nest within older transposed elements and identified a 4-fold higher tendency of TEs to insert into existing TEs than into non-TE intergenic regions. Our analysis demonstrates that TEs are highly biased to insert within certain TEs, in specific orientations and at specific positions within the targeted TE. TE nesting events also reveal new characteristics of the molecular mechanisms underlying transposition.
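A "4-fold higher tendency" only makes sense once insertion counts are normalized by the amount of available target sequence. The sketch below shows that normalization (insertions per base pair inside TEs divided by insertions per base pair in non-TE intergenic DNA); the function name is hypothetical, and the counts and lengths in the example are invented purely to illustrate the arithmetic, not data from the study.

```python
def insertion_density_ratio(ins_in_te, te_bp, ins_in_nonte, nonte_bp):
    """Insertions per bp inside TEs divided by insertions per bp in non-TE DNA."""
    if te_bp <= 0 or nonte_bp <= 0:
        raise ValueError("region lengths must be positive")
    if ins_in_nonte == 0:
        raise ValueError("no insertions in the reference (non-TE) region")
    return (ins_in_te / te_bp) / (ins_in_nonte / nonte_bp)


# Invented numbers: 12,000 insertions into 1.5 Mb of TE sequence versus
# 4,500 insertions into 2.25 Mb of non-TE intergenic sequence.
ratio = insertion_density_ratio(12_000, 1_500_000, 4_500, 2_250_000)
print(f"{ratio:.1f}-fold")
```

In this made-up example the raw counts differ only about 2.7-fold; the 4-fold figure appears only after dividing by the length of each target compartment.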

9.
Buisine N, Quesneville H, Colot V. Genomics, 2008, 91(5): 467-475
Transposable elements (TEs) are ubiquitous components of eukaryotic genomes that impact many aspects of genome function. TE detection in genomic sequences is typically performed using similarity searches against a set of reference sequences built from previously identified TEs. Here, we demonstrate that this process can be improved by designing reference sets that incorporate key aspects of the structure and evolution of TEs and by combining these sets with Repbase Update (RU), which is composed mainly of consensus sequences. Using the Arabidopsis genome as a test case, our approach leads to the detection of an extra 12.4% of TE sequences. These correspond to novel TE fragments as well as to the extension of TE fragments already detected by RU. Significantly, we find that TE detection could be readily optimized using only two reference sets, one containing true consensus sequences and the other mosaic sequences that capture the structural diversity of TE copies within a family.
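One way to quantify the gain from adding a second reference set is to compare the number of genomic bases covered by hits before and after merging the two hit sets; new bases come both from novel fragments and from extensions of fragments already detected. This sketch assumes hits are plain half-open (start, end) intervals on one sequence and that the gain is expressed relative to the RU-only coverage; the abstract does not say exactly how the 12.4% figure was computed, and the function names and coordinates are hypothetical.

```python
def merge(intervals):
    """Union of half-open [start, end) intervals, returned sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlap: extend
        else:
            merged.append([start, end])              # gap: new block
    return merged


def covered_bp(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def extra_detection(hits_ru, hits_custom):
    """Fractional gain in covered bases when custom-library hits are added."""
    base = covered_bp(hits_ru)
    combined = covered_bp(list(hits_ru) + list(hits_custom))
    return (combined - base) / base


hits_ru = [(0, 1000), (5000, 5800)]          # 1,800 bp detected with RU alone
hits_custom = [(900, 1100), (9000, 9300)]    # one extension, one novel fragment
print(f"{extra_detection(hits_ru, hits_custom):.1%}")
```

In the example, 100 bp come from extending an RU fragment and 300 bp from a fragment RU missed entirely, for 400 extra bases over a 1,800-bp baseline.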

10.
We analysed the distribution of transposable elements (TEs) in 100 aligned pairs of orthologous intergenic regions from the mouse and human genomes. Within these regions, conserved segments of high similarity between the two species alternate with segments of low similarity. Identifiable TEs comprise 40-60% of segments of low similarity. Within such segments, a particular copy of a TE found in one species has no orthologue in the other. Overall, TEs comprise only approximately 20% of conserved segments. However, TEs from two families, MIR and L2, are rather common within conserved segments. Statistical analysis of the distributions of TEs suggests that a majority of the MIR and L2 elements present in murine intergenic regions have human orthologues. These elements must have been present in the common ancestor of human and mouse and have remained under substantial negative selection that prevented their divergence beyond recognition. If so, recruitment of MIR- and L2-derived sequences to perform a function that increases host fitness is rather common, with at least two such events per host gene. The central part of the MIR consensus sequence is over-represented in conserved segments relative to its background frequency in the genome, suggesting that it is under the strongest selective constraint.

11.
Repetitive genomic sequences might have various structural features and properties distinct from those of the known transposable elements (TEs). Here, the content and properties of the repetitive sequences present in a 200-kb region around the rice waxy locus were analyzed using the available rice genomic database. In our previous Southern blotting analysis, 70% of the segments in this region showed smeared patterns, but according to the present database analysis, the proportion of repetitive sequences in this region was only 15%. The repetitive segments in this 200-kb region comprised 75 repetitive sequences that we classified into 46 subfamilies: 21 subfamilies were known TEs or repetitive sequences and 25 subfamilies consisted of newly identified TEs or novel types of repetitive sequences. The region contains no long terminal repeat (LTR) retrotransposable elements, but miniature inverted repeat transposable elements (MITEs) constituted a major class among the elements identified. These MITEs showed remarkable structural divergence: 12 elements were found to be new members of known MITE superfamilies, while five elements had novel terminal structures and did not belong to any known TE family. Interestingly, about 10% of the repetitive sequences, including virus-like sequences, did not have any of the usual characteristics of TEs, suggesting that a certain proportion of repetitive sequences that might not share the transpositional mechanisms of known elements are dispersed in the compact rice genome.
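MITEs are short, non-autonomous DNA transposons recognized mainly by their terminal inverted repeats (TIRs), so "terminal structures" can be examined by comparing an element's 5' end with the reverse complement of its 3' end. The function below is a hypothetical helper that reports the longest TIR with at most `max_mismatches` mismatches; real MITE analyses typically work from alignments and tolerate indels, which this sketch does not, and the example element is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element, max_mismatches=0, max_len=50):
    """Length of the longest terminal inverted repeat of an element.

    Compares the first k bases with the reverse complement of the last k
    bases for every k up to half the element length (or max_len).
    """
    s = element.upper()
    best = 0
    for k in range(1, min(max_len, len(s) // 2) + 1):
        left = s[:k]
        right_rc = revcomp(s[-k:])
        mismatches = sum(a != b for a, b in zip(left, right_rc))
        if mismatches <= max_mismatches:
            best = k
    return best


# Invented element: an 8-bp TIR around an 8-bp internal region.
mite_like = "GGCCAGTT" + "CCCCCCCC" + "AACTGGCC"
print(terminal_inverted_repeat_length(mite_like))
```

An element whose ends do not form a recognizable TIR at all would return a small value here, flagging it for closer inspection.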

12.
Organisms with a high density of transposable elements (TEs) exhibit nesting, with subsequent repeats found inside previously inserted elements. Nesting splits the sequence structure of TEs and makes annotation of repetitive areas challenging. We present TEnest, a repeat identification and display tool made specifically for highly repetitive genomes. TEnest identifies repetitive sequences and reconstructs separated sections to provide full-length repeats and, for long-terminal repeat (LTR) retrotransposons, calculates age since insertion based on LTR divergence. TEnest provides a chronological insertion display to give an accurate visual representation of TE integration history showing timeline, location, and families of each TE identified, thus creating a framework from which evolutionary comparisons can be made among various regions of the genome. A database of repeats has been developed for maize (Zea mays), rice (Oryza sativa), wheat (Triticum aestivum), and barley (Hordeum vulgare) to illustrate the potential of TEnest software. All currently finished maize bacterial artificial chromosomes totaling 29.3 Mb were analyzed with TEnest to provide a characterization of the repeat insertions. Sixty-seven percent of the maize genome was found to be made up of TEs; of these, 95% are LTR retrotransposons. The rate of solo LTR formation is shown to be dissimilar across retrotransposon families. Phylogenetic analysis of TE families reveals specific events of extreme TE proliferation, which may explain the high quantities of certain TE families found throughout the maize genome. The TEnest software package is available for use on PlantGDB under the tools section (http://www.plantgdb.org/prj/TE_nest/TE_nest.html); the source code is available from (http://wiselab.org).
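LTR-divergence dating rests on the fact that the two LTRs of a retrotransposon are identical at insertion and then accumulate substitutions independently, so the insertion time is T = K / (2r), where K is the divergence between the two LTRs and r is the substitution rate. This sketch uses the Kimura two-parameter distance for K and a rate of 1.3 × 10^-8 substitutions per site per year; both the distance model and the rate are assumptions here (TEnest's own choices may differ), and the function names and example LTRs are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(ltr5, ltr3, rate=1.3e-8):
    """T = K / (2r); rate is substitutions per site per year (assumed value)."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)


# Hypothetical 100-bp LTR pair differing by a single A->G transition.
ltr5 = "ACGT" * 25
ltr3 = "GCGT" + "ACGT" * 24
print(f"{insertion_age_years(ltr5, ltr3):,.0f} years")  # roughly 390,000 years
```

Because T scales as 1/r, halving the assumed rate doubles every age estimate, so relative insertion order is more robust than absolute dates.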

13.
Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into the annotation of TE copies in reference genomes. Falling sequencing costs and newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study the population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of the population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex in detecting the presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably, but that the rate of 'no data' calls increases as coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used with any genome for which a reference genome, an annotation of individual TE copies and NGS data are available.
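Validation figures such as sensitivity and specificity are easiest to interpret with the bookkeeping spelled out. In the sketch below, `call_insertion` is a deliberately simplified, hypothetical read-based rule (not T-lex's actual algorithm) that returns 'no_data' when reads are absent or conflicting, which is one way low coverage inflates 'no data' calls; `validation_metrics` treats 'present' as the positive class and excludes 'no_data' calls from sensitivity and specificity, an assumption about how such figures are reported. All identifiers and example values are hypothetical.

```python
def call_insertion(presence_reads, absence_reads):
    """Hypothetical presence/absence rule for one TE copy in one strain.

    presence_reads: reads spanning a junction between the TE and its flanks.
    absence_reads: reads spanning the empty insertion site (flank to flank).
    """
    if presence_reads > 0 and absence_reads == 0:
        return "present"
    if absence_reads > 0 and presence_reads == 0:
        return "absent"
    return "no_data"  # no informative reads, or conflicting evidence


def validation_metrics(calls, truth):
    """Return (call_rate, sensitivity, specificity) against validated states."""
    tp = fn = tn = fp = called = 0
    for te_id, actual in truth.items():
        call = calls.get(te_id, "no_data")
        if call == "no_data":
            continue  # lowers the call rate only
        called += 1
        if actual == "present":
            if call == "present":
                tp += 1
            else:
                fn += 1
        elif call == "absent":
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return called / len(truth), sensitivity, specificity


truth = {"te1": "present", "te2": "present", "te3": "absent",
         "te4": "absent", "te5": "present"}
calls = {"te1": call_insertion(4, 0), "te2": call_insertion(2, 0),
         "te3": call_insertion(0, 3), "te4": call_insertion(1, 0),
         "te5": call_insertion(0, 0)}
print(validation_metrics(calls, truth))
```

Separating the call rate from sensitivity and specificity mirrors the abstract's distinction between how many insertions receive a call and how accurate those calls are.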

14.
Prediction of signal recognition particle RNA genes   Total citations: 3, self-citations: 1, citations by others: 3
We describe a method for prediction of genes that encode the RNA component of the signal recognition particle (SRP). A heuristic search for the strongly conserved helix 8 motif of SRP RNA is combined with covariance models that are based on previously known SRP RNA sequences. By screening available genomic sequences we have identified a large number of novel SRP RNA genes and we can account for at least one gene in every genome that has been completely sequenced. Novel bacterial RNAs include that of Thermotoga maritima, which, unlike all other non-Gram-positive eubacteria, is predicted to have an Alu domain. We have also found the RNAs of Lactococcus lactis and Staphylococcus to have an unusual UGAC tetraloop in helix 8 instead of the normal GNRA sequence. An investigation of yeast RNAs reveals conserved sequence elements of the Alu domain that aid in the analysis of these RNAs. Analysis of the human genome reveals only two likely genes, both on chromosome 14. Our method for SRP RNA gene prediction is the first convenient tool for this task and should be useful in genome annotation.
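The GNRA/UGAC distinction is a simple sequence pattern, which makes it easy to illustrate with a toy scan: find 4-nt loops matching GNRA (N = any base, R = purine) or UGAC and check that they are closed by a perfectly paired stem. This is only an illustration of the motif; it is not the authors' helix 8 heuristic or their covariance models, and `find_tetraloop_hairpins`, its `stem_len` parameter and the example sequences are hypothetical.

```python
import re

# Watson-Crick pairs plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
GNRA = re.compile(r"G[ACGU][AG]A")


def classify_loop(loop):
    """Return 'GNRA', 'UGAC' or None for a 4-nt loop sequence."""
    if GNRA.fullmatch(loop):
        return "GNRA"
    if loop == "UGAC":
        return "UGAC"
    return None


def find_tetraloop_hairpins(rna, stem_len=4):
    """Yield (loop_start, loop, loop_class) for tetraloops closed by a stem.

    The loop is rna[i:i+4]; the stem pairs rna[i-1-k] with rna[i+4+k]
    for k in range(stem_len). DNA input (T) is converted to RNA (U).
    """
    rna = rna.upper().replace("T", "U")
    for i in range(stem_len, len(rna) - 4 - stem_len + 1):
        loop = rna[i:i + 4]
        loop_class = classify_loop(loop)
        if loop_class is None:
            continue
        if all((rna[i - 1 - k], rna[i + 4 + k]) in PAIRS for k in range(stem_len)):
            yield i, loop, loop_class


print(list(find_tetraloop_hairpins("GCCAGAAAUGGC")))  # canonical GNRA loop
print(list(find_tetraloop_hairpins("GCCAUGACUGGC")))  # UGAC variant
```

A genome-scale search would need far more than this (imperfect stems, the rest of helix 8, and a probabilistic model such as a covariance model to score candidates).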

15.
High-efficiency yeast artificial chromosome fragmentation vectors   Total citations: 10, self-citations: 0, citations by others: 10
W J Pavan, P Hieter, D Sears, A Burkhoff, R H Reeves. Gene, 1991, 106(1): 125-127
Chromosome fragmentation vectors (CFVs) are used to create deletion derivatives of large fragments of human DNA cloned as yeast artificial chromosomes (YACs). CFVs target insertion of a telomere sequence into the YAC via homologous recombination with Alu repetitive elements. This event results in the loss of all YAC sequences distal to the site of integration. A new series of CFVs has been developed. These vectors target fragmentation to both Alu and LINE human repetitive DNA elements. Recovery of deletion derivatives is 10- to 20-fold more efficient with the new vectors than with those described previously.

16.
Given their genomic abundance and susceptibility to DNA methylation, interspersed repetitive sequences in the human genome can be exploited as valuable resources in genome-wide methylation studies. To learn about the relationship between DNA methylation and repeat sequences, we performed a global measurement of CpG dinucleotide frequencies in interspersed repetitive sequences and inferred germline methylation patterns in the human genome. Although extensive CpG depletion was observed for most repeat sequences, those in proximity to CpG islands have remained relatively protected from germline methylation, making them potential sources of germline activation. We also investigated the CpG depletion patterns of Alu pairs to see whether they might play an active role in germline methylation. Direct and inverted Alu pairs, classified according to the relative orientation of the two elements, showed contrasting CpG depletion patterns with respect to the distance separating them: the more closely spaced the two Alu elements of a pair, the greater the CpG depletion observed in inverted pairs, whereas the opposite trend was observed for direct pairs. This suggests that specific organizations of repetitive sequences, such as inverted Alu pairs, might play a role in triggering DNA methylation, consistent with a homology-dependent methylation hypothesis.
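CpG depletion is commonly measured with the observed/expected CpG ratio, (number of CG dinucleotides × sequence length) / (number of C × number of G), where values well below 1 indicate depletion. The abstract does not specify the exact statistic the authors used, so treat this as a standard stand-in; `cpg_obs_exp` and the example sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CG count * length) / (C count * G count).

    Returns NaN when the sequence has no C or no G.
    """
    s = seq.upper()
    c = s.count("C")
    g = s.count("G")
    if c == 0 or g == 0:
        return float("nan")
    # CG occurrences cannot overlap each other, so str.count is exact here.
    cg = s.count("CG")
    return cg * len(s) / (c * g)


print(cpg_obs_exp("ACGTACGTACGT"))  # CpG-rich toy sequence
print(cpg_obs_exp("CAGGTCCATG"))    # same C/G content idea, no CpG at all
```

Comparing this ratio between the members of an Alu pair and plotting it against their separation is one straightforward way to look for the orientation-dependent trends described above.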

17.
Alu repeats in the human genome   Total citations: 3, self-citations: 0, citations by others: 3
Highly repetitive DNA sequences account for more than 50% of the human genome. The L1 and Alu families harbor the most common mammalian long (LINEs) and short (SINEs) interspersed elements, respectively. Each Alu element is a dimer of similar but not identical fragments, about 300 bp in total, and originates from the 7SL RNA gene. Each element contains a bipartite promoter for RNA polymerase III, a poly(A) tract located between the monomers, a 3'-terminal poly(A) tract, and numerous CpG islands, and is flanked by short direct repeats. Alu repeats comprise more than 10% of the human genome and are capable of retroposition. These elements may have played an important part in genome evolution. Insertion of an Alu element into a functionally important genome region, or other Alu-dependent alterations of gene function, cause various hereditary disorders and are probably associated with carcinogenesis. In total, 14 Alu families differing in diagnostic mutations are known. Some of these are polymorphic in the human genome and were inserted into new loci relatively recently. Alu copies transposed during the ethnic divergence of the human population are useful markers for evolutionary genetic studies.
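The short direct repeats flanking each Alu are target-site duplications created when the element inserts, so an annotated copy can be checked by comparing the sequence just upstream of its start with the sequence just downstream of its end. This is a minimal sketch assuming exact, precisely annotated boundaries; `target_site_duplication`, its length limits and the example sequences are hypothetical, and real Alu boundaries (especially around the 3' poly(A) tract) are often less clean.

```python
def target_site_duplication(genome, start, end, min_len=5, max_len=20):
    """Longest exact direct repeat immediately flanking genome[start:end], or None."""
    best = None
    for k in range(min_len, max_len + 1):
        if start - k < 0 or end + k > len(genome):
            break  # flank would run off the sequence
        left = genome[start - k:start]   # k bases just upstream of the element
        right = genome[end:end + k]      # k bases just downstream of the element
        if left == right:
            best = left  # keep the longest match found so far
    return best


# Invented example: an element flanked by an 8-bp duplicated target site.
left_flank = "GGGG"
tsd = "AAGTCTGC"
element = "GGCCGGGCGCGGTGGCTCA"
genome = left_flank + tsd + element + tsd + "TTTT"
start = len(left_flank + tsd)
end = start + len(element)
print(target_site_duplication(genome, start, end))
```

Checking every length rather than stopping at the first mismatch matters, because a shorter k compares the end of the upstream copy with the start of the downstream copy, which usually differ even when a longer duplication exists.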

18.
A recombinant library of human DNA sequences was screened with a segment of simian virus 40 (SV40) DNA that spans the viral origin of replication. One hundred and fifty phage were isolated that hybridized to this probe. Restriction enzyme and hybridization analyses indicated that these sequences were partially homologous to one another. Direct DNA sequencing of two such SV40-hybridizing segments indicated that this was not a highly conserved family of sequences, but rather a set of DNA fragments that contained repetitive regions of high guanine plus cytosine content. These sequences were not members of the previously described Alu family of repeats and hybridized to SV40 DNA more strongly than do Alu family members. Computer analyses showed that the human DNA segments contained multiple homologies with sequences throughout the SV40 origin region, although sequences on the late side of the viral origin contained the strongest cross-hybridizing sequences. Because of the number and complexity of the matches detected, we could not determine unambiguously which of the many possible heteroduplexes between these DNAs was thermodynamically most favored. No hybridization of these human DNA sequences to any other segment of the SV40 genome was detected. In contrast, the human DNA segments isolated cross-hybridized with many sequences within the human genome. We tested for the presence of several functional domains on two of these human DNA fragments. One SV40-hybridizing fragment, SVCR29, contained a sequence which enhanced the efficiency of thymidine kinase transformation in human cells by approximately 20-fold. This effect was seen in an orientation-independent manner when the sequence was present at the 3' end of the chicken thymidine kinase gene. We propose that this segment of DNA contains a sequence analogous to the 72-base-pair repeats of SV40. 
The existence of such an "activator" element in cellular DNA raises the possibility that families of these sequences may exist in the mammalian genome.

19.
20.
Transposable elements (TEs) have been identified in every organism in which they have been looked for. The sequencing of large genomes, such as those of humans, Drosophila, Arabidopsis and Caenorhabditis, has also shown that they are a major constituent of these genomes, accounting for 15% of the genome of Drosophila, 45% of the human genome, and more than 70% in some plants and amphibians. Because protein-coding sequences make up only about 1% of the human genome, various researchers have suggested that TEs and the other repetitive sequences that constitute the so-called "noncoding DNA" are where the most stimulating discoveries will be made in the future (Bromham, 2002). We are therefore moving further and further from the original idea that this DNA was simply "junk DNA" that owed its presence in the genome entirely to its capacity for selfish transposition. Our understanding of the structures of TEs; their distribution along genomes; their sequence and insertion polymorphisms within genomes, and within and between populations and species; their impact on genes and on the regulatory mechanisms of gene expression; their effects on exon shuffling and other phenomena that reshape the genome; and their impact on genome size has increased dramatically in recent years. This has led to a more general picture of the impact of TEs on genomes, although many copies are still mainly selfish or junk DNA. In this review we focus mainly on discoveries made in Drosophila, but we also use information about other genomes when this helps to elucidate the general processes involved in the organization, plasticity, and evolution of genomes.
