Similar literature
Found 20 similar articles (search time: 265 ms)
1.
Repetitive sequences are a major constituent of many eukaryote genomes and play roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability. The identification of repetitive elements has traditionally relied on in-depth, manual curation and computational determination of close relatives based on DNA identity. However, the rapid divergence of repetitive sequence has made identification of repeats by DNA identity difficult even in closely related species. Hence, the presence of unidentified repeats in genome sequences affects the quality of gene annotations and annotation-dependent analyses (e.g. microarray analyses). We have developed an enhanced repeat identification pipeline using two approaches. First, the de novo repeat finding program PILER-DF was used to identify interspersed repetitive elements in several recently finished Dipteran genomes. Repeats were classified, when possible, according to their similarity to known elements described in Repbase and GenBank, and also screened against annotated genes as one means of eliminating false positives. Second, we used a new program called RepeatRunner, which integrates results from both RepeatMasker nucleotide searches and protein searches using BLASTX. Using RepeatRunner with PILER-DF predictions, we masked repeats in thirteen Dipteran genomes and conclude that combining PILER-DF and RepeatRunner greatly enhances repeat identification in both well-characterized and un-annotated genomes.

2.
3.
Streptococcus pneumoniae open reading frame SP0082 encodes a surface protein that contains four copies of a novel conserved repeat domain that bears no significant sequence similarity to proteins of known function. Homologous sequences from other streptococci contain two to six of these repeats, designated the SSURE (streptococcal surface repeat) domain. To investigate the functional role(s) of this domain, the third SSURE repeat of the SP0082 sequence has been expressed in Escherichia coli, purified to homogeneity and characterized by biochemical and immunological methods. The expressed protein fragment was found to bind to fibronectin, but not to collagen or submaxillary mucin. Anti-SSURE antibodies recognized the corresponding protein on the surface of pneumococcal cells. These data identify the S. pneumoniae SP0082 protein and its homologs in other streptococci as fibronectin-binding surface adhesins. The SSURE domain is likely to contain a novel protein fold, which was tentatively modeled using ab initio modeling methods.

4.
An analysis of a 29-kilobase nontranscribed spacer fragment in the ribosomal DNA (rDNA) of the house cricket, Acheta domesticus, revealed a highly repetitious structure. A total of eight EcoRI repeats of three different size classes measuring 259, 420, and 508 base pairs (bp) was mapped to a region 2 kilobases (kb) from the 18S coding region. The repeats were oriented in a nonrandom manner and had sequences homologous to DNA located immediately adjacent to the repetitive array. DNA sequence analysis showed that the repetitive region was composed of smaller direct repeats 66, 67, and 383 bp in length. There was minor length heterogeneity of the chromosomal restriction fragments containing the entire array, indicating that a variable number of EcoRI repeats is a minor contributor to the total repeat-unit length heterogeneity. Immediately upstream from the EcoRI array there is a 17-kb region composed of 50 to 60 subrepeat elements recognized by a variety of restriction endonucleases. A subcloned SmaI repeat from the array was not homologous to any other part of the rDNA repeat unit or other chromosomal DNA. There was little length heterogeneity in restriction fragments containing the chromosomal 17-kb repetitious region. Immediately upstream from the 17-kb region there is a 4.1-kb segment with sequences homologous to the EcoRI repeats.

5.
Fourteen recombinant clones from Zea mays were studied with regard to their composition of unique and repetitive sequences. Southern hybridization experiments were used to classify restriction fragments of the clones into a unique, middle or highly repetitive class of reiteration frequency. All three classes were often found on the same genomic clone. Cross-hybridization studies between clones showed that a given repeat might be present on several clones, and thus four families of highly repetitive elements were established. Heteroduplex analysis was used to show the arrangement and size of repeats common between several clones. A short interspersion pattern of unique, middle and highly repetitive DNA was found. The dispersed repetitive elements were 300-1300 bp in length. Analysis of the pattern produced by a given repeat in genomic Southern experiments suggests that some small dispersed repeats may also exist as part of a larger repeating unit elsewhere in the genome.

6.
The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
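
To illustrate what a perfect simple tandem repeat record (position, length, repeat subunit) might look like, here is a minimal Python sketch that scans a sequence with a regular-expression back-reference. It is not the exact-match algorithm of the abstract, whose details are not given here; the function name `find_perfect_strs` and the thresholds `max_unit` and `min_copies` are assumptions chosen for the example.

```python
import re

def find_perfect_strs(seq, max_unit=6, min_copies=3):
    """Return sorted (start, length, unit, copies) tuples for perfect tandem repeats.

    max_unit and min_copies are illustrative thresholds, not values from the abstract.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter unit (e.g. "ATAT"),
            # since those runs are reported under the shorter unit length.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            length = len(m.group(0))
            hits.append((m.start(), length, unit, length // unit_len))
    return sorted(hits)

if __name__ == "__main__":
    for start, length, unit, copies in find_perfect_strs("GGCAGCAGCAGCAGTT"):
        print(start, length, unit, copies)
```

Each tuple maps naturally onto one row of a relational table of the kind the abstract mentions, with a chromosome column added when scanning a whole genome.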

7.
It has become clear that dispersed repeat sequences have played multiple roles in eukaryotic genome evolution including increasing genetic diversity through mutation, inducing changes in gene expression, and facilitating generation of novel genes. Growing recognition of the importance of dispersed repeats has fueled development of computational tools designed to expedite discovery and classification of repeats. Here we review major existing repeat exploration tools and discuss the algorithms utilized by these tools. Special attention is devoted to ab initio programs, i.e., those tools that do not rely upon previously identified repeats to find new repeat elements. We conclude by discussing the strengths and weaknesses of current tools and highlighting additional approaches that may advance repeat discovery/characterization.

8.
A middle repetitive sequence NPR18 was isolated from Nicotiana plumbaginifolia nuclear genome [8]. Sequences homologous to the repeat are dispersed through genomes of several Nicotiana species. Computer-assisted data analysis of the NPR18 primary sequence reveals several features attributed to mobile genetic elements: an AT content higher than average for nuclear DNA of genus Nicotiana plants; a number of direct and inverted repeats. Some of the repeats displayed homology to the terminal and subterminal repeats of Ac/Ds-like plant elements.
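
The two sequence features named above, AT content and inverted repeats, can be computed with a few lines of Python. This is a hedged sketch rather than the analysis used for NPR18: the function names and the k-mer length `k` are assumptions, and only exact inverted repeats are detected.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def at_content(seq):
    """Fraction of A/T among unambiguous bases (0.0 if there are none)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def inverted_repeat_pairs(seq, k=8):
    """Return (i, j) pairs where the k-mer at j is the reverse complement
    of the k-mer at i and the two copies do not overlap (j >= i + k)."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    pairs = []
    for i in range(len(seq) - k + 1):
        for j in index.get(reverse_complement(seq[i:i + k]), []):
            if j >= i + k:
                pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    example = "GATTACACCCTGTAATC"
    print(round(at_content(example), 3))
    print(inverted_repeat_pairs(example, k=7))
```

Direct repeats could be found the same way by looking up each k-mer itself instead of its reverse complement.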

9.
The repetitive sequence PisTR-A has an unusual organization in the pea (Pisum sativum) genome, being present both as short dispersed repeats as well as long arrays of tandemly arranged satellite DNA. Cloning, sequencing and FISH analysis of both PisTR-A variants revealed that the former occurs in the genome embedded within the sequence of Ty3/gypsy-like Ogre elements, whereas the latter forms homogenized arrays of satellite repeats at several genomic loci. The Ogre elements carry the PisTR-A sequences in their 3′ untranslated region (UTR) separating the gag-pol region from the 3′ LTR. This region was found to be highly variable among pea Ogre elements, and includes a number of other tandem repeats along with or instead of PisTR-A. Bioinformatic analysis of LTR-retrotransposons mined from available plant genomic sequence data revealed that the frequent occurrence of variable tandem repeats within 3′ UTRs is a typical feature of the Tat lineage of plant retrotransposons. Comparison of these repeats to known plant satellite sequences uncovered two other instances of satellites with sequence similarity to Tat-like retrotransposon 3′ UTR regions. These observations suggest that some retrotransposons may significantly contribute to satellite DNA evolution by generating a library of short repeat arrays that can subsequently be dispersed through the genome and eventually further amplified and homogenized into novel satellite repeats.

10.
Three repetitive sequence families from the sea urchin genome were studied, each defined by homology with a specific cloned probe one to a few hundred nucleotides long. Recombinant λ-sea urchin DNA libraries were screened with these probes, and individual recombinants were selected that include genomic members of these families. Restriction mapping, gel blot, and kinetic analyses were carried out to determine the organization of each repeat family. Sequence elements belonging to the first of the three repeat families were found to be embedded in longer repeat sequences. These repeat sequences frequently occur in small clusters. Members of the second repeat family are also found in a long repetitive sequence environment, but these repeats usually occur singly in any given region of the DNA. The sequences of the third repeat are only 200 to 300 nucleotides long, and are generally terminated by single copy DNA, though a few examples were found associated with other repeats. These three repeat sequence families constitute sets of homologous sequence elements that relate distant regions of the DNA.

11.
We isolated DNA fragments containing various repetitive elements from the genome of a sea bream Acanthopagrus latus. Sequence analysis indicated that two fragments have particularly interesting features. Fragment AL87 contained a tetranucleotide repeat and a quasipalindromic sequence. Sequence comparison suggested that AL87 may be a part of a gene encoding a serine/threonine protein kinase, and that the quasipalindrome is situated at the junction of an intron and an exon. Moreover, the quasipalindrome is conserved in several other fishes, even though it has the potential to form a stem-loop structure at the splicing site. Fragment AL79 contained a minisatellite sequence made up of six 30-bp units in tandem. DNase I sensitivity assays and statistical analyses showed the repeat region to be flexible when subjected to bending stress. In addition, atomic force microscopic imaging of AL79 showed the presence of highly curved (kinked) segments flanking the repeat region. The structural features of these repetitive elements may be key factors facilitating the amplification of the repeats.

12.
13.
14.
Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

15.
16.
We describe the structure of an Arabidopsis thaliana genomic clone containing two classes of repetitive DNA elements derived from the centromere region of chromosome 1. One class is comprised of tandem arrays of a highly reiterated repeat containing degenerate telomere sequence motifs. Adjacent to these telomere-similar repeats we found a dispersed repetitive element reiterated approximately five times in the A. thaliana genome. The nucleotide sequence of the dispersed repeat is unusual, being extremely AT-rich and composed of numerous, overlapping repeat motifs.

17.
The organization of the mitochondrial DNA (mtDNA) control region (CR) of the pollen beetle Meligethes thalassophilus is described. This mtDNA CR represents the longest sequenced for beetles so far, since the entire nucleotide sequence ranges from approximately 5000 to approximately 5500 bp. The CR of M. thalassophilus is organized in three distinct domains: a conserved domain near the tRNAIle gene, a variable domain flanking the 12S rRNA gene, and a relatively large central tandem array made up of a variable number of approximately 170 bp repeats that is responsible for the intraspecific length variation observed. Like other CRs found in insects, the M. thalassophilus CR contains two long homopolymeric runs that may be involved in mtDNA replication. Furthermore, conserved stem-and-loop structures in the repetitive domain were identified and their possible role in generating length variation is examined. Intraspecific comparison of the tandem repeat elements of M. thalassophilus suggests mechanisms of concerted evolution leading to homogenization of the repetitive region. The utility of such an array of tandem repeats as a genetic marker for assessing population-level variability and evolutionary relationships among populations is discussed. Finally, the technical difficulties found in isolating the mtDNA CR in beetles are remarked upon.

18.
The genome of the parasitic platyhelminth Schistosoma mansoni is composed of approximately 40% repetitive sequences, of which roughly 20% correspond to transposable elements. When the genome sequence became available, conventional repeat prediction programs were used to find these repeats, but only a fraction could be identified. To exhaustively characterize the repeats we applied a new massive sequencing-based strategy: we re-sequenced the genome by next generation sequencing, aligned the sequencing reads to the genome and assembled all multiple-hit reads into contigs corresponding to the repetitive part of the genome. We present here, for the first time, this de novo repeat assembly strategy and we confirm that such assembly is feasible. We identified and annotated 4,143 new repeats in the S. mansoni genome. At least one third of the repeats are transcribed. This strategy also allowed us to identify 14 new microsatellite markers, which can be used for pedigree studies. Annotations and the combined (previously known and new) 5,420 repeat sequences (corresponding to 47% of the genome) are available for download (http://methdb.univ-perp.fr/downloads/).
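
The read-selection step of this strategy (keep reads that align to more than one place in the genome, then assemble them) can be sketched as follows. This is only an illustration under assumptions: the alignment records are hypothetical `(read_id, contig, position)` tuples, and `multi_hit_reads` and `min_hits` are names invented here. The abstract does not specify the aligner, the hit threshold or the assembler, so the assembly step is not shown.

```python
from collections import defaultdict

def multi_hit_reads(alignments, min_hits=2):
    """Return the IDs of reads that align to at least min_hits distinct loci.

    alignments: iterable of hypothetical (read_id, contig, position) tuples.
    min_hits:   illustrative threshold, not a value taken from the abstract.
    """
    loci = defaultdict(set)
    for read_id, contig, position in alignments:
        # A set ignores duplicate records for the same locus.
        loci[read_id].add((contig, position))
    return {read_id for read_id, hits in loci.items() if len(hits) >= min_hits}

if __name__ == "__main__":
    example = [
        ("read1", "scaffold_1", 100),
        ("read1", "scaffold_7", 2500),
        ("read2", "scaffold_1", 900),
        ("read3", "scaffold_3", 40),
        ("read3", "scaffold_3", 40),  # duplicate record, still a single locus
    ]
    print(sorted(multi_hit_reads(example)))
```

In a real workflow the selected read IDs would be used to extract the corresponding sequences and pass them to an assembler.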

19.
GEM is a new family of repetitive sequences detected in the D. subobscura genome. Two of the four described GEM elements encompass a heterogeneous central module, with no detectable ORF, flanked by two long inverted repeats. These elements are composed of a set of repetitive modules, which are inverted repeat (IR), direct repeat (DR), palindromic sequence (PS), long sequence (LS) and short sequence (SS). These five modules can be found either clustered or dispersed as single modules in the D. subobscura genome, in euchromatic and heterochromatic regions. In addition to the 3' region of Adh retrosequences, single IR and LS blocks were found associated with the promoter region of different genes; in particular, LS-like blocks have also been found associated with functional genes in D. melanogaster and D. virilis. Conversely, the DR block is highly similar to satellite DNAs from some other species of the obscura group. In addition, GEM elements share some structural features with IS elements described in different Drosophila species. It is likely that both GEM and IS sequences are vestiges of an ancestral transposable element.

20.
Streptococcus agalactiae is a leading cause of bacterial sepsis and meningitis in neonates. FbsA, a fibrinogen receptor of S. agalactiae, is a highly repetitive protein with each repeat containing 16 amino acids. The protein sequence of FbsA shows no homology to any known fibrinogen binding protein from other bacterial species, making it a unique fibrinogen receptor. FbsA was cloned, expressed in E. coli and purified. The recombinant protein shows a laddering pattern in SDS–PAGE gel because of its poor stability in solution. The instability of the protein is probably due to the presence of a Gln-Gly dipeptide in each repeat. The circular dichroism study of FbsA has shown that the protein is composed predominantly of alpha helices and, to a lesser extent, random coils, which agrees with the predicted secondary structure. Ab initio modeling of a single repeat shows that FbsA is made up mainly of alpha helix, and the structural model of multiple repeats (3 or 4) suggests that the protein might adopt some form of a repeating helical structure and that the overall conformation of the molecule might change depending on the number of repeats.
