Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
MOTIVATION: Tandemly organized repetitive sequences (satellite DNA) are widespread in complex eukaryotic genomes. In plants, satellite repeats often represent a substantial part of nuclear DNA, but little is known about the molecular mechanisms of their amplification and their possible role(s) in genome evolution and function. Unfortunately, addressing these questions via characterization of general sequence properties of known satellite repeats has been hindered by the difficulty of obtaining a complete and unbiased set of sequence data for this analysis. This is mainly due to the presence in the public databases of multiple entries of homologous sequences and of single entries that contain more than one repeated unit (monomer). RESULTS: We have established a computer database specialized for plant satellite repeats (PlantSat) that integrates sequence data available from various resources with supplementary information including repeat consensus sequences, abundances, and chromosomal localizations. The sequences are stored as individual repeat monomers grouped into families, which simplifies their computer analysis and makes it more accurate. Using this feature, we have performed a basic sequence analysis of the whole set of plant satellite repeats with respect to their monomer length and nucleotide composition. The analysis revealed several preferred length ranges of the monomers (approximately 165 bp and its multiples) and an over-representation of the AA/TT dinucleotide in the repeats. We have also detected an enrichment of satellite DNA sequences for the motif CAAAA that is supposed to be involved in breakage-reunion of repeated sequences.
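The dinucleotide over-representation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the PlantSat authors' procedure: it assumes a simple observed/expected ratio in which the expected frequency of a dinucleotide is the product of its two single-base frequencies, and it scores one strand only (so AA and TT are reported separately). The function name `dinucleotide_odds` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds(seq: str) -> dict[str, float]:
    """Observed/expected ratio for every dinucleotide present in seq.

    Assumption: expected frequency = product of the single-base frequencies.
    A ratio above 1 means the dinucleotide is over-represented.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return {}
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Count overlapping dinucleotides: positions 0..n-2.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1
    return {
        pair: (count / total) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pairs.items()
    }
```

Applied to each monomer in a family, ratios of this kind could be averaged to compare families; how the original analysis normalised its counts is not stated in the abstract.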

3.
The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
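To make the notion of a perfect simple tandem repeat concrete, here is a small Python sketch that reports position, length and repeat subunit, the same fields the abstract says were stored. It uses a regular-expression back-reference rather than the authors' exact-match method, which the abstract does not describe; the function name and the `max_unit` and `min_copies` thresholds are assumptions chosen for illustration.

```python
import re


def find_perfect_strs(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Return sorted (start, length, unit) tuples for perfect tandem repeats.

    A hit is a unit of 1..max_unit bases repeated at least min_copies times
    with no mismatches. Units that are themselves repeats of a shorter unit
    (for example "CACA") are skipped so each tract is reported once.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # \1 forces every further copy to be identical to the first unit.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((match.start(), match.end() - match.start(), unit))
    return sorted(hits)
```

Only whole copies are counted, and the reported unit is the phase that occurs leftmost in the sequence. A genome-scale scan would need a streaming approach instead of one regular expression pass per unit length.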

4.
MOTIVATION: DNA structure plays an important role in a variety of biological processes. Different di- and tri-nucleotide scales have been proposed to capture various aspects of DNA structure including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. Yet, a general framework for the computational analysis and prediction of DNA structure is still lacking. Such a framework should in particular address the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales. RESULTS: We develop a general framework for sequence analysis based on additive scales, structural or other, that addresses all these issues. We show how to construct extremal sequences and calibrate scores for automatic genomic and database extraction. We show that distributions rapidly converge to normality as N increases. Pairwise correlations between scales depend both on background distribution and sequence length and rapidly converge to an analytically predictable asymptotic value. For di- and tri-nucleotide scales, normal behavior and asymptotic correlation values are attained over a characteristic window length of about 10-15 bp. With a uniform background distribution, pairwise correlations between empirically-derived scales remain relatively small and roughly constant at all lengths, except for propeller twist and protein deformability which are positively correlated. There is a positive (resp. negative) correlation between dinucleotide base stacking (resp. propeller twist and protein deformability) and AT-content that increases in magnitude with length. The framework is applied to the analysis of various DNA tandem repeats. We derive exact expressions for counting the number of repeat unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.
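An additive dinucleotide scale, as described in this abstract, assigns a value to each dinucleotide and scores a sequence by summing over its overlapping dinucleotides. The Python sketch below shows that idea only. `TOY_SCALE` is a made-up stand-in (the AT fraction of each dinucleotide), not one of the published structural scales, and the function names are hypothetical.

```python
# Hypothetical scale: fraction of A/T bases in each dinucleotide (0, 0.5 or 1).
# Real scales (stacking energy, propeller twist, ...) would replace these values.
TOY_SCALE = {
    a + b: ((a in "AT") + (b in "AT")) / 2
    for a in "ACGT"
    for b in "ACGT"
}


def additive_score(seq: str, scale: dict[str, float]) -> float:
    """Sum the scale over all overlapping dinucleotides of seq."""
    return sum(scale[seq[i:i + 2]] for i in range(len(seq) - 1))


def windowed_profile(seq: str, scale: dict[str, float], window: int) -> list[float]:
    """Mean per-dinucleotide score in each sliding window (window >= 2)."""
    steps = window - 1  # a window of w bases contains w - 1 dinucleotides
    return [
        additive_score(seq[i:i + window], scale) / steps
        for i in range(len(seq) - window + 1)
    ]
```

With a window of roughly 10-15 bp, the range quoted in the abstract, a profile of this kind could be scanned for extreme values; calibrating those values against a genomic background is beyond this sketch.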

5.
During recloning of Nicotiana tabacum L. repetitive sequence R8.3 in Escherichia coli, a modified clone that differed from the original by the insertion of an IS10 sequence was unintentionally produced. The insert was flanked by a 9-bp direct repeat derived from the R8.3 sequence, the 9-bp duplication of acceptor DNA in the site of insertion being a characteristic of IS10 transposition events. A database search using the FASTA program showed IS10 and other prokaryotic IS elements inserted into numerous eukaryotic clones. Unexpectedly, the IS10, which is not a natural component of the E. coli genome, appeared to be by far the most frequent contaminant of DNA databases among several IS sequences tested. In the GenEMBL database, the IS10 query sequence yielded positive scores with more than 500 eukaryotic clones. Insertions of shortened IS10 sequences having only one intact terminal inverted repeat were commonly found. Most full-length IS10 insertions (32 out of 40 analyzed) were flanked by 9-bp direct repeats having the consensus 5'-NPuCNN-NGPyN-3' with a strong preference for 5'-TGCTNA-GNN-3'. One insertion was flanked by an inverted repeat of more than 400 bp in length. PCR amplification and Southern analysis revealed the presence of IS10 sequences in E. coli strains commonly used for DNA cloning, including some reported to be Tn10-free. No IS10-specific PCR product was obtained with N. tabacum or human DNA. Our data suggest that transposition of IS10 elements may accompany cloning steps, particularly into large BAC vectors. This might lead to the relatively frequent contamination of DNA databases by this bacterial sequence. It is estimated that approximately one in every thousand eukaryotic clones in the databases is contaminated by IS-derived sequences. We recommend checking submitted sequences for the presence of IS10 and other IS elements. In addition, DNA databases should be corrected by removing contaminating IS sequences.
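The 9-bp direct repeat that flanks an IS10 insertion suggests a simple check once the insert coordinates are known (for example from a similarity search hit). The Python sketch below is an illustration under that assumption, not the authors' workflow; the function name and the half-open coordinate convention are hypothetical.

```python
from typing import Optional


def flanking_direct_repeat(
    seq: str, insert_start: int, insert_end: int, size: int = 9
) -> Optional[str]:
    """Return the direct repeat flanking seq[insert_start:insert_end], if any.

    The `size` bases immediately before the insert are compared with the
    `size` bases immediately after it. None is returned when they differ or
    when the insert lies too close to the start of the clone.
    """
    if insert_start < size:
        return None
    left = seq[insert_start - size:insert_start]
    right = seq[insert_end:insert_end + size]
    return left if left == right else None
```

An exact comparison is used here; tolerating a mismatch or two, or testing the flank against the consensus quoted in the abstract, would be natural extensions.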

6.
Comparison of ARM and HEAT protein repeats   Cited by: 18 (self-citations: 0; citations by others: 18)
ARM and HEAT motifs are tandemly repeated sequences of approximately 50 amino acid residues that occur in a wide variety of eukaryotic proteins. An exhaustive search of sequence databases detected new family members and revealed that at least 1 in 500 eukaryotic protein sequences contain such repeats. It also rendered the similarity between ARM and HEAT repeats, believed to be evolutionarily related, readily apparent. All the proteins identified in the database searches could be clustered by sequence similarity into four groups: canonical ARM-repeat proteins and three groups of the more divergent HEAT-repeat proteins. This allowed us to build improved sequence profiles for the automatic detection of repeat motifs. Inspection of these profiles indicated that the individual repeat motifs of all four classes share a common set of seven highly conserved hydrophobic residues, which in proteins of known three-dimensional structure are buried within or between repeats. However, the motifs differ at several specific residue positions, suggesting important structural or functional differences among the classes. Our results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly. We discuss evolutionary scenarios that could account for the great diversity of repeats observed.

7.
G C Overton, E S Weinberg. Cell, 1978, 14(2): 247-257
Histone gene repeats in S. purpuratus are shown to be of variable length and sequence. Two recombinant plasmids containing the full-length 6.3 kb histone repeat unit are found to differ in length at two sites in the repeating structure and in the occurrence of two restriction enzyme recognition sites. Variation in repeat length is also demonstrated in the unfractionated DNA of five sea urchins and in a sample of DNA enriched for histone gene sequences by density gradient methods. The repeats in each individual are of a very limited number of major classes, which may differ from one another in overall length or in distribution and presence of particular restriction enzyme sites. Variations are found to occur at many regions of the repeat; some have been mapped specifically to spacer regions. Repeats may differ dramatically from individual to individual since there is no one type of repeat class common to all, although the absolute length differences of the repeats that are found are small.

8.
We have identified four novel repeats and two domains in cell surface proteins encoded by the Methanosarcina acetivorans genome and in some archaeal and bacterial genomes. The repeats correspond to a certain number of amino acid residues present in tandem in a protein sequence and each repeat is characterized by conserved sequence motifs. These correspond to: (a) a 42 amino acid (aa) residue RIVW repeat; (b) a 45 aa residue LGxL repeat; (c) a 42 aa residue LVIVD repeat; and (d) a 54 aa residue LGFP repeat. The domains correspond to a certain number of aa residues in a protein sequence that do not comprise internal repeats. These correspond to: (a) a 200 aa residue DNRLRE domain; and (b) a 70 aa residue PEGA domain. We discuss the occurrence of these repeats and domains in the different proteins and genomes analysed in this work.

9.
The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.

10.
Microsatellites are widely distributed in plant genomes and comprise unstable regions that undergo mutational changes at rates much greater than those observed for non-repetitive sequences. They demonstrate intrinsic genetic instability, manifested as frequent length changes due to insertions or deletions of repeat units. Detailed analysis of 1600 clones containing genomic sequences of Vicia bithynica revealed the presence of microsatellite repeats in its genome. Based on the screening of a partial DNA library of plasmids, 13 clones harbouring (GA/TC)n tracts of various lengths of repeated motif were identified for further analysis of their internal sequence organization. Sequence analyses revealed the precise length, number of repeats, interruptions within tracts, as well as sequence composition flanking the repeat motifs. Representative plasmids containing different lengths of (GA/TC)n embedded in their original flanking sequence were used to investigate the genetic stability of the repeats. In the study presented herein, we employed a well characterised and tractable bacterial genetic system. Recultivations of Escherichia coli harbouring plasmids containing (GA/TC)n inserts demonstrated that the genetic instability of (GA/TC)n microsatellites depends highly on their length (number of repeats). These observations are in agreement with similar studies performed on repetitive sequences from humans and other organisms.

11.
A computer-aided homology search of databases found that the nucleotide sequences flanking ATLN44, a non-LTR retrotransposon (LINE) from Arabidopsis thaliana, are repeated in the A. thaliana genome. These sequences are homologous to flanking sequences of 664 bp with terminal inverted repeat sequences of about 70 bp. The 664-bp sequence and most of the 14 homologues identified were flanked by direct repeat sequences of 9 bp. These findings indicate that the repeated sequence, named Tnat1, is a transposable element that duplicates a 9-bp sequence at the target site on transposition and that ATLN44 is inserted in one Tnat1 member. Interestingly, all of the Tnat1 members had tandem repeats comprised of several units of a 60-bp sequence, the number of repeats differing among Tnat1 members. Of the Tnat1 members identified, one was inserted into another sequence repeated in the A. thaliana genome: that sequence is about 770 bp long and has terminal inverted repeat sequences of about 110 bp. The sequence is flanked by direct repeats of a 9-bp sequence, indicating that it is another transposable element, named Tnat2, from A. thaliana. Moreover, Tnat2 members had a tandem repeat about 240 bp long. Tnat1 and Tnat2 with tandem repeats in their internal regions show no homology to each other or to any of the elements identified previously; therefore they appear to be novel transposable elements.

12.
Many structural, signaling, and adhesion molecules contain tandemly repeated amino acid motifs. The alpha-actinin/spectrin/dystrophin superfamily of F-actin-crosslinking proteins contains an array of triple alpha-helical motifs (spectrin repeats). We present here the complete sequence of the novel beta-spectrin isoform beta(Heavy)-spectrin (beta H). The sequence of beta H supports the origin of alpha- and beta-spectrins from a common ancestor, and we present a novel model for the origin of the spectrins from a homodimeric actin-crosslinking precursor. The pattern of similarity between the spectrin repeat units indicates that they have evolved by a series of nested, nonuniform duplications. Furthermore, the spectrins and dystrophins clearly have common ancestry, yet the repeat unit is of a different length in each family. Together, these observations suggest a dynamic period of increase in repeat number accompanied by homogenization within each array by concerted evolution. However, today, there is greater similarity of homologous repeats between species than there is across repeats within species, suggesting that concerted evolution ceased some time before the arthropod/vertebrate split. We propose a two-phase model for the evolution of the spectrin repeat arrays in which an initial phase of concerted evolution is subsequently retarded as each new protein becomes constrained to a specific length and the repeats diverge at the DNA level. This evolutionary model has general applicability to the origins of the many other proteins that have tandemly repeated motifs.

13.
Full-consensus designed ankyrin repeat proteins were designed with one to six identical repeats flanked by capping repeats. These proteins express well in Escherichia coli as soluble monomers. Compared to our previously described designed ankyrin repeat protein library, randomized positions have now been fixed according to sequence statistics and structural considerations. Their stability increases with length and is even higher than that of library members, and those with more than three internal repeats are resistant to denaturation by boiling or guanidine hydrochloride. Full denaturation requires their heating in 5 M guanidine hydrochloride. The folding and unfolding kinetics of the proteins with up to three internal repeats were analyzed, as the other proteins could not be denatured. Folding is monophasic, with a rate that is nearly identical for all proteins (~400-800 s⁻¹), indicating that essentially the same transition state must be crossed, possibly the folding of a single repeat. In contrast, the unfolding rate decreases by a factor of about 10⁴ with increasing repeat number, directly reflecting thermodynamic stability in these extraordinarily slow denaturation rates. The number of unfolding phases also increases with repeat number. We analyzed the folding thermodynamics and kinetics both by classical two-state and three-state cooperative models and by an Ising-like model, where repeats are considered as two-state folding units that can be stabilized by interacting with their folded nearest neighbors. This Ising model globally describes both equilibrium and kinetic data very well and allows for a detailed explanation of the ankyrin repeat protein folding mechanism.
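The Ising-like picture in this abstract (each repeat is either folded or unfolded, and folded neighbours stabilise one another) can be sketched at equilibrium in a few lines of Python. This is a generic one-dimensional nearest-neighbour model, not the authors' fitted model: it treats all repeats as identical, ignores the capping repeats and any denaturant dependence, and the free-energy arguments are placeholders to be supplied by the reader. The function name is hypothetical.

```python
import math
from itertools import product

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_folded(
    n_repeats: int,
    dg_intrinsic: float,
    dg_interface: float,
    temp_k: float = 298.15,
) -> float:
    """Mean fraction of folded repeats in a 1D nearest-neighbour Ising model.

    dg_intrinsic: free energy (kJ/mol) of folding one isolated repeat.
    dg_interface: free energy (kJ/mol) gained per pair of adjacent folded repeats.
    All 2**n_repeats states are enumerated, so keep n_repeats small.
    """
    kappa = math.exp(-dg_intrinsic / (R_KJ * temp_k))  # weight per folded repeat
    tau = math.exp(-dg_interface / (R_KJ * temp_k))    # weight per folded interface
    partition = 0.0
    folded_sum = 0.0
    for state in product((0, 1), repeat=n_repeats):
        n_folded = sum(state)
        n_interfaces = sum(a and b for a, b in zip(state, state[1:]))
        weight = kappa ** n_folded * tau ** n_interfaces
        partition += weight
        folded_sum += weight * n_folded
    return folded_sum / (partition * n_repeats)
```

With an unfavourable intrinsic term and a favourable interface term, the folded fraction rises with repeat number, which is the qualitative trend the abstract reports for stability.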

14.
The expansion of a CAG trinucleotide repeat (TNR) sequence has been linked to several neurological disorders, for example, Huntington's disease (HD). In HD, healthy individuals have 5-35 CAG repeats. Those with 36-39 repeats have the premutation allele, which is known to be prone to expansion. In the disease state, greater than 40 repeats are present. Interestingly, the formation of non-B DNA conformations by the TNR sequence is proposed to contribute to the expansion. Here we provide the first structural and thermodynamic analysis of a premutation length TNR sequence. Using chemical probes of nucleobase accessibility, we found that, similar to (CAG)10, the premutation length sequence (CAG)36 forms a stem-loop hairpin and contains a hot spot for DNA damage. Additionally, calorimetric analysis of a series of (CAG)n sequences that includes repeat tracts in both the healthy and premutation ranges reveals that thermodynamic stability increases linearly with the number of repeats. Based on these data, we propose that while non-B conformations can be formed by TNR tracts found in both the healthy and premutation allele, only sequences containing at least 36 repeats have sufficient thermodynamic stability to contribute to expansion.
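The repeat-number ranges quoted in this abstract can be turned into a short Python sketch that measures the longest uninterrupted CAG tract in a sequence and reports which quoted range it falls in. This is an illustration only, not a diagnostic rule: both function names are hypothetical, interrupted tracts are not merged, and counts the abstract does not assign to a range (below 5, or exactly 40) are reported as such.

```python
import re


def longest_cag_tract(seq: str) -> int:
    """Number of CAG units in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def range_from_abstract(n_repeats: int) -> str:
    """Map a repeat count onto the ranges quoted in the abstract above."""
    if 5 <= n_repeats <= 35:
        return "healthy range"
    if 36 <= n_repeats <= 39:
        return "premutation range"
    if n_repeats > 40:
        return "disease range"
    return "not covered by the quoted ranges"
```

For example, `range_from_abstract(longest_cag_tract(allele))` would label a hypothetical `allele` string; real genotyping works from sized PCR products or read alignments, not a plain string scan.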

15.
The rapid divergence of repetitive sequences makes them desirable markers for phylogenetic studies of closely related groups, provided that a high level of sequence homogeneity has been maintained within species. Intraspecific polymorphisms are found in an increasing number of studies now, and this highlights the need to determine why these occur. In this study we examined intraindividual variation present in the first ribosomal internal transcribed spacer (ITS1) from a group of cryptic mosquito species. Individuals of the Anopheles punctulatus group contained multiple ITS1 length variants that ranged from 1.2 to 8.0 kb. Nucleotide and copy number variation for several homologous internal repeats is common, yet the intraspecific sequence divergence of cloned PCR isolates is comparable to that of other mosquito species (~0.2–1.5%). Most of the length variation is comprised of a 5′-ITS1 repeat that was identified as a duplication of a conserved ITS2 region. Secondary structure conservation for this repeat is pronounced and several repeat types that are highly homogenized have formed. Significant interspecific divergence indicates a high rate of evolutionary change for this spacer. A maximum likelihood tree constructed here was congruent with previous phylogenetic hypotheses and suggests that concerted evolution is also accompanied by interpopulation divergence. The lack of interindividual differences and the presence of homogenized internal repeats suggest that a high rate of turnover has reduced the overall level of variation. However, the intraindividual variation also appears to be maintained by the absence of a single turnover rate and the complex dynamics of ongoing recombination within the spacer. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

16.
Microsatellites or simple sequence repeats (SSRs) occur ubiquitously and show complex patterns in length, motif size and sequence. Among SSRs, dinucleotide repeats occur in high abundance in fungi, with shorter lengths than in other organisms. In this study, multilocus profiles obtained in Magnaporthe grisea, a model plant pathogen, were evaluated. The results showed a lower rate of polymorphism with (GT)n/(TG)n repeat-based primers and suggested occurrence of (GA)n/(AG)n repeats as integral repeats and (TC)n/(CT)n and (AC)n/(CA)n as non-integral repeats. Low repeat length variation was found to be correlated with fewer repeat motifs. The study provides an insight into the possibility of molecular coevolution of mobile elements and dinucleotide repeats in fungi. The study could be applied to other species for wider applications including evolutionary and population genetics.

17.
Nucleotide sequence analysis revealed that a DNA length polymorphism 5' to the human antithrombin III gene is due to the presence of 32 bp or 108 bp nonhomologous nucleotide sequences (variable segments) 345 bp upstream from the translation initiation codon. Sequences at the 3' borders of both variable segments can form intrastrand inverted repeat structures with sequences further downstream. An inverted repeat is also found immediately 5' to the site where the variable segments are located. Thus, cruciform structures may form flanking the variable segments of both alleles of this DNA length polymorphism. DNA secondary structure may be detected with single strand specific nucleases. S1 nuclease sensitive sites were mapped in recombinant plasmids containing the cloned alleles of the ATIII length polymorphism. The site most sensitive to S1 is located upstream from the variable segments in an AT-rich segment flanked by 6 bp direct repeats. A region of lesser nuclease sensitivity was also observed in the AT-rich loops formed between the inverted repeats 5' to the variable segments.

18.
Segments of the murine genome that hybridize to the inverted repeat regions of the transposable TU elements of sea urchins include tandem repeats of a sequence (CTCC) that encodes the recognition site for the restriction enzyme Mnl1, as do the analogous polypurine/polypyrimidine (pPu/pPy) stretches in humans. The Mnl1-sensitive repeats, which exist as a microsatellite sequence 200-300 bp in length, lack the terminal dyad symmetry characteristic of the TU elements and are structurally and functionally distinct from these elements. DNA fragments containing these repeat units that are isolated from different generations of isogenic (or congenic) mice or from different tissues of genetically identical individuals are indistinguishable by RFLP analysis; however, they show restriction fragment length polymorphism in different strains. This polymorphism appears to reflect DNA sequence changes occurring at sites flanking the repeats rather than variability in the number of repeats. Their genetic stability and occurrence in a wide variety of animal species make the Mnl1 repeats useful in studying genetic variation that has occurred over an evolutionary time scale of greater duration than can be examined conveniently by VNTR analysis.

19.
20.
Short protein repeats, frequently with a length between 20 and 40 residues, represent a significant fraction of known proteins. Many repeats appear to possess high amino acid substitution rates and thus recognition of repeat homologues is highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current methods. We have devised an iterative algorithm based on optimal and sub-optimal score distributions from profile analysis that estimates the significance of all repeats that are detected in a single sequence. This procedure allows the identification of homologues at alignment scores lower than the highest optimal alignment score for non-homologous sequences. The method has been used to investigate the occurrence of eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, respectively. For these examples, the method is both more sensitive and more selective than conventional homology search procedures. The method allowed the detection in the SwissProt database of more than 2000 previously unrecognised repeats belonging to the 11 families. In addition, the method was used to merge several repeat families that previously were supposed to be distinct, indicating common phylogenetic origins for these families.
