Similar Documents
20 similar documents found.
1.
Alternating purine-pyrimidine sequences (RY repeats) show considerable homology to the consensus sequence for vertebrate topoisomerase II (Spitzner and Muller (1988) Nucleic Acids Res. 16: 1533-1556). This is shown below; positions that can match are underscored.

    RYRYRYRYRYRYRYRYRY = alternating purine-pyrimidine, 18 bp
    RNYNNCNNGYNGKTNYNY = topoisomerase II consensus sequence

(R is purine, Y is pyrimidine, K is G or T.) Topoisomerase II cleavage reactions were performed (in the absence of inhibitors) on a plasmid containing a 54-bp RY repeat, and the single strong cleavage site mapped to the RY repeat. Analysis of this DNA on sequencing gels showed that the enzyme cleaved a number of sites, all within the 54-bp RY repeat. Topoisomerase II also made clustered cleavages within the other RY repeats that were examined. Quantitative analysis of homology to the consensus sequence, measured as the match of a site to a matrix of base proportions from the consensus database (the matrix mean), showed that both the locations and the frequencies of cleavage sites within RY repeats were proportional to homology scores. However, topoisomerase II cleaved RY repeats preferentially compared with non-RY sites of similar homology score. The activity of the enzyme at RY repeats appears to be proportional to the length of the repeat; in addition, GT, AC and AT repeats were better substrates for cleavage than GC repeats.
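Which positions of the two strings above "can match" follows mechanically from the IUPAC codes. A minimal Python sketch, assuming that two codes can match when they share at least one concrete base; only the codes that appear in the two strings are included, and the function names are illustrative:

```python
# Bases allowed by each IUPAC code that appears in the two strings above.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "K": {"G", "T"},
    "N": {"A", "C", "G", "T"},  # any base
}


def can_match(a: str, b: str) -> bool:
    """True if at least one concrete base is allowed by both codes."""
    return bool(IUPAC[a] & IUPAC[b])


def compatible_positions(seq: str, consensus: str) -> list:
    """1-based positions at which seq and consensus could carry the same base."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq, consensus)) if can_match(a, b)]


ry_repeat = "RYRYRYRYRYRYRYRYRY"
topo2_consensus = "RNYNNCNNGYNGKTNYNY"
hits = compatible_positions(ry_repeat, topo2_consensus)
mismatches = [i for i in range(1, len(topo2_consensus) + 1) if i not in hits]
print(f"{len(hits)} of {len(topo2_consensus)} positions can match; mismatches at {mismatches}")
# 16 of 18 positions can match; mismatches at [3, 12]
```

Under this reading the only incompatible positions are 3 (R against Y) and 12 (Y against G).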

2.
The non-coding fraction of the human genome, approximately 98%, consists mainly of repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA-based homology search with reference to disease genes. The Repeats in Diseases database (RiDs db) is a web-accessible relational database that aids the analysis of repeats associated with Mendelian disorders. It is a repository of disease genes, which can be searched with the FASTA program or by limited or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes, together with information on both the interspersed and the tandem repeats they contain, on a unified platform. Comparing novel or patient sequences with the reference sequences in RiDs db by FASTA search will reveal any change in repeat structure associated with a particular disorder. The database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila. AVAILABILITY: The database is freely available at http://115.111.90.196/ridsdb/index.php.

3.
Two-dimensional graphic analysis of DNA sequence homologies.   Total citations: 9 (self-citations: 3, other citations: 6)
We describe a computer program designed to facilitate pattern-matching analysis of homologies between DNA sequences. It uses a two-dimensional plot to simplify the evaluation of significant structures inherent in the sequences. The program can be divided into three parts: i) an algorithm that searches for homologies, ii) a two-dimensional graphic display of the result, and iii) further graphic treatment to enhance significant structures. The power of the graphic display is demonstrated by the following application of the program. We searched for direct repeats in the mouse immunoglobulin kappa-chain genes. Both the five J DNA sequences and other, shorter repeats were found. We also found a longer stretch of homology that could indicate the presence of duplicated DNA in the J4, J5 region.
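The two-dimensional plot can be illustrated with a small sliding-window dot plot. This is not the authors' program: the window length and identity threshold are assumed parameters of a common filtering scheme, and the example sequence is made up to contain one direct repeat.

```python
def dot_plot(a: str, b: str, window: int = 8, min_matches: int = 6) -> list:
    """Cell (i, j) is True when a[i:i+window] and b[j:j+window] share at least
    min_matches identical positions (ungapped comparison)."""
    rows = max(len(a) - window + 1, 0)
    cols = max(len(b) - window + 1, 0)
    plot = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            identities = sum(a[i + k] == b[j + k] for k in range(window))
            plot[i][j] = identities >= min_matches
    return plot


def render(plot) -> str:
    """Text rendering: '*' for a significant window pair, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


# Made-up sequence containing a direct repeat of ACGTTGCA.
seq = "ACGTTGCA" + "GGATCC" + "ACGTTGCA"
plot = dot_plot(seq, seq, window=8, min_matches=8)
print(render(plot))
```

In a self-comparison the main diagonal is always marked; the direct repeat shows up as the off-diagonal cells (0, 14) and (14, 0).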

4.
We describe a new class of DNA length polymorphism caused by variation in the number of tandem repeats associated with Alu sequences (Alu sequence-related polymorphisms). The polymerase chain reaction was used to selectively amplify a (TTA)n repeat, identified in the 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase gene, from the genomic DNA of 41 human subjects, and the size of the amplified products was determined by gel electrophoresis. Seven alleles were found that differed in size by integral multiples of three nucleotides. The allele frequencies ranged from 1.5% to 52%, and the overall heterozygosity index was 62%. The polymorphic TTA repeat was located adjacent to a repetitive sequence of the Alu family. A homology search of human genomic DNA sequences for the trinucleotide TTA (at least five repeat units long) revealed tandem repeats in six other genes. Three of the six (TTA)n repeats were located adjacent to Alu sequences, and two of these three (in the genes for beta-tubulin and interleukin-1 alpha) were found to be polymorphic in length. Tandemly repetitive sequences associated with Alu sequences may be frequent sites of length polymorphism that can be used as genetic markers for gene mapping or linkage analysis.
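The search criterion above (TTA repeated at least five times) and the heterozygosity figure can both be sketched in a few lines. This assumes the reported index is the usual expected heterozygosity, 1 minus the sum of squared allele frequencies; the sequence and the frequencies in the example are invented.

```python
import re


def find_repeat_runs(seq: str, unit: str = "TTA", min_copies: int = 5) -> list:
    """Return (0-based start, copy number) for each uninterrupted run of `unit`
    with at least `min_copies` tandem copies."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq.upper())]


def heterozygosity(allele_freqs) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2), for allele frequencies summing to 1."""
    return 1.0 - sum(p * p for p in allele_freqs)


example = "GC" + "TTA" * 7 + "GG" + "TTA" * 3 + "C"
print(find_repeat_runs(example))          # [(2, 7)]; the 3-copy run is too short
print(heterozygosity([0.5, 0.25, 0.25]))  # 0.625
```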

5.
MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. RESULTS: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.
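The abstract does not detail how HomologMiner forms groups, so the following is only a generic illustration of the grouping step: single-linkage clustering of pairwise homology hits with a union-find structure. It ignores the boundary refinement and consistency criteria described above, and the segment ids and hits are hypothetical.

```python
def cluster_hits(segments, hits) -> list:
    """Single-linkage grouping: segments joined by any chain of pairwise hits
    end up in the same group. Returns sorted lists of segment ids."""
    parent = {s: s for s in segments}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical segment ids and pairwise alignment hits.
segments = ["s1", "s2", "s3", "s4", "s5", "s6"]
hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(cluster_hits(segments, hits))  # [['s1', 's2', 's3'], ['s4', 's5'], ['s6']]
```

Single linkage is the simplest choice; it can chain unrelated families together through a single spurious hit, which is one reason a real tool needs stricter consistency rules.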

6.
We designed a simple but sensitive program, IntraCompare, for identifying internal repeats in families of homologous proteins. The protein sequences are aligned with Clustal X, the regions to be compared are selected, and every potential repeat sequence is compared with all the others. The output reports comparison scores (computed with the GAP program) expressed in standard deviations.
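Expressing a comparison score "in standard deviations" usually means comparing it with scores obtained after shuffling one sequence. A minimal sketch of that idea, assuming an ungapped identity count as a hypothetical stand-in for the GAP score; the two repeat units are invented.

```python
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a GAP alignment score: identities in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b))


def score_in_sd(a: str, b: str, shuffles: int = 200, seed: int = 0) -> float:
    """Score of a against b, in standard deviations above the mean score of a
    against shuffled versions of b (same composition, random order)."""
    rng = random.Random(seed)
    observed = identity_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)
        null_scores.append(identity_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    return (observed - mean) / sd if sd > 0 else float("inf")


# Two made-up repeat units of equal length.
repeat_1 = "LKDEVAKLAGSLTRE"
repeat_2 = "LKDEIAKLSGSLTKE"
print(round(score_in_sd(repeat_1, repeat_2), 2))
```

Shuffling keeps the residue composition fixed, so a high value reflects residue order rather than composition alone.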

7.
A new rice repetitive DNA shows sequence homology to both 5S RNA and tRNA.   Total citations: 2 (self-citations: 0, other citations: 2)
T. Y. Wu, R. Wu. Nucleic Acids Research, 1987, 15(15): 5913-5923
Moderately repetitive DNA sequences are found in the genomes of all eucaryotes that have been examined. We now report the discovery of a novel, transcribed, moderately repetitive DNA sequence in a higher plant which is different from any of the known repetitive DNA sequences from any organism. We isolated a rice cDNA clone which hybridizes to multiple bands on genomic blot analysis. The sequence of this 352 bp cDNA contains four regions of homology to the wheat phenylalanine tRNA, including the polymerase III-type promoter. Unexpectedly, two regions of the same 352 bp sequence also show homology to the wheat 5S RNA sequence. Using the cDNA as a probe, we have isolated six genomic clones which contain long tandem repeats of 355 bp sequence, and have sequenced nine repeat units. Our findings suggest that the rice repetitive sequence may be an amplified pseudogene with sequence homology to both 5S RNA and tRNA, but organized as long tandem repeats resembling 5S RNA genes. This is the first example showing homology between the sequences of a moderately repetitive DNA with unknown function and 5S RNA.

8.
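A suffix-array-based technique for finding maximal tandem repeats in gene sequences   Total citations: 1 (self-citations: 0, other citations: 1)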
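Repeat analysis plays an important role in whole-genome studies, and its first task is to identify and locate all repeat structures in a DNA sequence. This paper proposes a new algorithm, based on a simple data structure, the suffix array, for finding all maximal tandem repeats in a given DNA sequence. Based on this algorithm, we developed an efficient and practical software tool, RepLocate, and present examples of its application to known DNA sequences.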
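For checking results on short inputs, maximal tandem repeats can also be found by a brute-force scan over candidate periods. The Python sketch below is that naive alternative, not RepLocate and not a suffix-array algorithm. It reports every maximal periodic run for every period that fits, so non-primitive periods (for example period 2 in AAAA) are listed as well.

```python
def maximal_tandem_repeats(s: str, min_copies: int = 2) -> list:
    """Return (start, end, period) for every run s[start:end] that has period
    `period`, spans at least min_copies * period characters, and cannot be
    extended left or right with the same period. Brute force, O(n^2) overall."""
    n = len(s)
    found = []
    for period in range(1, n // min_copies + 1):
        i = 0
        while i + period < n:
            if s[i] != s[i + period]:
                i += 1
                continue
            start = i
            # Extend while each character equals the one `period` positions later.
            while i + period < n and s[i] == s[i + period]:
                i += 1
            end = i + period  # s[start:end] has the given period
            if end - start >= min_copies * period:
                found.append((start, end, period))
    return found


print(maximal_tandem_repeats("ACGACGACGT"))  # [(0, 9, 3)]
```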

9.
MOTIVATION: Low-complexity or cryptically simple sequences are widespread in protein sequences, but their evolution and function are poorly understood. To date, methods for detecting low complexity in proteins have been directed towards filtering such regions before sequence homology searches, not towards analysing the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function. RESULTS: We have developed a new tool, based on the SIMPLE algorithm, that quantifies the amount of simple sequence in proteins and determines which types of short motifs show clustering above a certain threshold. By modifying the sensitivity of the program, simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins. AVAILABILITY: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE.

10.
A contextual analysis of the nucleotide sequences of 22 regions of the human genome in which Alu repeats are located has been carried out, and several of their peculiarities have been revealed. In particular, a marked and statistically non-random homology between the repeats and the regions into which they integrated has been shown. Taking these peculiarities into account, a mechanism by which Alu repeats choose their insertion regions in the genome is suggested. Using a sample of 80 human Alu repeat sequences, the location of these repeats within the genome was investigated, and a tendency of Alu repeats to form clusters in various genomic regions was revealed. A range of possible mechanisms for the emergence of such Alu clusters is considered. On the basis of the data obtained, an "attraction" mechanism is proposed, according to which the integration of an Alu repeat into a particular region of the genome increases the probability that other Alu repeats will insert into the same region.
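The abstract does not say how the clustering tendency was measured. One generic way to quantify it, shown here purely as an assumed illustration, is the variance-to-mean ratio of element counts in fixed-size bins: about 1 for random (Poisson) placement and well above 1 when elements cluster. The coordinates below are toy values in arbitrary units.

```python
def dispersion_index(positions, region_length: int, bin_size: int) -> float:
    """Variance-to-mean ratio of element counts in consecutive fixed-size bins.
    About 1 for random (Poisson) placement; well above 1 suggests clustering.
    Assumes at least two bins and at least one element."""
    n_bins = -(-region_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    mean = sum(counts) / n_bins
    variance = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    return variance / mean


# Toy coordinates (arbitrary units), region of length 400, bins of 100.
print(dispersion_index([10, 20, 30, 40], 400, 100))     # 4.0 (all in one bin)
print(dispersion_index([50, 150, 250, 350], 400, 100))  # 0.0 (perfectly even)
```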

11.
Biological sequences are often analyzed by detecting homologous regions between them. Homology search is confounded by simple repeats, which give rise to strong similarities that are not homologies. Standard repeat-masking methods fail to eliminate this problem, and they are especially ill-suited to AT-rich DNA such as malaria and slime-mould genomes. We present a new repeat-masking method, tantan, which is motivated by the mechanisms that create simple repeats. This method thoroughly eliminates spurious homology predictions for DNA–DNA, protein–protein and DNA–protein comparisons. Moreover, it enables accurate homology search for non-coding DNA with extreme A + T composition.
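To make "repeat masking" concrete, here is a deliberately crude deterministic masker that lowercases long, exactly periodic stretches. It is not tantan, which scores repeats probabilistically and also handles inexact repeats; the period limit and minimum length are assumed parameters.

```python
def mask_simple_repeats(seq: str, max_period: int = 6, min_length: int = 12) -> str:
    """Lowercase every stretch that is exactly periodic with some period
    <= max_period and spans at least min_length characters."""
    n = len(seq)
    masked = [False] * n
    for period in range(1, max_period + 1):
        i = 0
        while i + period < n:
            if seq[i] != seq[i + period]:
                i += 1
                continue
            start = i
            while i + period < n and seq[i] == seq[i + period]:
                i += 1
            end = i + period  # seq[start:end] has this period
            if end - start >= min_length:
                for k in range(start, end):
                    masked[k] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


print(mask_simple_repeats("GCTAGC" + "AT" * 8 + "GGCATC"))
# GCTAGCatatatatatatatatGGCATC
```

Lowercase (soft) masking keeps the sequence intact, so downstream aligners can choose whether to ignore masked letters.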

12.
Short protein repeats, frequently with a length between 20 and 40 residues, represent a significant fraction of known proteins. Many repeats appear to possess high amino acid substitution rates and thus recognition of repeat homologues is highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current methods. We have devised an iterative algorithm based on optimal and sub-optimal score distributions from profile analysis that estimates the significance of all repeats that are detected in a single sequence. This procedure allows the identification of homologues at alignment scores lower than the highest optimal alignment score for non-homologous sequences. The method has been used to investigate the occurrence of eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, respectively. For these examples, the method is both more sensitive and more selective than conventional homology search procedures. The method allowed the detection in the SwissProt database of more than 2000 previously unrecognised repeats belonging to the 11 families. In addition, the method was used to merge several repeat families that previously were supposed to be distinct, indicating common phylogenetic origins for these families.
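The optimal and sub-optimal matches that feed such a procedure can be collected by scanning one sequence with a profile. The sketch below shows only that collection step, using a fixed hypothetical threshold and a toy two-column position-specific scoring matrix; it does not estimate significance from score distributions as the described algorithm does.

```python
def profile_hits(seq: str, pssm: list, threshold: float, mismatch: float = -1.0) -> list:
    """Score every window against a position-specific scoring matrix, then keep
    the best-scoring window and successively lower (sub-optimal) windows that do
    not overlap an already accepted hit, as long as score >= threshold.
    pssm is a list of {residue: score} dicts; unlisted residues score `mismatch`."""
    width = len(pssm)
    scored = []
    for start in range(len(seq) - width + 1):
        score = sum(column.get(seq[start + k], mismatch) for k, column in enumerate(pssm))
        scored.append((score, start))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, then leftmost

    occupied = [False] * len(seq)
    hits = []
    for score, start in scored:
        if score < threshold:
            break
        if any(occupied[start:start + width]):
            continue
        for k in range(start, start + width):
            occupied[k] = True
        hits.append((start, score))
    return sorted(hits)


# Hypothetical two-column profile rewarding the dipeptide "WD".
toy_pssm = [{"W": 3.0}, {"D": 2.0}]
print(profile_hits("AWDAAWDAWE", toy_pssm, threshold=4.0))  # [(1, 5.0), (5, 5.0)]
print(profile_hits("AWDAAWDAWE", toy_pssm, threshold=2.0))  # [(1, 5.0), (5, 5.0), (8, 2.0)]
```

Lowering the threshold admits weaker, sub-optimal copies. Choosing that threshold from the score distribution is the part the described method addresses.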

13.
Illegitimate (nonhomologous) recombination requires little or no sequence homology between recombining DNAs and has been regarded as a process distinct from homologous recombination, which requires a long stretch of homology between recombining DNAs. Under special conditions in Escherichia coli, we have found a new type of illegitimate recombination that requires an interaction between homologous DNA sequences. It was detected when a plasmid carrying 2-kb-long inverted repeats was subjected to type II restriction in vitro and type I (EcoKI) restriction in vivo within a Δrac recBC recG ruvC strain. Removing one of the repeats, or replacing it with heterologous DNA, reduced the level of recombination. The recombining sites themselves shared, at most, a few base pairs of homology. Many of the recombination events joined a site in one of the repeats with a site in the other repeat. In two of the products, one of the recombining sites was at the end of one of the repeats. Removal of one of the EcoKI sites decreased recombination. We discuss the possibility that some structure formed by homologous interaction between the long repeats is used by the EcoKI restriction enzyme to promote illegitimate recombination. The possible roles and consequences of this type of homologous interaction are discussed.

14.
Molecular Characterization of a Maize B Chromosome Centric Sequence   Total citations: 28 (self-citations: 0, other citations: 28)
Supernumerary chromosomes are widespread in the plant kingdom, but little is known of their molecular nature or mechanism of origin. We report here the initial cloning of sequences from the maize B chromosome. Our analysis suggests that many sequences are highly repetitive and shared with the normal A chromosomes. However, all clones selected for B-specificity contain at least one copy of a particular repeat. Cytological mapping using B chromosome derivatives and in situ hybridization shows that the B-specific repeats are derived from the centric region of the chromosome. Sequence analysis of this repeat shows homology to motifs mapped to various plant and animal centromeres and to the maize neocentromere. The precise localization of these sequences among breakpoints within the B centromere, together with their homology to a facultative centromere, suggests a role for this sequence in centromere function.

15.
We report the characterization of 3 new repetitive sequences from the bivalve mollusc Mytilus galloprovincialis, designated Mg1, Mg2, and Mg3, with monomer lengths of 169, 260, and 70 bp, respectively. The 3 repeats together constitute approximately 7.8% of the M. galloprovincialis genome and were found, together with ApaI-type 2 repeats, inside the introns of 2 genes of the HSP70 family, hsc70 and hsc71. Both the monomer length and the genomic content of the repeats indicate satellite sequences. The Mg1 repetitive region and its flanking sequences exhibit significant homology to CvE, a member of the Pearl family of mobile elements found in the eastern oyster (Crassostrea virginica). Thus, the whole homologous region is designated MgE, the first putative transposable element characterized in M. galloprovincialis. The ApaI, Mg2, and Mg3 repeats are continuously arranged inside the introns of both the hsc70 and hsc71 genes. The presence of perfect inverted repeats flanking the ApaI-Mg2-Mg3 repetitive region, as well as a sequence analysis of the repeats, indicates a transposition-like insertion of this region. The genes of the HSP70 family are highly conserved, and the presence of repetitive DNA or of mobile elements inside their introns is reported here for the first time.
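Whether a region is flanked by perfect inverted repeats comes down to a reverse-complement comparison of its two ends. A minimal sketch, assuming uppercase A/C/G/T input and a known arm length; the flank and core sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence (other symbols kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_perfect_inverted_repeats(region: str, arm: int) -> bool:
    """True if the first `arm` bases of region are exactly the reverse complement
    of its last `arm` bases, i.e. the region is flanked by perfect inverted repeats."""
    if arm <= 0 or 2 * arm > len(region):
        raise ValueError("arm must be positive and the two arms must not overlap")
    return region[:arm] == reverse_complement(region[-arm:])


left_arm = "TTGACCA"  # hypothetical flank
region = left_arm + "ACGTACGTAC" + reverse_complement(left_arm)
print(has_perfect_inverted_repeats(region, 7))  # True
```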

16.
The sequences of three mitochondrial carriers involved in energy transfer, the ADP/ATP carrier, phosphate carrier and uncoupling carrier, are analyzed. Similarly to what has been previously reported for the ADP/ATP carrier and the uncoupling protein, now also the phosphate carrier is found to have a tripartite structure comprising three similar repeats of approx. 100 residues each. The three sequences show a fair overall homology with each other. More significant homologies are found by comparing the repeats within and between the carriers in a scheme where the sequences are spliced into repeats, which are arranged for maximum homology by allowing possible insertions or deletions. A striking conservation of critical residues, glycine, proline, of charged and of aromatic residues is found throughout all nine repeats. This is indicative of a similar structural principle in the repeats. Hydropathy profiles of the three proteins and a search for amphipathic alpha-spans reveal six membrane-spanning segments for each carrier, providing further support for the basic structural identity of the repeats. The proposed folding pattern of the carriers in the membrane is exemplified with the phosphate carrier. A possible tertiary arrangement of the repeats and the membrane-spanning helices is shown. The emergence of a mitochondrial carrier family by triplication and by divergent evolution from a common gene of about 100 residues is discussed.
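The source does not state which hydropathy scale or window was used. As an assumption, the sketch below uses the widely used Kyte-Doolittle scale with a 19-residue window and a mean cutoff of 1.6, conventions often applied when screening for membrane-spanning segments. The toy sequence is invented and is not a carrier sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i covers protein[i:i+window]."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19, cutoff: float = 1.6) -> list:
    """Start indices of windows whose mean hydropathy exceeds the cutoff."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window)) if h > cutoff]


toy = "D" * 10 + "L" * 19 + "D" * 10  # one hydrophobic stretch between polar ends
print(candidate_tm_windows(toy))
```

Overlapping windows above the cutoff merge into one candidate segment; counting such segments is how a profile can suggest a number of membrane-spanning helices.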

17.
DNA fragments containing T-DNA/plant DNA junctions isolated from 17 transgenic tobacco plants were amplified using inverse PCR. Analysis of the nucleotide sequences of 34 cloned DNA fragments revealed 100% homology with vector sequences outside the T-DNA in 10 cases. Nine nucleotide sequences had homology with repeats in the tobacco genome. The percentage of homology varied from 70 to 90%, with the identified repeats belonging to different types. In most clones, no homology with GenBank sequences was found. Alignment of the sequences truncated during integration of the left and right borders of the T-DNA insertions demonstrated significant clustering (within a 10-bp region) of truncation sites for the left border. Five sequences had identical truncation sites (+23 T), indicating preferential use of this nucleotide. The AT content varied from 51 to 72%, close to the overall percentage of AT pairs in the tobacco genome.
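AT content, as reported above, is the share of A and T among the bases of a sequence. A minimal sketch, assuming ambiguous symbols such as N are excluded from the denominator; the example sequence is made up.

```python
def at_content(seq: str) -> float:
    """Percentage of A+T among the unambiguous bases A, C, G and T; other symbols
    (for example N) are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    at = s.count("A") + s.count("T")
    return 100.0 * at / acgt if acgt else 0.0


print(round(at_content("AATTGCNN"), 1))  # 66.7
```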

18.
Two independent isolates of a Bordetella pertussis repeated DNA unit were sequenced and shown to be an insertion sequence element, with five nucleotide differences between the two copies. The sequences were 1053 bp in length with near-perfect terminal inverted repeats of 28 bp, had three open reading frames, and were each flanked by short direct repeats. The two insertion sequences showed considerable homology to two other recently reported B. pertussis repeated DNA sequences: IS481 and a 530-bp repeated DNA unit. The B. pertussis insertion sequence would appear to comprise a group of closely related sequences differing mainly in their flanking direct repeats and terminal inverted repeats. The two isolates reported here, which came from the adenylate cyclase and agglutinogen 2 regions of the genome, were designated IS481v1 and IS481v2, respectively.
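Counting the nucleotide differences between two aligned copies and checking how "near-perfect" a pair of terminal inverted repeats is are both simple string comparisons. A minimal sketch, assuming the copies are already aligned to equal length and the terminal arm length is known; the toy strings stand in for the real 1053-bp elements and 28-bp arms.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def differences(copy_1: str, copy_2: str) -> list:
    """(1-based position, base in copy_1, base in copy_2) for every mismatch
    between two copies that are already aligned to the same length."""
    if len(copy_1) != len(copy_2):
        raise ValueError("copies must be aligned to equal length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(copy_1, copy_2)) if a != b]


def terminal_ir_mismatches(element: str, arm: int) -> int:
    """Mismatches between the left terminal arm and the reverse complement of the
    right terminal arm; 0 means perfect, a small number means near-perfect."""
    right_rc = "".join(COMPLEMENT[base] for base in reversed(element[-arm:]))
    return sum(a != b for a, b in zip(element[:arm], right_rc))


print(differences("ACGTACGT", "ACCTACGA"))                      # [(3, 'G', 'C'), (8, 'T', 'A')]
print(terminal_ir_mismatches("TTGACC" + "GGGG" + "GGTCAT", 6))  # 1
```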

19.
The precursor of pulmonary surfactant-associated protein SP-B is composed of an NH2-terminal domain of 30 residues (a-type domain) and three tandem repeats of about 90 residues (b-type domain); biophysically active mature SP-B corresponds to the second b-type repeat. Consensus sequences constructed for the b-type repeats were used to search the database for homologous sequences, and the search revealed that prosaposin and sulfated glycoprotein 1 show remarkable homology with these repeats. The domain organizations of the latter proteins, however, differ from that of the SP-B precursor inasmuch as they contain four tandem copies of the b-type domain, and a-type domains are present in both the NH2-terminal and COOH-terminal parts of the proteins. The implications of the homology between saposins and SP-B for their structure and function are discussed.
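The abstract does not say how the b-type consensus sequences were constructed. A simple majority-rule consensus over aligned repeats is sketched below as an assumption: a column receives its most common residue only when that residue occurs in more than half of the sequences, and otherwise receives a placeholder 'x'. The aligned toy repeats are invented.

```python
from collections import Counter


def majority_consensus(aligned: list, gap: str = "-", min_fraction: float = 0.5) -> str:
    """Column-wise consensus of equal-length aligned sequences. A column gets its
    most common non-gap residue if that residue occurs in more than min_fraction
    of the sequences; otherwise it gets 'x'."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / len(aligned) > min_fraction:
                out.append(residue)
                continue
        out.append("x")
    return "".join(out)


print(majority_consensus(["CDLC", "CELC", "CQLV"]))  # CxLC
```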

20.
We describe new tests, of general application, for deciding whether two proteins or DNA sequences are significantly homologous, in cases where the relationship is neither evidently true nor evidently false. Ralston and Bishop's comparison of the c-myc oncogene with the adenovirus E1a protein is discussed as an example. When the comparison matrix test is used to establish a homology between two sequences it is necessary that the number of high scores exceeds the expected mean level for random sequences by a statistically significant margin. The mean level itself is found from the double matching probability distribution. In examples where the number of high scores is larger than expected, but the highest score is not in itself exceptional, the variance of the numbers of scores expected for unrelated sequences is an important factor. We have analysed these variances by several methods. A simple binomial distribution gives only a rather inaccurate and low first estimate, but we derive a more rigorous and accurate statistical treatment, to take account of the correlations between scores in different parts of the comparison matrix. The theory is exact for random DNA or protein sequences with fluctuating compositions, selected by random draws from an infinite pool. In the more realistic situation, where sequences of fixed composition are formed by random permutations of the original sets, the deviations are smaller, and have been analysed by computer simulation. We find that although the relationship proposed by Ralston & Bishop, between the c-myc oncogene and adenovirus E1a proteins, appears to be significant in the binomial approximation, it is not supported by the full analysis. We conclude that, in general, great care is needed to establish any weak homology on the basis of comparisons that include no truly exceptional high scores, but merely have an enhanced number of scores at the upper end of the expected distribution.
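The fixed-composition case can be explored by simulation, as the abstract describes. The sketch below counts high-scoring cells of an ungapped comparison matrix, repeats the count for random permutations of one sequence, and reports the simulated variance next to the variance a binomial model would give for the same mean. The window, cutoff, trial count and sequences are assumed example values, and this is not the authors' exact statistical treatment.

```python
import random
import statistics


def high_score_count(a: str, b: str, window: int, cutoff: int) -> int:
    """Number of comparison-matrix cells (i, j) whose ungapped window
    a[i:i+window] vs b[j:j+window] has at least `cutoff` identities."""
    count = 0
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            if sum(a[i + k] == b[j + k] for k in range(window)) >= cutoff:
                count += 1
    return count


def compare_to_permutations(a: str, b: str, window: int, cutoff: int,
                            trials: int = 100, seed: int = 1) -> dict:
    """Observed high-score count versus counts for random permutations of b
    (fixed composition), with the simulated variance shown next to the
    variance a binomial model would predict from the same mean."""
    rng = random.Random(seed)
    observed = high_score_count(a, b, window, cutoff)
    letters = list(b)
    null_counts = []
    for _ in range(trials):
        rng.shuffle(letters)
        null_counts.append(high_score_count(a, "".join(letters), window, cutoff))
    mean = statistics.mean(null_counts)
    simulated_var = statistics.variance(null_counts)
    cells = (len(a) - window + 1) * (len(b) - window + 1)
    p = mean / cells
    binomial_var = cells * p * (1 - p)  # treats every cell as independent
    z = (observed - mean) / simulated_var ** 0.5 if simulated_var > 0 else float("inf")
    return {"observed": observed, "null_mean": mean, "simulated_var": simulated_var,
            "binomial_var": binomial_var, "z": z}


result = compare_to_permutations("ACGTTGCAAGTCCATGGATC", "ACGTTGCTAGTCCATCGATC",
                                 window=6, cutoff=5)
print(result)
```

Because overlapping windows share positions, neighbouring cells are correlated; comparing `simulated_var` with `binomial_var` shows how far the independence assumption is off for a given pair of sequences.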


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417