Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
In this work we selected double-stranded DNA sequences capable of forming stable triplexes at 20 or 50 °C with corresponding 13mer purine oligonucleotides. This selection was obtained by a double aptamer approach where both the starting sequences of the oligonucleotides and the target DNA duplex were random. The results of selection were confirmed by a cold exchange method and the influence of the position of a 'mismatch' on the stability of the triplex was documented in several cases. The selected sequences obey two rules: (i) they have a high G content; (ii) for a given G content the stability of the resulting triplex is higher if the G residues lie in stretches. The computer simulation of the Mg2+, Na+ and Cl- environment around three triplexes by a density scaled Monte Carlo method provides an interpretation of the experimental observations. The Mg2+ cations are statistically close to the G N7 and relatively far from the A N7. The presence of an A repels the Mg2+ from adjacent G residues. Therefore, the triplexes are stabilized when the Mg2+ can form a continuous spine on G N7.

2.
A novel method to calculate the G+C content of genomic DNA sequences.   Cited by: 2 (self-citations: 0; citations by others: 2)
The base composition of a DNA fragment or genome is usually measured by the proportion of A+T or G+C in the sequence. The G+C content along genomic sequences is usually calculated using an overlapping or non-overlapping sliding window method. The result and accuracy of such an approach depend on the size of the window and the moving distance adopted. In this paper, a novel windowless technique to calculate the G+C content of genomic sequences is proposed. By this method, the G+C content can be calculated at different "resolutions". In an extreme case, the G+C content may be computed at a specific point, rather than in a window of finite size. This is particularly useful for analyzing the fine variation of base composition along genomic sequences. As the first example, the variation of G+C content along each of the 16 yeast chromosomes is analyzed. The G+C-rich regions longer than 5 kb are detected and listed in detail. It is found that each chromosome consists of several alternating G+C-rich and G+C-poor regions, i.e., a mosaic structure. Another example is the analysis of the G+C content of each of the two chromosomes of the Vibrio cholerae genome. Based on the variations of the G+C content in each chromosome, it is shown that some fragments in the Vibrio cholerae genome may have been transferred from other species. In particular, the position and size of the large integron island on the smaller chromosome were precisely predicted. This method would be a useful tool for analyzing genomic sequences.
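
The contrast drawn in the abstract above, between a sliding window and a windowless measure, can be illustrated with a small sketch. This is not the authors' procedure: it assumes only a cumulative disparity curve z[n] = (A+T count) - (G+C count) over the first n bases, from which the G+C fraction of any segment follows as (1 - Δz/Δn)/2. The function names are hypothetical, and any base other than G or C (including N) is counted as A/T for simplicity.

```python
from itertools import accumulate


def gc_sliding_window(seq, window, step):
    """G+C fraction in each window of size `window`, advanced by `step`."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


def cumulative_disparity(seq):
    """z[n] = (A+T count) - (G+C count) over the first n bases; z[0] = 0.

    Assumption for this sketch: every base that is not G or C counts as A/T.
    """
    steps = (-1 if base in "GC" else 1 for base in seq.upper())
    return [0] + list(accumulate(steps))


def gc_between(z, start, end):
    """G+C fraction of seq[start:end], read off the slope of z."""
    length = end - start
    return 0.5 * (1 - (z[end] - z[start]) / length)
```

Once `z` has been computed, `gc_between` answers a query for any pair of boundaries in constant time, so the choice of window size no longer has to be made in advance.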

3.
K Han  H J Kim 《Nucleic acids research》1993,21(5):1251-1257
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The most plausible 1-5 structures are produced in O(MN^2 + N^3) time and O(N^2) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is probably the first algorithm that is practical in terms of both time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.
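
Covariation between alignment columns, which the abstract above says the program detects without manual scrutiny, is often quantified with mutual information. The sketch below shows that generic measure only; it is not taken from the paper, and the function name and the toy alignment in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings, one per sequence.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    return sum(
        (count / n) * log2((count / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), count in pairs.items()
    )
```

For the toy alignment `["GAAC", "CAAG", "AAAU", "UAAA"]`, columns 0 and 3 change together in every sequence and score 2 bits, while the invariant columns 1 and 2 score 0. A high score flags a candidate base pair, though it does not by itself show that the two columns are complementary.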

4.
The number of distinct functional classes of single-stranded RNAs (ssRNAs) and the number of sequences representing them are substantial and continue to increase. Organizing this data in an evolutionary context is essential, yet traditional comparative sequence analyses require that homologous sites can be identified. This prevents comparative analysis between sequences of different functional classes that share no site-to-site sequence similarity. Analysis within a single evolutionary lineage also limits evolutionary inference because shared ancestry confounds properties of molecular structure and function that are historically contingent with those that are imposed for biophysical reasons. Here, we apply a method of comparative analysis to ssRNAs that is not restricted to homologous sequences, and therefore enables comparison between distantly related or unrelated sequences, minimizing the effects of shared ancestry. This method is based on statistical similarities in nucleotide base composition among different functional classes of ssRNAs. In order to denote base composition unambiguously, we have calculated the fraction G+A and G+U content, in addition to the more commonly used fraction G+C content. These three parameters define RNA composition space, which we have visualized using interactive graphics software. We have examined the distribution of nucleotide composition from 15 distinct functional classes of ssRNAs from organisms spanning the universal phylogenetic tree and artificial ribozymes evolved in vitro. Surprisingly, these distributions are biased consistently in G+A and G+U content, both within and between functional classes, regardless of the more variable G+C content. Additionally, an analysis of the base composition of secondary structural elements indicates that paired and unpaired nucleotides, known to have different evolutionary rates, also have significantly different compositional biases. 
These universal compositional biases observed among ssRNAs sharing little or no sequence similarity suggest, contrary to current understanding, that base composition biases constitute a convergent adaptation among a wide variety of molecular functions.
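
The three composition parameters named in the abstract above (fraction G+A, G+U and G+C) are straightforward to compute. The following is a minimal sketch with a hypothetical function name; it assumes that T should be read as U and that characters other than A, C, G and U are ignored.

```python
def composition_coordinates(rna):
    """Return the (G+A, G+U, G+C) fractions of an RNA sequence."""
    rna = rna.upper().replace("T", "U")  # assumption: accept DNA-style input
    counts = {base: rna.count(base) for base in "ACGU"}
    total = sum(counts.values())  # ambiguous characters are not counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    g = counts["G"]
    return (
        (g + counts["A"]) / total,
        (g + counts["U"]) / total,
        (g + counts["C"]) / total,
    )
```

The three coordinates fix all four base frequencies, because their sum equals 2G + 1 when the frequencies sum to 1. This is why they denote base composition unambiguously, whereas G+C content alone does not.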

5.
Carlini DB  Chen Y  Stephan W 《Genetics》2001,159(2):623-633
To gain insights into the relationship between codon bias, mRNA secondary structure, third-codon position nucleotide distribution, and gene expression, we predicted secondary structures in two related drosophilid genes, Adh and Adhr, which differ in degree of codon bias and level of gene expression. Individual structural elements (helices) were inferred using the comparative method. For each gene, four types of randomization simulations were performed to maintain or remove codon bias and/or to maintain or alter third-codon position nucleotide composition (N3). In the weakly expressed, weakly biased gene Adhr, the potential for secondary structure formation was found to be much stronger than in the highly expressed, highly biased gene Adh. This is consistent with the observation of approximately equal G and C percentages in Adhr (approximately 31% across species), whereas in Adh the N3 distribution is shifted toward C (42% across species). Perturbing the N3 distribution to approximately equal amounts of A, G, C, and T increases the potential for secondary structure formation in Adh, but decreases it in Adhr. On the other hand, simulations that reduce codon bias without changing N3 content indicate that codon bias per se has only a weak effect on the formation of secondary structures. These results suggest that, for these two drosophilid genes, secondary structure is a relatively independent, negative regulator of gene expression. Whereas the degree of codon bias is positively correlated with level of gene expression, strong individual secondary structural elements may be selected for to retard mRNA translation and to decrease gene expression.

6.
This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA.

7.
Efficient algorithms for folding and comparing nucleic acid sequences.   Cited by: 19 (self-citations: 12; citations by others: 7)
Fast algorithms for analysing sequence data are presented. An algorithm for strict homologies finds all common subsequences of length greater than or equal to 6 in two given sequences. With it, nucleic acid pieces five thousand nucleotides long can be compared in five seconds on a CDC 6600. Secondary structure algorithms generate the N most stable secondary structures of an RNA molecule, taking into account all loop contributions, and the formation of all possible base-pairs in stems, including odd pairs (G.G, C.U, etc.). They allow a typical 100-nucleotide sequence to be analysed in 10 seconds. The homology and secondary structure programs are respectively illustrated with a comparison of two phage genomes, and a discussion of Drosophila melanogaster 5S RNA folding.
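
The strict-homology search described in the abstract above (all common stretches of at least 6 nucleotides in two sequences) can be sketched with a k-mer index. This is an illustration under stated assumptions, not the published algorithm: it assumes "common subsequences" means contiguous exact matches, and the function name is hypothetical.

```python
from collections import defaultdict


def common_segments(a, b, min_len=6):
    """Return maximal exact matches as (start_a, start_b, length) tuples.

    Only matches of at least `min_len` characters are reported.
    """
    # Index every min_len-mer of the first sequence by its start positions.
    index = defaultdict(list)
    for i in range(len(a) - min_len + 1):
        index[a[i:i + min_len]].append(i)

    matches = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # A seed that extends to the left belongs to a match that
            # started earlier and has already been reported in full.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return sorted(matches)
```

For example, `common_segments("TTACGTACGGAA", "CCACGTACGGTT")` returns `[(2, 2, 8)]`, the shared stretch `ACGTACGG`. Indexing one sequence and scanning the other avoids comparing every pair of positions directly.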

8.
An isochore map of the human genome based on the Z curve method   Cited by: 4 (self-citations: 0; citations by others: 4)
Zhang CT  Zhang R 《Gene》2003,317(1-2):127-135
The distribution of the G+C content in the human genome has been studied by using a windowless technique derived from the Z curve method. The most important findings presented in this paper are twofold. First, abrupt variations of the G+C content along human chromosome sequences are the main variation patterns of G+C content. It is found that at some sites, the G+C content changes abruptly from a G+C-rich region to a G+C-poor region, or vice versa. Second, it is shown that long domains with relatively homogeneous G+C content along each chromosome do exist. These domains are thought to be isochores, which usually have sharp boundaries. Consequently, 56 isochores longer than 3 Mb have been identified in chromosomes 1-22, X and Y. Boundaries, size and G+C content of each isochore identified are listed in detail. As an example to demonstrate the power of the method, the boundary between the Classes III and II isochores of the MHC sequence has been determined and found to be at 2,477,936, which is in good agreement with the experimental evidence. A homogeneity index is introduced to measure the homogeneity of G+C content in isochores. We emphasize that the homogeneity of G+C content is relative. Isochores in which the G+C content stays absolutely constant do not exist. Isochore structures appear to be a basic organization of the human genome. Due to the relevance to many important biological functions, the clarification of isochore structures will provide much insight into the understanding of the human genome.

9.
The global, rather than local, variation in G+C content along the nuclear DNA sequences of various organisms was studied using GenBank sequence data. When long DNA sequences of the genomes of Escherichia coli and Saccharomyces cerevisiae were examined, the levels of their G+C content (G+C%) were found to be within a narrow range around that of the whole genome. The G+C% levels for sequences of vertebrate genomes, however, were found to cover a wide range, showing that their genome is a mosaic of sequences with different G+C% levels, in each of which the sequence is fairly homogeneous in its G+C% for a very long distance. Through surveying a human genetic map and GenBank DNA sequences, the global variations in G+C% along the human genome DNA were found to be correlated with chromosome band structures.

10.
The recent determination of the complete sequence of chromosome III from the yeast Saccharomyces cerevisiae allows, for the first time, the investigation of the long range primary structure of a eukaryotic chromosome. We have found that, against a background G+C level of about 35%, there are two regions (one in each chromosome arm) in which G+C values rise to over 50%. This effect is seen in silent sites within genes, but not in noncoding intergenic sequences. The variation in G+C content is not related to differential selection of synonymous codons, and probably reflects mutational biases. That the intergenic regions do not exhibit the same phenomenon is particularly interesting, and suggests that they are under substantial constraint. The yeast chromosome may be a model of the structure of the human genome, since there is evidence that it is also a mosaic of long regions of different base compositions, reflected in wide variation of G+C content at silent sites among genes. Two possible causes of this regional effect, replication timing, and recombination frequency, are discussed.

11.
A survey of 196 protein-coding chloroplast DNA sequences demonstrated the preference for AUG and UAA codons for initiation and termination of translation, respectively. As in prokaryotes, at every nucleotide position from -25 to +25 (AUG is +1 to +3) and for 25 nucleotides 5' and 3' to the termination codon, an A or U is predominant, except for C at +5 and G at +22. A Shine-Dalgarno (SD) sequence (GGAGG or tri- or tetranucleotide variant) was found within 100 bp 5' to the AUG codon in 92% of the genes. In 40% of these cases, the location of the SD sequence was similar to that of the consensus for prokaryotes (-12 to -7 5' to AUG), presumed to be optimal for translation initiation. An SD sequence could not be located in 6% of the chloroplast sequences. We propose that mRNA secondary structures may be required for the relocation of a distal SD sequence to within the optimal region (-12 to -7) for initiation of translation. We further suggest that termination at UGA codons in chloroplast genes may occur by a mechanism, involving 16S rRNA secondary structure, which has been proposed for UGA termination in E. coli.
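
The search for a Shine-Dalgarno sequence described in the abstract above (GGAGG or a tri- or tetranucleotide variant 5' to the AUG codon) can be sketched as a plain substring scan. The function name, the preference for the longest variant and the rule for discarding overlapping hits are assumptions of this sketch and do not come from the survey. Positions follow the abstract's numbering, with the A of AUG at +1 and the base immediately 5' to it at -1.

```python
def find_sd_sites(upstream, core="GGAGG", min_len=3):
    """Find SD-like motifs in the sequence immediately 5' to a start codon.

    `upstream` must end at the base just before the A of AUG. Returns
    (position, motif) tuples, where position is that of the motif's first
    base (the last base of `upstream` is -1).
    """
    seq = upstream.upper()
    # Every contiguous piece of the core with at least min_len bases,
    # longest first so that a full GGAGG wins over its fragments.
    variants = sorted(
        {core[i:j] for i in range(len(core))
         for j in range(i + min_len, len(core) + 1)},
        key=lambda motif: (-len(motif), motif),
    )
    covered = set()
    hits = []
    for motif in variants:
        start = seq.find(motif)
        while start != -1:
            span = range(start, start + len(motif))
            if covered.isdisjoint(span):  # skip fragments of an earlier hit
                covered.update(span)
                hits.append((start - len(seq), motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)
```

Passing the 100 bases 5' to the start codon and keeping hits whose position lies between -12 and -7 would mirror the two windows mentioned in the abstract.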

12.
13.
Critical evidence for the biological relevance of G-quadruplexes (G4) has recently been obtained in seminal studies performed in a variety of organisms. Four-stranded G-quadruplex DNA structures are promising drug targets as these non-canonical structures appear to be involved in a number of key biological processes. Given the growing interest in G4, accurate tools to predict the G-quadruplex propensity of a given DNA or RNA sequence are needed. Several algorithms such as Quadparser predict quadruplex forming propensity. However, a number of studies have established that sequences that are not detected by these tools do form G4 structures (false negatives) and that other sequences predicted to form G4 structures do not (false positives). Here we report development and testing of a radically different algorithm, G4Hunter, that takes into account G-richness and G-skewness of a given sequence and gives a quadruplex propensity score as output. To validate this model, we tested it on a large dataset of 392 published sequences and experimentally evaluated quadruplex forming potential of 209 sequences using a combination of biophysical methods to assess quadruplex formation in vitro. We experimentally validated the G4Hunter algorithm on a short complete genome, that of the human mitochondria (16.6 kb), because of its relatively high GC content and GC skewness as well as the biological relevance of these quadruplexes near instability hotspots. We then applied the algorithm to genomes of a number of species, including humans, allowing us to conclude that the number of sequences capable of forming stable quadruplexes (at least in vitro) in the human genome is significantly higher, by a factor of 2–10, than previously thought.
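
A score that combines G-richness and G-skewness, as the abstract above describes, can be sketched as follows. The abstract does not give the scoring rule, so the scheme here is an assumption: each G scores the length of the G-run it sits in (capped at 4), each C scores the negative of its C-run length (same cap), other bases score 0, and the sequence score is the mean. The function names are hypothetical.

```python
from itertools import groupby


def g4_run_scores(seq, cap=4):
    """Per-base scores: +run length for G runs, -run length for C runs."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, cap)
        elif base == "C":
            value = -min(n, cap)
        else:
            value = 0  # A, T/U and anything else are neutral
        scores.extend([value] * n)
    return scores


def g4_score(seq, cap=4):
    """Mean per-base score; positive values indicate G-rich, G-skewed DNA."""
    scores = g4_run_scores(seq, cap)
    return sum(scores) / len(scores) if scores else 0.0
```

Under this scheme `GGGTTAGGG` scores 2.0 and its complement `CCCTAACCC` scores -2.0, so the sign indicates which strand carries the G-runs. A genome scan would apply `g4_score` to a sliding window and keep windows above a chosen threshold; the window size and threshold are left open here.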

14.
E P Rocha  A Danchin    A Viari 《Nucleic acids research》1999,27(17):3567-3576
We analysed the termini of Bacillus subtilis protein coding sequences and compared them with those of other genomes. The analysis focused on signals, compositional biases of nucleotides, oligonucleotides, codons and amino acids and mRNA secondary structure. AUG is the preferred start codon in all genomes, independent of their G+C content, and seems to induce less stable mRNA structures. However, it is not conserved between homologous genes, nor is it preferred in highly expressed genes. In B. subtilis the ribosome binding site is very strong. We found that downstream boxes do not seem to exist either in Escherichia coli or in B. subtilis. UAA stop codon usage is correlated with the G+C content and is strongly selected in highly expressed genes. We found less stable mRNA structures at both termini, which we related to mRNA-ribosome and mRNA-release-factor interactions. This pattern seems to impose a peculiar A-rich nucleotide and codon usage bias in these regions. Finally the analysis of all proteins from B. subtilis revealed a similar amino acid bias near both termini of proteins consisting of over-representation of hydrophilic residues. This bias near the stop codon is partially release-factor specific.

15.
This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA.

16.
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. This method is based on comparison of stems (palindromic helices) between sequences and is implemented by applying graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences by some defined features to find stems conserved across any two sequences. Then by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. This method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on some RNA sequences with known secondary structures, in which it is capable of detecting the real structures completely or partially correctly and outperforms other existing programs for similar purposes. AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html

17.
The use of complementary RNA or DNA sequences to selectively interfere with the utilization of mRNA of a target gene is an attractive therapeutic strategy. Two well-studied targets for oligonucleotide therapy are the c-myc and c-myb proto-oncogenes. It has been reported that sequences which contain four contiguous Gs can elicit a non-antisense response, due to the formation of a homotetrameric G quartet structure. Therefore, it was of interest to determine whether anti-c-myc and anti-c-myb phosphorothioate DNAs including tetraguanylate form higher order structures under physiologically relevant salt conditions and temperature. First, the identity of the higher order structure was established and was found to be a tetraplex. Employing intracellular (high K+), extracellular (low K+) and normal saline (no K+) salt mixtures, native gel electrophoresis revealed no tetraplex formation at 37 °C, the physiologically relevant temperature. On the other hand, tetraplex structure formation was observed at 4 and 23 °C. Hence, the potential for these sequences to form tetraplex structures at lower temperatures may not be relevant for their activity in cells and animals at physiological temperature.

18.
S Miyazawa  R L Jernigan 《Proteins》1999,36(3):347-356
Short-range interactions for secondary structures of proteins are evaluated as potentials of mean force from the observed frequencies of secondary structures in known protein structures which are assumed to have an equilibrium distribution with the Boltzmann factor of secondary structure energies. A secondary conformation at each residue position in a protein is described by a tripeptide, including one nearest neighbor on each side. The secondary structure potentials are approximated as additive contributions from neighboring residues along the sequence. These are part of an empirical potential to provide a crude estimate of protein conformational energy at a residue level. Unlike previous works, interactions are decoupled into intrinsic potentials of residues, potentials of backbone-backbone interactions, and of side chain-backbone interactions. Also interactions are decoupled into one-body, two-body, and higher order interactions between peptide backbone and side chain and between backbones. These decouplings are essential to correctly evaluate the total secondary structure energy of a protein structure without overcounting interactions. Each interaction potential is evaluated separately by taking account of the correlation in the amino acid order of protein sequences. Interactions among side chains are neglected, because of the relatively limited number of protein structures. Proteins 1999;36:347-356. Published 1999 Wiley-Liss, Inc.
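
The general idea behind a potential of mean force derived from observed frequencies, as invoked in the abstract above, is the inverse Boltzmann relation E = -kT ln(N_observed / N_expected). The sketch below applies that relation to a simple table of counts per residue type and secondary structure state. It does not reproduce the decoupled one-body, two-body and higher order terms of the paper, and the function name, the table layout and the choice of reference state (independence of residue type and state) are assumptions.

```python
from math import log


def mean_force_potentials(counts):
    """Energies in units of kT from counts[residue][state] observations.

    The reference state assumes residue type and secondary structure
    state are independent, so expected = row total * column total / N.
    """
    state_totals = {}
    grand_total = 0
    for by_state in counts.values():
        for state, n in by_state.items():
            state_totals[state] = state_totals.get(state, 0) + n
            grand_total += n

    energies = {}
    for residue, by_state in counts.items():
        residue_total = sum(by_state.values())
        energies[residue] = {}
        for state, n in by_state.items():
            if n == 0:
                continue  # no finite estimate without observations
            expected = residue_total * state_totals[state] / grand_total
            energies[residue][state] = -log(n / expected)
    return energies
```

With the purely hypothetical counts `{"X": {"helix": 30, "coil": 10}, "Y": {"helix": 10, "coil": 30}}`, residue X receives a negative (favourable) helix energy of -ln 1.5 and a positive coil energy of ln 2. States observed more often than the reference predicts always come out favourable.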

19.
The sequences and structures of RNase P RNAs of some Gram-positive bacteria, e.g. Bacillus subtilis, are very different from those of other bacteria. In order to expand our understanding of the structure and evolution of RNase P RNA in Gram-positive bacteria, gene sequences encoding RNase P RNAs from 10 additional species from this evolutionary group have been determined, doubling the number of sequences available for comparative analysis. The enlarged data set allows refinement of the secondary structure model of these unusual RNase P RNAs and the identification of potential tertiary interactions between P10.1 and L12, and between L5.1 and L15.1. The newly-obtained sequences suggest that RNase P RNA underwent an abrupt, dramatic restructuring in the ancestry of the low-G+C Gram-positive bacteria after the divergence of the branches leading to the 'Clostridia and relatives' and the remaining low-G+C Gram-positive species. The unusual structures of the RNase P RNAs of Mycoplasma hyopneumoniae and M. flocculare are apparently derived from RNAs with Bacillus-like structure rather than from intermediate, partially restructured ancestral RNAs. The structure of the RNase P RNA from the photosynthetic Heliobacillus mobilis supports the relationship of this species with Bacillus and Staphylococcus rather than the 'Clostridia and relatives' as suggested by the sequences of their small-subunit ribosomal RNAs.

20.
An increasing number of recognition mechanisms in RNA are found to involve G.U base pairs. In order to detect new functional sites of this type, we exhaustively analyzed the sequence alignments and secondary structures of eubacterial and chloroplast 16S and 23S rRNA, seeking positions with high levels of G.U pairs. Approximately 120 such sites were identified and classified according to their secondary structure and sequence environment. Overall biases in the distribution of G.U pairs are consistent with previously proposed structural rules: the side of the wobble pair that is subject to a loss of stacking is preferentially exposed to a secondary structure loop, where stacking is not as essential as in helical regions. However, multiple sites violate these rules and display highly conserved G.U pairs in orientations that could cause severe stacking problems. In addition, three motifs displaying a conserved G.U pair in a specific sequence/structure environment occur at an unusually high frequency. These motifs, of which two had not been reported before, involve sequences 5'UG3' 3'GA5' and 5'UG3' 3'GU5', as well as G.U pairs flanked by a bulge loop 3' of U. The possible structures and functions of these recurrent motifs are discussed.
