Similar articles
 Found 20 similar articles; search took 15 ms
1.
Switching effect in prey--predator system   Total citations: 5, self-citations: 0, citations by others: 5
Statistical analysis of DNA base sequences generated from nearest neighbor frequencies by a Monte Carlo technique yields distributions of pyrimidine tracts in good agreement with experimental results. Better agreement with experiment is obtained with nearest-neighbor-frequency based calculations than with calculations based on base composition which assume random base arrangements. The nearest-neighbor-frequency method can also be applied to the analysis of high resolution thermal denaturation profiles, the sequence specific interaction of drugs and proteins, and the distribution of photoproducts produced in DNA by ultraviolet radiation.
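
The idea of generating sequences from nearest-neighbor frequencies and tallying pyrimidine tracts can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes the nearest-neighbor frequencies are supplied as a table `nn_freq[(x, y)]` and treats the sequence as a first-order Markov chain; the names `generate_sequence` and `pyrimidine_tracts` are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"
PYRIMIDINES = set("CT")


def generate_sequence(nn_freq, length, rng):
    """Sample a sequence as a first-order Markov chain.

    nn_freq[(x, y)] is the (assumed) frequency of base x followed by base y.
    """
    # Draw the first base from the marginal frequency of each leading base.
    marginal = [sum(nn_freq[(x, y)] for y in BASES) for x in BASES]
    seq = rng.choices(BASES, weights=marginal)
    while len(seq) < length:
        prev = seq[-1]
        # Weights proportional to nn_freq[(prev, y)] give P(next = y | prev).
        weights = [nn_freq[(prev, y)] for y in BASES]
        seq.append(rng.choices(BASES, weights=weights)[0])
    return "".join(seq)


def pyrimidine_tracts(seq):
    """Map tract length -> number of maximal runs of C/T in seq."""
    counts = Counter()
    run = 0
    for base in seq:
        if base in PYRIMIDINES:
            run += 1
        else:
            if run:
                counts[run] += 1
            run = 0
    if run:  # close a tract that reaches the end of the sequence
        counts[run] += 1
    return counts
```

Averaging `pyrimidine_tracts` over many generated sequences would give a simulated tract-length distribution; replacing `nn_freq` with a table built from base composition alone gives the random-arrangement baseline the abstract contrasts it with.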

2.
Effective probabilistic modeling approaches have been developed to find motifs of biological function in DNA sequences. However, the problem of automated model choice remains largely open and becomes more essential as the number of sequences to be analyzed is constantly increasing. Here we propose a reversible jump Markov chain Monte Carlo algorithm for estimating both parameters and model dimension of a Bayesian hidden semi-Markov model dedicated to bacterial promoter motif discovery. Bacterial promoters are complex motifs composed of two boxes separated by a spacer of variable but constrained length and occurring close to the protein translation start site. The algorithm allows simultaneous estimations of the width of the boxes, of the support size of the spacer length distribution, and of the order of the Markovian model used for the "background" nucleotide composition. The application of this method on three sequence sets points out the good behavior of the algorithm and the biological relevance of the estimated promoter motifs.

3.
Liang LJ, Weiss RE. Biometrics, 2007, 63(3):733-741
Phylogenetic modeling is computationally challenging and most phylogeny models fit a single phylogeny to a single set of molecular sequences. Individual phylogenetic analyses are typically performed independently using publicly available software that fits a computationally intensive Bayesian model using Markov chain Monte Carlo (MCMC) simulation. We develop a Bayesian hierarchical semiparametric regression model to combine multiple phylogenetic analyses of HIV-1 nucleotide sequences and estimate parameters of interest within and across analyses. We use a mixture of Dirichlet processes as a prior for the parameters to relax inappropriate parametric assumptions and to ensure the prior distribution for the parameters is continuous. We use several reweighting algorithms for combining completed MCMC analyses to shrink parameter estimates while adjusting for data set-specific covariates. This avoids constructing a large complex model involving all the original data, which would be computationally challenging and would require rewriting the existing stand-alone software.

4.
Both thermal fluctuations and the intrinsic curvature of DNA contribute to conformations of the DNA axis. We looked for a way to estimate the relative contributions of these two components of the double-helix curvature for DNA with a typical sequence. We developed a model and Monte Carlo procedure to simulate the Boltzmann distribution of DNA conformations with a specific intrinsic curvature. Two steps were used to construct the equilibrium conformation of the model chain. We first specified the equilibrium DNA conformation at the base pair level of resolution, using a set of the equilibrium dinucleotide angles and DNA sequence. This conformation was then approximated by the conformation of the model chain consisting of a reduced number of longer, straight cylindrical segments. Each segment of the chain corresponded to a certain number of DNA base pairs. We simulated conformational properties of nicked circular DNA for different sets of equilibrium dinucleotide angles, different random DNA sequences, and lengths. Only random sequences of DNA generated with equal probability of appearance for all types of bases at any site of the sequence were used. The results showed that for a broad range of intrinsic curvature parameters, the radius of gyration of DNA circles should be nearly independent of DNA sequence for all DNA lengths studied. We found, however, a DNA property that should strongly depend on DNA sequence if the double helix has essential intrinsic curvature. This property is the equilibrium distribution of the linking number for DNA circles that are 300-1000 bp in length. We found that a large fraction of the distributions corresponding to random DNA sequences should have two separate maxima. The physical nature of this unexpected effect is discussed. This finding opens new opportunities for joint experimental and theoretical studies of DNA intrinsic curvature.

5.
We have developed a method for detecting more stable and significant folding regions relative to others in the sequence. The algorithm is based on the calculation of the lowest free energy of RNA secondary structures and Monte Carlo simulation. For any given RNA segment, the stability and statistical significance of RNA folding are assessed by two measures: the stability score and the significance score. The stability score measures the degree of thermodynamic stability of the segment between all possible biological segments in the RNA sequence. The significance score characterizes the specific arrangement of the nucleotides in the segment that could imply a structural role for the sequence information. Using these two measures, we are able to detect a series of distinct folding regions where highly stable and statistically significant secondary structures occur in human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV) sequences. Received on April 4, 1990; accepted on October 2, 1990

6.
7.
We developed a software tool (SlidingBayes) for recombination analysis based on Bayesian phylogenetic inference. SlidingBayes provides a powerful approach for detecting potential recombination, especially between highly divergent sequences and complex HIV-1 recombinants for which simpler methods like neighbor joining (NJ) may be less powerful. SlidingBayes guides Markov Chain Monte Carlo (MCMC) sampling performed by MrBayes in a sliding window across the alignment (Bayesian scanning). The tool can be used for nucleotide and amino acid sequences and combines all the modeling possibilities of MrBayes with the ability to plot the posterior probability support for clustering of various combinations of taxa.

8.
MOTIVATION: To devise a method that, unlike available methods, directly measures variations in phylogenetic signals in gene sequences that result from recombination, tests the significance of the signal variations and distinguishes misleading signals. RESULTS: We have developed a method that we call 'sister-scanning' for assessing phylogenetic and compositional signals in the various patterns of identity that occur between four nucleotide sequences. A Monte Carlo randomization is done for all columns (positions) within a window and Z-scores are obtained for four real sequences or three real sequences with an outlier that is also randomized. The usefulness of the approach is demonstrated using tobamovirus and luteovirus sequences. Contradictory phylogenetic signals were distinguished in both datasets, as were regions of sequence that contained no clear signal or potentially misleading signals related to compositional similarities. In the tobamovirus dataset, contradictory phylogenetic signals were separated by coding sequences up to a kilobase long that contained no clear signal. Our re-analysis of this dataset using sister-scanning also yielded the first evidence known to us of an inter-species recombination site within a viral RNA-dependent RNA polymerase gene together with evidence of an unusual pattern of conservation in the three codon positions.

9.
SUMMARY: BLAST statistics have been shown to be extremely useful for searching for significant similarity hits, for amino acid and nucleotide sequences. Although these statistics are well understood for pairwise comparisons, there has been little success developing statistical scores for multiple alignments. In particular, there is no score for multiple alignment that is well founded and treated as a standard. We extend the BLAST theory to multiple alignments. Following some simple assumptions, we present and justify a significance score for multiple segments of a local multiple alignment. We demonstrate its usefulness in distinguishing high and moderate quality multiple alignments from low quality ones, with supporting experiments on orthologous vertebrate promoter sequences.

10.
Biological macromolecules such as DNA, RNA, and proteins can be regarded as finite sequences of symbols (or words) over a finite alphabet. In this paper, we refer to DNA (RNA) sequences which are words on a four-letter alphabet. A comparison is made between some "genes", or fragments of them, and random sequences or randomly reshuffled sequences on the same alphabet and having the same length. Some combinatorial techniques of analysis of finite words are developed. A crucial role in the comparison is played by the so-called special factors of a given word. In all the analysed DNA (RNA) fragments the distribution on the length of the number of right (left) special factors differs, in a very typical way, from the corresponding distribution in a string on the same alphabet and having the same length generated by a random source or obtained by making a random alteration (= shuffling) of the original string. This difference holds irrespective of the length in the range that we have considered (<2650 bp) and of the phylogenetic origin of the fragment.

11.
We have developed a computational method of protein design to detect amino acid sequences that are adaptable to given main-chain coordinates of a protein. In this method, the selection of amino acid types employs a Metropolis Monte Carlo method with a scoring function in conjunction with the approximation of free energies computed from 3D structures. To compute the scoring function, a side-chain prediction using another Metropolis Monte Carlo method was performed to select structurally suitable side-chain conformations from a side-chain library. In total, two layers of Monte Carlo procedures were performed, first to select amino acid types (1st layer Monte Carlo) and then to predict side-chain conformations (2nd layer Monte Carlo). We applied this method to sequence design for the entire sequence of the SH3 domain, Protein G, and BPTI. The predicted sequences were similar to those of the wild-type proteins. We compared the results of the predictions with and without the 2nd layer Monte Carlo method. The results revealed that the two-layer Monte Carlo method produced better sequence similarity to the wild-type proteins than the one-layer method. Finally, we applied this method to neuraminidase of influenza virus. The results were consistent with the sequences identified from the isolated viruses.

12.
Estimating the age of the common ancestor of a sample of DNA sequences   Total citations: 10, self-citations: 3, citations by others: 7
We present a simple Monte Carlo method for estimating the age of the most recent common ancestor (MRCA) of a sample of DNA sequences. We show that Templeton's (1993) estimator of the age of the MRCA based on the maximum number of nucleotide differences between two sequences in a sample is inaccurate, and we demonstrate the new method by reanalyzing a sample of DNA sequences from human Y chromosomes and a sample of human Alu sequences.

13.
Polypeptide random coil conformations of various chain lengths (N = 5, 10, 20, 40, 80 peptide units) are generated by a Monte Carlo procedure. The characteristic ratio obtained for the sets of generated conformations is identical with the exact value calculated with the average transformation matrix procedure, indicating the equivalence of the two treatments. On the basis of the generated sets of conformations the length and direction of the persistence vector (the averaged chain vector expressed in the reference frame of the first two skeletal bonds) are investigated for various chain lengths. The radial distribution function for the chain vector shows the length of the chain vector for small polypeptides (N = 5, 10) not to deviate far from its most probable value. Also for larger chains up to chains of 80 peptide units very significant deviations from a Gaussian distribution are observed. The distribution of the length of the vector connecting the remote end of the chain with the end of the persistence vector exhibited behavior much closer to the Gaussian approximation, an improvement especially significant for the short chains.

14.
In this paper we consider the competing risks model where the risks may not be independent. We assume both fixed and random censoring. The random censoring mechanism could have either a parametric or a non-parametric form. The life distributions and the parametric censoring distribution considered are exponential or Weibull. The expressions for the asymptotic confidence intervals for various parameters of interest under different models, using the estimated Fisher information matrix and parametric bootstrap techniques, have been derived. Monte Carlo simulation studies for some of these cases have been carried out.

15.
An earlier reported method for revealing latent periodicity of nucleotide sequences has been considerably modified for the case of small samples by applying a Monte Carlo method. This improved method has been used to search for the latent periodicity of some nucleotide sequences of the EMBL data bank. The existence of the nucleotide sequences' latent periodicity has been shown for some genes. The results obtained have implied that periodicity of gene structure is projected onto the periodicity of primary amino acid sequences and, further, onto spatial protein conformation. Even though the periodic structure of gene sequences has been eroded, it is still retained in primary and/or spatial structures of corresponding proteins. Furthermore, in a few cases the study of genes' periodicity has suggested their possible evolutionary origin by multifold duplications of some gene's fragments.

16.
Random sequences     
The comparison of protein or nucleic acid sequences frequently leads to observations whose improbability can be tested only by Monte Carlo techniques that require randomizing the sequences being compared. Two decisions need to be made. One is whether one demands a resulting random sequence to have the properties of the original sequence (a shuffled sequence) or only expects it to have them (a representative sequence). The second decision concerns the properties of the sequence of which two are composition and nearest-neighbor frequencies. It is shown that biased nearest-neighbor frequencies can significantly affect the probability of observing a given result. Methods for producing random sequences according to these decisions are given.
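
The distinction between a shuffled and a representative sequence can be illustrated with a short sketch. It covers only the composition property (preserving nearest-neighbor frequencies requires a more involved procedure that is not attempted here), and the function names are hypothetical rather than taken from the paper.

```python
import random
from collections import Counter


def shuffled_sequence(seq, rng):
    """Permute seq, so the composition is preserved exactly."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def representative_sequence(seq, rng):
    """Draw each position independently from seq's composition.

    The composition of the result matches seq only in expectation.
    """
    counts = Counter(seq)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=len(seq)))
```

A Monte Carlo test would then recompute the statistic of interest on many such randomized sequences and compare the observed value against that reference distribution.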

17.
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model--the no-common-mechanism model--has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We are able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.

18.
We present here the results obtained by applying several different methods to quantitatively measure regularities in protein sequences based on pair-preferences. To this end, we studied the distribution of amino acid residues, singly as well as in pairs, in a large database. We confirmed the existence of well-defined pair-preferences in proteins which were shown to be remarkably absent in simulated random sequences of similar amino acid distribution. Analyses of the sequences from the SWISS-PROT database using simple statistical tests, Fourier analysis, fractal analysis and statistical thermodynamic tests were used to derive parameters that define a natural sequence. As a consequence of the existence of pair-preferences, parameters like fractal dimension (D), spectral exponent (β), scaling parameter (H) and statistical entropy were found to be characteristic for natural sequences. For a reference state we chose a randomised state devoid of any pair-preference. The pair-preferences qualified well to be used as quantitative measures of regularities in protein sequences.

19.
A case has been made for the use of Monte Carlo simulation methods when the incorporation of mutation and natural selection into Wright-Fisher gametic sampling models renders them intractable from the standpoint of classical mathematical analysis. The paper has been organized around five themes. Among these themes was that of scientific openness and a clear documentation of the mathematics underlying the software so that the results of any Monte Carlo simulation experiment may be duplicated by any interested investigator in a programming language of his choice. A second theme was the disclosure of the random number generator used in the experiments to provide critical insights as to whether the generated uniform random variables met the criterion of independence satisfactorily. A third theme was that of a review of recent literature in genetics on attempts to find signatures of evolutionary processes such as natural selection, among the millions of segments of DNA in the human genome, that may help guide the search for new drugs to treat diseases. A fourth theme involved formalization of Wright-Fisher processes in a simple form that expedited the writing of software to run Monte Carlo simulation experiments. Also included in this theme was the reporting of several illustrative Monte Carlo simulation experiments for the cases of two and three alleles at some autosomal locus, in which attempts were made to apply the theory of Wright-Fisher models to gain some understanding as to how evolutionary signatures may have developed in the human genome and those of other diploid species. A fifth theme was centered on recommendations that more demographic factors, such as non-constant population size, be included in future attempts to develop computer models dealing with signatures of evolutionary process in genomes of various species. A brief review of literature on the incorporation of demographic factors into genetic evolutionary models was also included to expedite and stimulate further development on this theme.
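
A two-allele Wright-Fisher simulation with mutation and selection might look like the sketch below. It is not the paper's software: it assumes selection acts on alleles (relative fitness 1 + s for allele A), symmetric mutation at rate `mu`, and a constant number of gametes, and all names are hypothetical. The generator is Python's `random.Random` (a Mersenne Twister) with an explicit seed, in the spirit of the reproducibility themes above.

```python
import random


def wright_fisher_step(count_a, n_gametes, s, mu, rng):
    """Advance one generation; returns the new number of A gametes."""
    p = count_a / n_gametes
    # Selection (assumed genic): A has relative fitness 1 + s, a has 1.
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
    # Symmetric mutation between A and a at rate mu per gamete.
    p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
    # Gametic sampling: binomial draw of n_gametes gametes.
    return sum(rng.random() < p_mut for _ in range(n_gametes))


def simulate(count_a, n_gametes, s, mu, generations, seed):
    """Return the trajectory of A counts, starting from count_a."""
    rng = random.Random(seed)  # explicit seed so a run can be duplicated
    trajectory = [count_a]
    for _ in range(generations):
        count_a = wright_fisher_step(count_a, n_gametes, s, mu, rng)
        trajectory.append(count_a)
    return trajectory
```

Calling `simulate` twice with the same arguments and seed returns the same trajectory, which is the kind of duplication the first theme asks for.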

20.
We present a general method for assessing threading score significance. The threading score of a protein sequence, threaded onto a given structure, should be compared with the threading score distribution of a random amino-acid sequence, of the same length, threaded onto the same structure; small p-values indicate significantly high scores. We claim that, due to general protein contact map properties, this reference distribution is a Weibull extreme value distribution whose parameters depend on the threading method, the structure, the length of the query and the random sequence simulation model used. These parameters can be estimated off-line with simulated sequence samples, for different sequence lengths. They can further be interpolated at the exact length of a query, enabling the quick computation of the p-value.
