Found 20 similar documents (search time: 0 ms)
1.
Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites (total citations: 1; self-citations: 0; citations by others: 1)
The general Markov plus invariable sites (GM+I) model of biological sequence evolution is a two-class model in which an unknown proportion of sites are not allowed to change, while the remainder undergo substitutions according to a Markov process on a tree. For statistical use it is important to know whether the model is identifiable: can both the tree topology and the numerical parameters be determined from a joint distribution describing sequences only at the leaves of the tree? We establish that for generic parameters both the tree and all numerical parameter values can be recovered, up to clearly understood issues of 'label swapping'. The method of analysis is algebraic, using phylogenetic invariants to study the variety defined by the model. Simple rational formulas, expressed in terms of determinantal ratios, are found for recovering numerical parameters describing the invariable sites.
2.
The selection of an optimal model for data analysis is an important component of model-based molecular phylogenetic studies. Owing to the large number of Markov models that can be used for data analysis, model selection is a combinatorial problem that cannot be solved by performing an exhaustive search of all possible models. Currently, model selection is based on a small subset of the available Markov models, namely those that assume the evolutionary process to be globally stationary, reversible, and homogeneous. This forces the optimal model to be time reversible even though the actual data may not satisfy these assumptions. This problem can be alleviated by including more complex models during the model selection. We present a novel heuristic that evaluates a small fraction of these complex models and identifies the optimal model.
3.
A phylogenetic invariant for a model of biological sequence evolution along a phylogenetic tree is a polynomial that vanishes on the expected frequencies of base patterns at the terminal taxa. While the use of these invariants for phylogenetic inference has long been of interest, explicitly constructing such invariants has been problematic. We construct invariants for the general Markov model of κ-base sequence evolution on an n-taxon tree, for any κ and n. The method depends primarily on the observation that certain matrices defined in terms of expected pattern frequencies must commute, and yields many invariants of degree κ+1, regardless of the value of n. We define strong and parameter-strong sets of invariants, and prove several theorems indicating that the set of invariants produced here has these properties on certain sets of possible pattern frequencies. Thus our invariants may be sufficient for phylogenetic applications.
4.
Estimation of phylogeny and invariant sites under the general Markov model of nucleotide sequence evolution (total citations: 2; self-citations: 0; citations by others: 2)
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.
5.
6.
Nilanjan Saha Layne T Watson Karen Kafadar Naren Ramakrishnan Alexey Onufriev Shrinivasrao Mane Cecilia Vasquez-Robinet 《Journal of computational biology》2007,14(1):97-112
Earlier work rigorously derived a general probabilistic model for the PCR process that includes as a special case the Velikanov-Kapral model where all nucleotide reaction rates are the same. In this model, the probability of binding of deoxy-nucleoside triphosphate (dNTP) molecules with template strands is derived from the microscopic chemical kinetics. A recursive solution for the probability function of binding of dNTPs is developed for a single cycle and is used to calculate expected yield for a multicycle PCR. The model is able to reproduce important features of the PCR amplification process quantitatively. With a set of favorable reaction conditions, the amplification of the target sequence is fast enough to rapidly outnumber all side products. Furthermore, the final yield of the target sequence in a multicycle PCR run always approaches an asymptotic limit that is less than one. The amplification process itself is highly sensitive to initial concentrations and the reaction rates of addition to the template strand of each type of dNTP in the solution. This paper extends the earlier Saha model with a physics-based model of the dependence of the reaction rates on temperature, and estimates parameters in this new model by nonlinear regression. The calibrated model is validated using RT-PCR data.
7.
Madabushi S Yao H Marsh M Kristensen DM Philippi A Sowa ME Lichtarge O 《Journal of molecular biology》2002,316(1):139-154
Given the massive increase in the number of new sequences and structures, a critical problem is how to integrate these raw data into meaningful biological information. One approach, the Evolutionary Trace, or ET, uses phylogenetic information to rank the residues in a protein sequence by evolutionary importance and then maps those ranked at the top onto a representative structure. If these residues form structural clusters, they can identify functional surfaces such as those involved in molecular recognition. Now that a number of examples have shown that ET can identify binding sites and focus mutational studies on their relevant functional determinants, we ask whether the method can be improved so as to be applicable on a large scale. To address this question, we introduce a new treatment of gaps resulting from insertions and deletions, which streamlines the selection of sequences used as input. We also introduce objective statistics to assess the significance of the total number of clusters and of the size of the largest one. As a result of the novel treatment of gaps, ET performance improves measurably. We find evolutionarily privileged clusters that are significant at the 5% level in 45 out of 46 (98%) proteins drawn from a variety of structural classes and biological functions. In 37 of the 38 proteins for which a protein-ligand complex is available, the dominant cluster contacts the ligand. We conclude that spatial clustering of evolutionarily important residues is a general phenomenon, consistent with the cooperative nature of residues that determine structure and function. In practice, these results suggest that ET can be applied on a large scale to identify functional sites in a significant fraction of the structures in the protein databank (PDB). This approach to combining raw sequences and structure to obtain detailed insights into the molecular basis of function should prove valuable in the context of the Structural Genomics Initiative.
8.
Simultaneous inference for ratios of linear combinations of general linear model parameters (total citations: 1; self-citations: 0; citations by others: 1)
Consider a general linear model with p-dimensional parameter vector β and i.i.d. normal errors. Let K_1, ..., K_k, and L be linearly independent vectors of constants such that L^T β ≠ 0. We describe exact simultaneous tests for hypotheses that K_i^T β / L^T β equal specified constants using one-sided and two-sided alternatives, and describe exact simultaneous confidence intervals for these ratios. In the case where the confidence set is a single bounded contiguous set, we describe what we claim are the best possible conservative simultaneous confidence intervals for these ratios - best in that they form the minimum k-dimensional hypercube enclosing the exact simultaneous confidence set. We show that in the case of k = 2, this "box" is defined by the minimum and maximum values for the two ratios in the simultaneous confidence set and that these values are obtained via one of two sources: either from the solutions to each of four systems of equations or at points along the boundary of the simultaneous confidence set where the correlation between two t variables is zero. We then verify that these intervals are narrower than those previously presented in the literature.
9.
10.
Benjamin W. Infantolino Steph E. Forrester Matthew T.G. Pain John H. Challis 《Computer methods in biomechanics and biomedical engineering》2019,22(12):997-1008
The study examined the sensitivity of two musculoskeletal models to the parameters describing each model. Two different models were examined: a phenomenological model of human jumping with parameters based on live subject data, and a model of the First Dorsal Interosseous with parameters based on cadaveric measurements. Both models were sensitive to the model parameters, with the use of mean group data not producing model outputs reflective of either the performance of any group member or the mean group performance. These results highlight the value of subject-specific model parameters, and the problems associated with model validation.
11.
Alex V Kochetov Igor V Ischenko Denis G Vorobiev Alexander E Kel Vladimir N Babenko Lev L Kisselev Nikolay A Kolchanov 《FEBS letters》1998,440(3):1100
It is well known that non-coding mRNA sequences are dissimilar in many structural features. For individual mRNAs, correlations were found between some of these features and their translational efficiency. However, no systematic statistical analysis was undertaken to relate protein abundance and structural characteristics of the mRNA encoding the given protein. We have demonstrated that structural and contextual features of eukaryotic mRNAs encoding high- and low-abundant proteins differ in the 5′ untranslated regions (UTR). Statistically, 5′ UTRs of low-expression mRNAs are longer, their guanine plus cytosine content is higher, they have a less optimal context of the translation initiation codons of the main open reading frames, and they contain upstream AUG more frequently than 5′ UTRs of high-expression mRNAs. Apart from the differences in 5′ UTRs, high-expression mRNAs contain stronger termination signals. Structural features of low- and high-expression mRNAs are likely to contribute to the yield of their protein products.
12.
This paper provides a method of using maximum likelihood to estimate the two unknown parameters, the contact rate and the removal rate, in the general stochastic epidemic, using only the observed interremoval times and the total number of cases occurring. A goodness-of-fit test is discussed, and the methods described are illustrated by means of data on an actual smallpox epidemic in a restricted community in southeastern Nigeria.
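To make the kind of data described in this abstract concrete, the following is a minimal simulation sketch of the general stochastic epidemic. It is not the paper's estimation method. The function name, argument names, and the mass-action infection rate `contact_rate * s * i` are assumptions chosen for illustration.

```python
import random


def simulate_general_epidemic(n_susceptible, n_infected, contact_rate,
                              removal_rate, seed=0):
    """Simulate one epidemic; return (removal_times, total_cases).

    Hypothetical parameterisation: infections occur at rate
    contact_rate * s * i and removals at rate removal_rate * i.
    """
    rng = random.Random(seed)
    s, i, t = n_susceptible, n_infected, 0.0
    removal_times = []
    while i > 0:
        infection = contact_rate * s * i
        removal = removal_rate * i
        total = infection + removal
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if rng.random() < infection / total:
            s -= 1
            i += 1
        else:
            i -= 1
            removal_times.append(t)
    # Everyone who was ever infected has been removed once i reaches zero.
    total_cases = n_susceptible + n_infected - s
    return removal_times, total_cases


times, cases = simulate_general_epidemic(50, 1, 0.02, 1.0, seed=1)
# Interremoval times are the gaps between successive removals.
interremoval = [b - a for a, b in zip(times, times[1:])]
```

The interremoval times and the total number of cases are the only quantities the abstract says are used for estimation, which is why the sketch returns just those.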
13.
The general stochastic model of nucleotide substitution (total citations: 37; self-citations: 0; citations by others: 37)
14.
Five parameters of one of the most common neuronal models, the diffusion leaky integrate-and-fire model, also known as the Ornstein-Uhlenbeck neuronal model, were estimated on the basis of intracellular recording. These parameters can be classified into two categories. Three of them (the membrane time constant, the resting potential and the firing threshold) characterize the neuron itself. The remaining two characterize the neuronal input. The intracellular data were collected during spontaneous firing, which in this case is characterized by a Poisson process of interspike intervals. Two methods for the estimation were applied, the regression method and the maximum-likelihood method. Both methods allow the input parameters and the membrane time constant to be estimated in a short time window (a single interspike interval). We found that, at least in our example, the regression method gave more consistent results than the maximum-likelihood method. The estimates of the input parameters show asymptotic normality, which can be further used for statistical testing, under the condition that the data are collected in different experimental situations. The model neuron, as deduced from the determined parameters, works in a subthreshold regime. This result was confirmed by both applied methods. The subthreshold regime for this model is characterized by Poissonian firing. This is in complete agreement with the observed interspike interval data.
Action Editor: Nicolas Brunel
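The five parameters named in this abstract can be illustrated with a minimal Euler-Maruyama simulation of an Ornstein-Uhlenbeck leaky integrate-and-fire neuron. This is a sketch under assumptions, not the authors' estimation procedure: the symbols `mu` and `sigma` for the two input parameters, the reset to the resting potential after each spike, and the function name are all hypothetical choices.

```python
import math
import random


def simulate_isis(mu, sigma, tau, x0, threshold, dt=1e-4, t_max=10.0, seed=0):
    """Return interspike intervals of a simulated OU neuron.

    Assumed dynamics: dX = (-(X - x0) / tau + mu) dt + sigma dW,
    with a spike and a reset to x0 whenever X reaches the threshold.
    """
    rng = random.Random(seed)
    x = x0
    last_spike = 0.0
    isis = []
    sqrt_dt = math.sqrt(dt)
    n_steps = int(round(t_max / dt))
    for k in range(1, n_steps + 1):
        drift = (-(x - x0) / tau + mu) * dt
        noise = sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        x += drift + noise
        if x >= threshold:
            t = k * dt
            isis.append(t - last_spike)
            last_spike = t
            x = x0  # reset to the resting potential (an assumption)
    return isis


# Subthreshold regime: without noise the potential settles at x0 + mu * tau,
# which is below the threshold here, so only noise can trigger spikes.
noisy_isis = simulate_isis(mu=1.0, sigma=1.0, tau=1.0, x0=0.0, threshold=2.0)
```

In this parameterisation the regime is subthreshold when `x0 + mu * tau` lies below `threshold`, so firing is driven by the noise term alone.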
15.
16.
17.
1,3-propanediol (1,3-PD) is a chemical compound of immense importance, primarily used as a raw material in the fiber and textile industry. It can be produced by the fermentation of glycerol, which is abundantly available as a by-product of biodiesel plants. The present study was aimed at determining the key kinetic parameters of 1,3-PD fermentation by Clostridium diolis. Initial experiments on microbial growth inhibition were followed by statistical optimization of the nutrient medium recipe. Batch kinetic data from bioreactor studies, using the optimum concentrations of variables obtained from the statistical medium design, were used to estimate the kinetic parameters of 1,3-PD production. Direct use of raw glycerol from a biodiesel plant without any pre-treatment for 1,3-PD production using this strain, investigated for the first time in this work, gave results comparable to commercial glycerol. The parameter values obtained in this study would be used to develop a mathematical model for 1,3-PD, to be used as a guide for designing various reactor operating strategies for further improving 1,3-PD production. An outline of a protocol for model development is discussed in the present work.
18.
Lee A Newberg 《BMC bioinformatics》2009,10(1):212
Background
Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for a variety of analyses of sequential data. Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known. What is the chance that random data would achieve that score or better? What is the chance that a real signal would achieve a given score threshold?
19.
In this paper, we propose an analytically tractable model of protein folding based on a one-dimensional general random walk. A second-order differential equation for the mean folding time of a single protein is constructed, which can be used to derive the observed relationship between the folding rate constant and the number of native contacts. The parameters appearing in the model can be determined by fitting the theoretical prediction to the experimental result. In addition, taking into account the fact that the number of native contacts is almost proportional to the relative contact order, we can also explain the observed relationship between the folding rate constant and the relative contact order.
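The idea of a mean folding time as a mean first-passage time can be sketched with a discrete analogue. The abstract's own second-order differential equation is not reproduced here. The code below assumes a discrete-time walk on states 0..N with a reflecting start at 0 and an absorbing target at N; the function name and the per-state step probabilities `p` (forward) and `q` (backward) are hypothetical.

```python
def mean_first_passage_time(p, q):
    """Mean number of steps to go from state 0 to state N = len(p).

    p[i] and q[i] are the forward and backward step probabilities in
    state i; the walk stays put with probability 1 - p[i] - q[i].
    State 0 is treated as reflecting, so q[0] has no effect.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    prev = 0.0  # mean time to advance from state i-1 to state i
    for p_i, q_i in zip(p, q):
        if p_i <= 0.0:
            raise ValueError("forward probabilities must be positive")
        # A backward step costs an extra trip from i-1 back up to i.
        step_time = (1.0 + q_i * prev) / p_i
        total += step_time
        prev = step_time
    return total


# Symmetric walk on 0..5 with a reflecting boundary at 0.
n = 5
symmetric = mean_first_passage_time([1.0] + [0.5] * (n - 1),
                                    [0.0] + [0.5] * (n - 1))
```

For the symmetric case the recursion gives successive odd numbers 1, 3, 5, ..., whose sum over N states is N squared, which is a convenient check on any implementation.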
20.
Kroon M 《Computer methods in biomechanics and biomedical engineering》2011,14(1):43-52
Smooth muscle exhibits an optimal length at which it is able to generate a maximum amount of force. In this study, the optimal length is assessed by use of a microstructurally and statistically based constitutive model for smooth muscle. The model is based on the sliding filament theory, and a modified version of Hill's mechanical model was adopted. It was conjectured that a variation in the overlap in the actomyosin contractile units, together with a statistical dispersion in the size of the dense bodies, is responsible for the optimal length characteristics. The influence of contractile unit length, dense body size and dense body compliance was investigated, and the model was fully able to predict experimental data. The results indicate that the compliance of the dense bodies does not contribute significantly to the total compliance of the contractile apparatus.