Similar articles
1.
It has been shown that electropherograms of DNA sequences can be modeled with hidden Markov models. Basecalling, the procedure that determines the sequence of bases from the given electropherogram, can then be performed using the Viterbi algorithm. A training step is required prior to basecalling in order to estimate the HMM parameters. In this paper, we propose a Bayesian approach which employs the Markov chain Monte Carlo (MCMC) method to perform basecalling. Such an approach not only allows one to encode prior biological knowledge naturally into the basecalling algorithm but also exploits both the training data and the basecalling data in estimating the HMM parameters, leading to more accurate estimates. Using the recently sequenced genome of the organism Legionella pneumophila, we show that the MCMC basecaller outperforms the state-of-the-art basecalling algorithm in terms of total errors while requiring much less training than other proposed statistical basecallers.
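
The Viterbi decoding step can be illustrated with a generic log-space Viterbi decoder for a discrete-emission HMM. This is a sketch, not the authors' basecaller: the two-state model, the two-symbol alphabet, and all probabilities are hypothetical, and a real electropherogram model would use base-specific states with continuous, multi-channel emissions.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most probable hidden-state path for a discrete-emission HMM (log space).

    log_pi: (S,) log initial probabilities
    log_A:  (S, S) log transitions, log_A[i, j] = log P(next = j | current = i)
    log_B:  (S, O) log emissions, log_B[s, o] = log P(o | s)
    obs:    sequence of observation indices in range(O)
    """
    n_states = log_pi.shape[0]
    T = len(obs)
    delta = np.empty((T, n_states))             # best log-score of a path ending in each state
    back = np.zeros((T, n_states), dtype=int)   # best predecessor, used for traceback
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: arrive in j from i
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_B[:, obs[t]]
    # Trace back from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-state model: state 0 mostly emits symbol 0, state 1 mostly symbol 1.
pi = np.log([0.5, 0.5])
A = np.log([[0.9, 0.1], [0.1, 0.9]])
B = np.log([[0.95, 0.05], [0.05, 0.95]])
obs = [0, 0, 0, 1, 1, 1, 0, 0]
path = viterbi(pi, A, B, obs)
print(path)  # [0, 0, 0, 1, 1, 1, 0, 0]
```

Working in log space avoids the numerical underflow that products of many probabilities would cause on long traces.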

3.
Quantitative analysis of patch clamp data is widely based on stochastic models of single-channel kinetics. Membrane patches often contain more than one active channel of a given type, and it is usually assumed that these behave independently in order to interpret the record and infer individual channel properties. However, recent studies suggest there are significant channel interactions in some systems. We examine a model of dependence in a system of two identical channels, each modeled by a continuous-time Markov chain in which specified transition rates are dependent on the conductance state of the other channel, changing instantaneously when the other channel opens or closes. Each channel then has, e.g., a closed time density that is conditional on the other channel being open or closed, these being identical under independence. We relate the two densities by a convolution function that embodies information about, and serves to quantify, dependence in the closed class. Distributions of observable (superposition) sojourn times are given in terms of these conditional densities. The behavior of two-channel systems based on two- and three-state Markov models is examined by simulation. Optimized fitting of simulated data using reasonable parameter values and sample size indicates that both positive and negative cooperativity can be distinguished from independence.
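
A minimal sketch of such a dependence model, assuming two-state (closed/open) channels in which only the opening rate depends on the other channel's state; the rate values are hypothetical, and the convolution-based comparison of conditional closed time densities is not reproduced. With equal rates the joint generator reduces to the Kronecker sum of two single-channel generators, which is what independence means for the pair.

```python
import numpy as np

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (channel 1, channel 2); 0 = closed, 1 = open

def two_channel_generator(alpha, beta, alpha_if_other_open):
    """Generator matrix for two identical two-state channels.

    A closed channel opens at rate `alpha` while the other channel is closed and
    at `alpha_if_other_open` while it is open; the closing rate `beta` does not
    depend on the other channel (a simplifying assumption for this sketch).
    """
    index = {s: i for i, s in enumerate(STATES)}
    Q = np.zeros((4, 4))
    for s in STATES:
        for ch in (0, 1):
            other = s[1 - ch]
            new = list(s)
            new[ch] = 1 - s[ch]  # flip this channel only
            if s[ch] == 0:
                rate = alpha_if_other_open if other == 1 else alpha
            else:
                rate = beta
            Q[index[s], index[tuple(new)]] += rate
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a generator sum to zero
    return Q

def stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1."""
    n = Q.shape[0]
    M = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(M, b, rcond=None)[0]

alpha, beta = 1.0, 2.0  # hypothetical single-channel opening and closing rates
q = np.array([[-alpha, alpha], [beta, -beta]])
Q_ind = two_channel_generator(alpha, beta, alpha)        # independence
Q_pos = two_channel_generator(alpha, beta, 3.0 * alpha)  # positive cooperativity
print(np.allclose(Q_ind, np.kron(q, np.eye(2)) + np.kron(np.eye(2), q)))
p_open = alpha / (alpha + beta)
print(stationary(Q_ind)[3], p_open ** 2)  # P(both open) equals p_open**2 under independence
print(stationary(Q_pos)[3])               # larger when opening is facilitated
```

Comparing the stationary probability of the "both open" state with the product of the marginal open probabilities is only a crude check; the study works with sojourn-time distributions instead.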

4.
Ionizing radiation damage to the genome of a non-cycling mammalian cell is analyzed using continuous-time Markov chains. Immediate damage induced by the radiation is modeled as a batch Poisson arrival process of DNA double-strand breaks (DSBs). Different kinds of radiation, for example gamma rays or alpha particles, have different batch probabilities. Enzymatic modulation of the immediate damage is modeled as a Markov process similar to the processes described by the master equation of stochastic chemical kinetics. An illustrative example is the restitution/complete-exchange model, which postulates that radiation-induced DSBs can subsequently either undergo enzymatically mediated repair (restitution) or participate pairwise in chromosome exchanges, some of which make irremediable lesions such as dicentric chromosome aberrations. One may have rapid irradiation followed by enzymatic DSB processing, or prolonged irradiation with both DSB arrival and enzymatic DSB processing continuing throughout the irradiation period. A complete solution of the Markov chain is known for the case in which the exchange rate constant is negligible, so that no irremediable chromosome lesions are produced and DSBs are the only damage to the genome. Using PDEs for generating functions, a perturbation calculation is made assuming the exchange rate constant is small compared to the repair rate constant. Some non-perturbative results applicable to very prolonged irradiation are also obtained using matrix methods: Perron-Frobenius theory, variational methods, and numerical approximations of eigenvalues. Applications to experimental results on expected values, variances, and statistical distributions of DNA lesions are briefly outlined. Continuous-time Markov chain models are the most systematic of the current radiation damage models that treat DSB-DSB interactions within the cell nucleus as homogeneous (e.g., ignore diffusion limitations). They contain most other homogeneous models as special cases, limiting cases, or approximations. However, applying continuous-time Markov chain models to the spatial dependence of DSB interactions, which is generally believed to be very important in some situations, presents difficulties.
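
The restitution/complete-exchange chain for rapid (acute) irradiation can also be simulated directly. This is a Monte Carlo sketch, not the perturbation or matrix analysis described above. It assumes a Poisson number of arrival events with hypothetical batch-size probabilities, restitution of each DSB at rate `repair_rate`, and exchange of each unordered DSB pair at rate `exchange_rate`. Because only the final number of exchanges is recorded, it steps through the embedded jump chain and never samples waiting times.

```python
import numpy as np

def simulate_acute(mean_events, batch_probs, repair_rate, exchange_rate, rng):
    """Number of exchanges formed in one realization after acute irradiation.

    Initial DSBs: a Poisson(mean_events) number of arrival events, each
    producing k DSBs with probability batch_probs[k - 1] (hypothetical values).
    Then each DSB is restituted at rate repair_rate and each unordered pair of
    DSBs exchanges at rate exchange_rate, until no DSBs remain.
    """
    assert repair_rate > 0, "a positive repair rate guarantees the chain terminates"
    n_events = rng.poisson(mean_events)
    sizes = rng.choice(np.arange(1, len(batch_probs) + 1), size=n_events, p=batch_probs)
    n = int(sizes.sum())
    exchanges = 0
    while n > 0:
        r_repair = repair_rate * n
        r_exchange = exchange_rate * n * (n - 1) / 2.0
        # Embedded chain: the next transition is an exchange with probability
        # r_exchange / (r_repair + r_exchange), otherwise a restitution.
        if rng.random() < r_exchange / (r_repair + r_exchange):
            n -= 2
            exchanges += 1
        else:
            n -= 1
    return exchanges

rng = np.random.default_rng(0)
single_breaks = [1.0]           # hypothetical: every event yields one DSB
clustered = [0.5, 0.3, 0.2]     # hypothetical: events yield 1, 2 or 3 DSBs
for label, probs in [("batch size 1", single_breaks), ("larger batches", clustered)]:
    runs = [simulate_acute(20, probs, 1.0, 0.01, rng) for _ in range(2000)]
    print(label, np.mean(runs), np.var(runs))
```

With `exchange_rate = 0` every realization ends with zero exchanges, which corresponds to the exactly solvable case in which DSBs are the only damage.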

5.
A survey is given of continuous-time Markov chain models for ionizing radiation damage to the genome of mammalian cells. In such models, immediate damage induced by the radiation is regarded as a batch-Poisson arrival process of DNA double-strand breaks (DSBs). Enzymatic modification of the immediate damage is modeled as a Markov process similar to those described by the master equation of stochastic chemical kinetics. An illustrative example is the restitution/complete-exchange model. The model postulates that, after being induced by radiation, DSBs subsequently either undergo enzymatically mediated restitution (repair) or participate pairwise in chromosome exchanges. Some of the exchanges make irremediable lesions such as dicentric chromosome aberrations. One may have rapid irradiation followed by enzymatic DSB processing or have prolonged irradiation with both DSB arrival and enzymatic DSB processing continuing throughout the irradiation period. Methods for analyzing the Markov chains include using an approximate model for expected values, the discrete-time Markov chain embedded at transitions, partial differential equations for generating functions, normal perturbation theory, singular perturbation theory with scaling, numerical computations, and certain matrix methods that combine Perron-Frobenius theory with variational estimates. Applications to experimental results on expected values, variances, and statistical distributions of DNA lesions are briefly outlined. Continuous-time Markov chains are the most systematic of those radiation damage models that treat DSB-DSB interactions within the cell nucleus as homogeneous (e.g., ignore diffusion limitations). They contain virtually all other relevant homogeneous models and semiempirical summaries as special cases, limiting cases, or approximations. However, the Markov models do not seem to be well suited for studying spatial dependence of DSB interactions, which is known to be important in some situations.  

6.
M. Slatkin, C. A. Muirhead. Genetics, 1999, 152(2): 775-781
An approximate method is developed to predict the number of strongly overdominant alleles in a population whose size varies with time. The approximation relies on the strong-selection weak-mutation (SSWM) method introduced by J. H. Gillespie and leads to a Markov chain model that describes the number of common alleles in the population. The parameters of the transition matrix of the Markov chain depend in a simple way on the population size. For a population of constant size, the Markov chain leads to results that are nearly the same as those of N. Takahata. The Markov chain allows the prediction of the numbers of common alleles during and after a population bottleneck and the numbers of alleles surviving from before a bottleneck. This method is also adapted to modeling the case in which there are two classes of alleles, with one class causing a reduction in fitness relative to the other class. Very slight selection against one class can strongly affect the relative frequencies of the two classes and the relative ages of alleles in each class.

7.
A Bayesian approach to DNA sequence segmentation   Total citations: 3; self-citations: 0; citations by others: 3
R. J. Boys, D. A. Henderson. Biometrics, 2004, 60(3): 573-581
Many deoxyribonucleic acid (DNA) sequences display compositional heterogeneity in the form of segments of similar structure. This article describes a Bayesian method that identifies such segments by using a Markov chain governed by a hidden Markov model. Markov chain Monte Carlo (MCMC) techniques are employed to compute all posterior quantities of interest and, in particular, allow inferences to be made regarding the number of segment types and the order of Markov dependence in the DNA sequence. The method is applied to the segmentation of the bacteriophage lambda genome, a common benchmark sequence used for the comparison of statistical segmentation algorithms.

9.
Summary. The course of evolutionary change in DNA sequences has been modeled as a Markov process. The Markov process was represented by discrete time matrix methods. The parameters of the Markov transition matrices were estimated by least-squares direct-search optimization of the fit of the calculated divergence matrix to that observed for two aligned sequences. The Markov process corrected for multiple and parallel substitutions of bases at the same site. The method avoided the incorrect assumption of all previously described methods that the divergence between two present-day sequences is twice the divergence of either from the common and unknown ancestral sequence. The three previous methods were shown to be equivalent. The present method also avoided the undesirable assumptions that sequence composition has not changed with time and that the substitution rates in the two descendant lineages were the same. It permitted simultaneous estimation of ancestral sequence composition and, if applicable, of different substitution rates for the two descendant lineages, provided the total number of estimated parameters was fewer than 16. Properties of the Markov chain were discussed. It was proved for symmetric substitution matrices that all elements of the equilibrium divergence matrix equal 1/16, and that the total difference in the divergence matrix at epoch k equals the total change in the common substitution matrix at epoch 2k for all values of k. It was shown how to resolve an ambiguity in the assignment of two different substitution rates to the two descendant lineages when four or more similar sequences are available. The method was applied to the divergence matrix for codon site 3 for the mouse and rabbit beta-globins. This observed divergence matrix was significantly asymmetric and required at least two different substitution rates. This result could be achieved only by using different asymmetric substitution matrices for the two lineages.
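
The two results for symmetric substitution matrices can be checked numerically. The sketch assumes that, for an ancestor with base composition π and the same matrix P applied for k epochs in each lineage, the divergence matrix is D_k = (P^k)^T diag(π) P^k, that "total difference" means 1 − trace(D_k), and that "total change" at epoch 2k means 1 − Σ_j π_j (P^{2k})_{jj}. These interpretations, and the particular P and π, are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np
from numpy.linalg import matrix_power

# Hypothetical symmetric (hence doubly stochastic) substitution matrix over A, C, G, T.
P = np.array([
    [0.94, 0.02, 0.03, 0.01],
    [0.02, 0.95, 0.01, 0.02],
    [0.03, 0.01, 0.93, 0.03],
    [0.01, 0.02, 0.03, 0.94],
])
pi = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical ancestral base composition

def divergence(P1, P2, pi, k):
    """Joint base distribution of two descendants after k epochs in each lineage."""
    return matrix_power(P1, k).T @ np.diag(pi) @ matrix_power(P2, k)

for k in (1, 5, 20):
    D = divergence(P, P, pi, k)
    total_difference = 1.0 - np.trace(D)
    change_at_2k = 1.0 - pi @ np.diag(matrix_power(P, 2 * k))
    print(k, round(total_difference, 6), round(change_at_2k, 6))  # the two columns agree

D_inf = divergence(P, P, pi, 2000)
print(np.round(D_inf, 6))  # every cell approaches 1/16
```

The agreement follows because trace(P^k diag(π) P^k) = Σ_j π_j (P^{2k})_{jj} when P is symmetric, and the 1/16 limit holds because P^k tends to the matrix with every entry 1/4.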

10.
The general Markov plus invariable sites (GM+I) model of biological sequence evolution is a two-class model in which an unknown proportion of sites are not allowed to change, while the remainder undergo substitutions according to a Markov process on a tree. For statistical use it is important to know if the model is identifiable; can both the tree topology and the numerical parameters be determined from a joint distribution describing sequences only at the leaves of the tree? We establish that for generic parameters both the tree and all numerical parameter values can be recovered, up to clearly understood issues of 'label swapping'. The method of analysis is algebraic, using phylogenetic invariants to study the variety defined by the model. Simple rational formulas, expressed in terms of determinantal ratios, are found for recovering numerical parameters describing the invariable sites.

11.
H. C. Yang, A. Chao. Biometrics, 2005, 61(4): 1010-1017
A bivariate Markov chain approach that includes both enduring (long-term) and ephemeral (short-term) behavioral effects in models for capture-recapture experiments is proposed. The capture history of each animal is modeled as a Markov chain with a bivariate state space with states determined by the capture status (capture/noncapture) and marking status (marked/unmarked). In this framework, a conditional-likelihood method is used to estimate the population size and the transition probabilities. The classical behavioral model that assumes only an enduring behavioral effect is included as a special case of the bivariate Markovian model. Another special case that assumes only an ephemeral behavioral effect reduces to a univariate Markov chain based on capture/noncapture status. The model with the ephemeral behavioral effect is extended to incorporate time effects; in this model, in contrast to extensions of the classical behavioral model, all parameters are identifiable. A data set is analyzed to illustrate the use of the Markovian models in interpreting animals' behavioral response. Simulation results are reported to examine the performance of the estimators.

12.
Summary. We consider a set of independent Bernoulli trials with possibly different success probabilities that depend on covariate values. However, the available data consist only of aggregate numbers of successes among subsets of the trials along with all of the covariate values. We still wish to estimate the parameters of a modeled relationship between the covariates and the success probabilities, e.g., a logistic regression model. In this article, estimation of the parameters is made from a Bayesian perspective by using a Markov chain Monte Carlo algorithm based only on the available data. The proposed methodology is applied to both simulation studies and real data from a dose–response study of a toxic chemical, perchlorate.
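
The abstract does not specify the sampler, so the following is only a sketch under stated assumptions: a logistic model for the trial probabilities, an independent normal prior on the coefficients, and random-walk Metropolis updates. The key point is that the likelihood of an aggregate count is a Poisson-binomial probability, which a short dynamic program computes exactly. All data below are simulated; the perchlorate data are not used.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """P(total successes = s), s = 0..len(probs), for independent Bernoulli(probs)."""
    pmf = np.zeros(len(probs) + 1)
    pmf[0] = 1.0
    for p in probs:
        # Add one trial: the count stays put on failure and moves up by one on success.
        pmf[1:] = pmf[1:] * (1.0 - p) + pmf[:-1] * p
        pmf[0] *= 1.0 - p
    return pmf

def log_posterior(beta, X_groups, y_groups, prior_sd):
    """Aggregate-data log-likelihood under a logistic model plus a N(0, prior_sd^2) prior."""
    total = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    for X, y in zip(X_groups, y_groups):
        probs = 1.0 / (1.0 + np.exp(-(X @ beta)))
        total += np.log(poisson_binomial_pmf(probs)[y] + 1e-300)  # guard against log(0)
    return total

def metropolis(X_groups, y_groups, n_iter, step, prior_sd, rng):
    """Random-walk Metropolis on the regression coefficients."""
    beta = np.zeros(X_groups[0].shape[1])
    current = log_posterior(beta, X_groups, y_groups, prior_sd)
    samples = np.empty((n_iter, beta.size))
    accepted = 0
    for i in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        candidate = log_posterior(proposal, X_groups, y_groups, prior_sd)
        if np.log(rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        samples[i] = beta
    return samples, accepted / n_iter

# Simulated example: 40 subsets of 5 trials each; only subset totals are kept.
rng = np.random.default_rng(42)
true_beta = np.array([-1.0, 2.0])  # hypothetical intercept and dose slope
X_groups, y_groups = [], []
for _ in range(40):
    X = np.column_stack([np.ones(5), rng.uniform(0.0, 1.0, 5)])
    p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
    X_groups.append(X)
    y_groups.append(int(rng.binomial(1, p).sum()))

samples, acceptance = metropolis(X_groups, y_groups, 2000, 0.3, 10.0, rng)
print(samples[500:].mean(axis=0), acceptance)
```

The step size, prior scale, and burn-in length are arbitrary choices for the example and would need tuning on real data.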

13.
Dietzia maris NIT-D, a canthaxanthin producer, was isolated during routine screening of pigment-producing bacteria. Response surface methodology was applied for the statistical design of process parameters for biomass and canthaxanthin production. The effects of four process parameters (considered as independent variables), namely temperature (10-30 °C), pH (4.75-5.75), shaker speed (75-135 rpm) and percentage inoculum (0.5-2.5%), on the biomass and canthaxanthin yield (considered as dependent variables) were studied. As much as 122 mg L⁻¹ of canthaxanthin was obtained when Dietzia maris NIT-D was incubated for 120 h at 25 °C and 120 rpm, with initial pH and percentage inoculum of 5.5 and 2%, respectively. This pigment yield is the highest reported to date with Dietzia maris as the test organism. The maximum biomass yield was 7.39 g L⁻¹ under the optimized process parameters. The predicted values were also verified by validation experiments in a 5-day fermentation. Different mathematical models were used to describe growth and production, considering the effect of glucose in batch mode. The kinetic constants were calculated by fitting the experimental data to the models. Cell growth was inhibited beyond a glucose concentration of 15 g L⁻¹. Andrews' model gave the best fit, with an R² value of 0.9993. During the exponential growth phase, the specific growth rate remained fairly constant with respect to time. There was no inhibitory effect due to intracellular product accumulation at any of the glucose concentrations. This observation is the first of its kind, as previous studies have reported that increasing accumulation of intracellular carotenoid exerts a greater degree of inhibition on growth.
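
Andrews' substrate-inhibition model is commonly written as μ = μ_max S / (K_s + S + S²/K_i). Setting dμ/dS = 0 gives a peak at S* = √(K_s K_i), beyond which higher glucose concentrations lower the growth rate, the qualitative behaviour reported above. The parameter values in this sketch are hypothetical, not the constants fitted in the study; in practice they would be estimated from growth data by nonlinear least squares.

```python
import math

def andrews_growth_rate(S, mu_max, Ks, Ki):
    """Andrews (substrate-inhibition) specific growth rate.

    mu = mu_max * S / (Ks + S + S**2 / Ki); S, Ks and Ki share units (e.g. g/L).
    """
    return mu_max * S / (Ks + S + S * S / Ki)

def optimal_substrate(Ks, Ki):
    """Substrate level at which the Andrews rate peaks: d(mu)/dS = 0 at sqrt(Ks * Ki)."""
    return math.sqrt(Ks * Ki)

# Hypothetical parameters, not the fitted values from the study.
mu_max, Ks, Ki = 0.2, 2.0, 100.0
S_star = optimal_substrate(Ks, Ki)
for S in (5.0, S_star, 30.0, 60.0):
    print(f"S = {S:6.2f} g/L  mu = {andrews_growth_rate(S, mu_max, Ks, Ki):.4f} 1/h")
```

As K_i grows large the S²/K_i term vanishes and the expression reduces to the Monod model, which has no inhibition.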

14.
Multistate models have been increasingly used to model the natural history of many diseases as well as to characterize the follow-up of patients under varied clinical protocols. This modeling makes it possible to describe disease evolution, estimate transition rates, and evaluate the effects of therapy on progression. In many cases, the staging is defined on the basis of a discretization of the values of continuous markers (CD4 cell count for HIV application) that are subject to great variability due mainly to short time-scale noise (intraindividual variability) and measurement errors. This led us to formulate a Bayesian hierarchical model where, at a first level, a disease process (a Markov model on the true states, which are unobserved) is introduced and, at a second level, the measurement process linking the true states to the observed marker values is modeled. This hierarchical formulation allows joint estimation of the parameters of both processes. Estimation of the quantities of interest is performed via stochastic algorithms of the Markov chain Monte Carlo family. The flexibility of this approach is illustrated by analyzing the CD4 data on HIV patients of the Concorde clinical trial.

15.
This paper adopts the data envelopment analysis (DEA) method to calculate regional energy efficiency from the perspective of total-factor energy efficiency. Markov chains and spatial Markov chains, which are common methods for testing club convergence, are then used to test the club convergence of regional energy efficiency in China. The results indicate that regional energy efficiency in China has been globally characterized by “club convergence” since 1999 and that energy efficiency transitions in China are closely connected with regional characteristics. A high level of energy efficiency has a positive influence on a region, whereas a low level has a negative influence. This empirical analysis provides a spatial explanation for the “convergence clubs” of regional energy efficiency in China.
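
The abstract does not describe how the chains are estimated, so the following sketches one common approach under stated assumptions: efficiency scores are split into yearly quantile classes, transitions between consecutive years are counted and row-normalized, and the spatial version conditions each transition on the class of the region's spatial lag (the weighted average of its neighbours). The panel, the neighbour structure, and the number of classes are all hypothetical; the DEA step is not shown.

```python
import numpy as np

def to_classes(values, n_classes):
    """Quantile class of each region within one year (0 = lowest efficiency)."""
    cuts = np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    return np.searchsorted(cuts, values, side="right")

def row_normalize(counts):
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def transition_matrix(classes, n_classes, condition=None, value=None):
    """Estimate P[i, j] = Pr(class j next year | class i this year).

    classes: (years, regions) array of class labels. If `condition` is given,
    only transitions with condition[t, r] == value are counted, giving the
    spatial Markov matrix for one spatial-lag class.
    """
    counts = np.zeros((n_classes, n_classes))
    for t in range(classes.shape[0] - 1):
        for r in range(classes.shape[1]):
            if condition is None or condition[t, r] == value:
                counts[classes[t, r], classes[t + 1, r]] += 1
    return row_normalize(counts)

# Hypothetical panel: 15 years x 30 regions of efficiency scores.
rng = np.random.default_rng(0)
years, regions, k = 15, 30, 4
eff = rng.uniform(0.3, 1.0, regions) + np.cumsum(rng.normal(0.0, 0.05, (years, regions)), axis=0)

# Hypothetical contiguity: regions on a line, neighbours are adjacent indices.
W = np.zeros((regions, regions))
for i in range(regions):
    for j in (i - 1, i + 1):
        if 0 <= j < regions:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)  # row-standardized spatial weights

classes = np.array([to_classes(row, k) for row in eff])
lag_classes = np.array([to_classes(W @ row, k) for row in eff])

P = transition_matrix(classes, k)
P_spatial = [transition_matrix(classes, k, lag_classes, c) for c in range(k)]
print(np.round(P, 2))
print(np.round(P_spatial[0], 2))  # transitions when neighbours are in the lowest class
```

Comparing the conditional matrices in `P_spatial` with the unconditional `P` is what reveals whether a neighbourhood's efficiency level shifts a region's upward or downward transition probabilities.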

16.

Background

The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be analysed effectively. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-size oligonucleotides with expected values, which can be determined from genomic nucleotide content or smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measurement of phylogenetic relationships with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore the reliability and best-suited applications of some popular methods, which include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zero'th-order Markov methods (ZOM), and the second-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA.

Results

Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable measures in terms of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes to an artificially constructed random genome with similar %GC, and should therefore be used with care. The tetranucleotide ZOM measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity.

Conclusion

The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined were superior in all tests, and therefore the choice of the statistical method should depend on the task at hand.
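
The zero'th-order Markov (ZOM) idea, comparing observed oligonucleotide frequencies with those expected from base composition alone, can be sketched in a few lines. The observed/expected ratio and the use of Pearson correlation to compare two sequences are assumptions made for illustration; the study's exact scoring, and the ROF and second-order MCM variants, are not reproduced.

```python
import math
import random
from collections import Counter
from itertools import product

def zom_ratios(seq, k):
    """Observed / expected frequency of every k-mer, expected from base composition only."""
    seq = seq.upper()
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    ratios = {}
    for word in map("".join, product("ACGT", repeat=k)):
        expected = math.prod(base_freq[b] for b in word)  # zero'th-order Markov expectation
        observed = counts.get(word, 0) / n_windows
        ratios[word] = observed / expected if expected > 0 else 0.0
    return ratios

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def zom_similarity(seq_a, seq_b, k):
    """Correlation between the two sequences' k-mer ratio profiles."""
    ra, rb = zom_ratios(seq_a, k), zom_ratios(seq_b, k)
    words = sorted(ra)
    return pearson([ra[w] for w in words], [rb[w] for w in words])

# Hypothetical random sequences standing in for genomes.
random.seed(0)
genome_a = "".join(random.choice("ACGT") for _ in range(20000))
genome_b = "".join(random.choice("AATTCG") for _ in range(20000))  # AT-rich
print(round(zom_similarity(genome_a, genome_a[:10000], 4), 3))
print(round(zom_similarity(genome_a, genome_b, 4), 3))
```

Because the expectation removes the effect of base composition, two unrelated random sequences give ratio profiles close to 1 everywhere and little correlation, which is the kind of behaviour the random-genome control in the study is designed to probe.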

17.
A. Gottschau. Biometrics, 1992, 48(3): 751-763
Time-homogeneous Markov chain models with state space {0, 1}^k are useful in the analysis of binary follow-up data on k individuals that interact. The number of parameters increases exponentially with k, so more restrictive models are imperative for statistical inference. The hypothesis that the matrix of transition probabilities is invariant under permutation of individuals is discussed. It is shown that if individuals are exchangeable, then the process counting the number of individuals occupying a given state is a Markov chain. This reduction of data is sufficient if either at most a single individual may change state between two consecutive time points or a state is absorbing. Similar results are obtained for exchangeability within two subgroups. Inference in the multivariate process reduces to a univariate problem if individuals are independent given the group's previous response. It is shown how conditional independence could be tested assuming exchangeability. The different hypotheses are examined in an analysis of the occurrence of bacteria in milk samples of Danish dairy cattle.
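
The claim that exchangeability makes the counting process Markov can be checked numerically: if the transition matrix is invariant under permuting individuals, every state with the same count sends the same total probability to each count class, so the chain is lumpable. The sketch builds a hypothetical permutation-invariant chain in which each individual moves independently, with probabilities depending on its own state and on how many others are in state 1, then aggregates it. The functions `p_on` and `p_off` are illustrative assumptions, not part of the cited analysis.

```python
import itertools
import numpy as np

def exchangeable_chain(k, p_on, p_off):
    """Transition matrix on {0,1}^k that is invariant under permuting individuals.

    p_on(m):  P(0 -> 1) for an individual when m *other* individuals are in state 1
    p_off(m): P(1 -> 0) for an individual when m other individuals are in state 1
    """
    states = list(itertools.product((0, 1), repeat=k))
    P = np.zeros((len(states), len(states)))
    for i, x in enumerate(states):
        total_on = sum(x)
        for j, y in enumerate(states):
            prob = 1.0
            for xi, yi in zip(x, y):
                others = total_on - xi
                q = p_on(others) if xi == 0 else 1.0 - p_off(others)  # P(next state = 1)
                prob *= q if yi == 1 else 1.0 - q
            P[i, j] = prob
    return states, P

def lumped_chain(states, P, k):
    """Transition matrix of the counting process; raises if P is not lumpable."""
    counts = np.array([sum(s) for s in states])
    L = np.zeros((k + 1, k + 1))
    for n in range(k + 1):
        rows = [np.array([P[i, counts == m].sum() for m in range(k + 1)])
                for i in np.flatnonzero(counts == n)]
        if not all(np.allclose(r, rows[0]) for r in rows):
            raise ValueError("rows with the same count disagree: not lumpable")
        L[n] = rows[0]
    return L

states, P = exchangeable_chain(3, lambda m: 0.1 + 0.2 * m, lambda m: 0.3 - 0.05 * m)
L = lumped_chain(states, P, 3)
print(np.round(L, 3))  # 4 x 4 chain on the number of individuals in state 1
```

The 2^k by 2^k matrix collapses to a (k + 1) by (k + 1) matrix, which is the parameter reduction that makes inference feasible as k grows.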

18.

Background

Genomic data are used in animal breeding to assist genetic evaluation. Several models to estimate genomic breeding values have been studied. In general, two approaches have been used. One approach estimates the marker effects first and then obtains genomic breeding values by summing the marker effects. In the second approach, genomic breeding values are estimated directly using an equivalent model with a genomic relationship matrix. Allele coding is the method chosen to assign values to the regression coefficients in the statistical model. A common allele coding is zero for the homozygous genotype of the first allele, one for the heterozygote, and two for the homozygous genotype for the other allele. Another common allele coding changes these regression coefficients by subtracting a value from each marker such that the mean of regression coefficients is zero within each marker. We call this centered allele coding. This study considered effects of different allele coding methods on inference. Both marker-based and equivalent models were considered, and restricted maximum likelihood and Bayesian methods were used in inference.

Results

Theoretical derivations showed that parameter estimates and estimated marker effects in marker-based models are the same irrespective of the allele coding, provided that the model has a fixed general mean. For the equivalent models, the same results hold, even though different allele coding methods lead to different genomic relationship matrices. Calculated genomic breeding values are independent of allele coding when the estimate of the general mean is included into the values. Reliabilities of estimated genomic breeding values calculated using elements of the inverse of the coefficient matrix depend on the allele coding because different allele coding methods imply different models. Finally, allele coding affects the mixing of Markov chain Monte Carlo algorithms, with the centered coding being the best.

Conclusions

Different allele coding methods lead to the same inference in the marker-based and equivalent models when a fixed general mean is included in the model. However, reliabilities of genomic breeding values are affected by the allele coding method used. The centered coding has some numerical advantages when Markov chain Monte Carlo methods are used.
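
The invariance of marker effects to allele coding when a fixed general mean is in the model can be demonstrated with ridge-type mixed model equations. This sketch assumes a known variance ratio `lam` (σ_e²/σ_b²) rather than REML or Bayesian estimation, and uses simulated genotypes. Centering subtracts a constant from each marker column; that shift is absorbed by the general mean, so the marker effects and the breeding values including the mean are unchanged while the estimated mean itself moves.

```python
import numpy as np

def solve_mme(y, Z, lam):
    """Solve [1'1 1'Z; Z'1 Z'Z + lam*I] [mu; b] = [1'y; Z'y]: fixed mean, ridge on markers."""
    n, p = Z.shape
    X = np.column_stack([np.ones(n), Z])
    lhs = X.T @ X
    lhs[1:, 1:] += lam * np.eye(p)  # shrink marker effects only, not the mean
    sol = np.linalg.solve(lhs, X.T @ y)
    return sol[0], sol[1:]

rng = np.random.default_rng(3)
n, p = 50, 200
freqs = rng.uniform(0.1, 0.9, p)                            # hypothetical allele frequencies
Z012 = rng.binomial(2, freqs, size=(n, p)).astype(float)   # 0/1/2 coding
Zc = Z012 - Z012.mean(axis=0)                               # centered coding
y = Z012 @ rng.normal(0.0, 0.1, p) + rng.normal(0.0, 1.0, n)
lam = 5.0  # hypothetical variance ratio

mu1, b1 = solve_mme(y, Z012, lam)
mu2, b2 = solve_mme(y, Zc, lam)
gebv1 = mu1 + Z012 @ b1
gebv2 = mu2 + Zc @ b2
print(np.allclose(b1, b2), np.allclose(gebv1, gebv2))  # marker effects and GEBVs agree
print(mu1, mu2)                                          # the general mean absorbs the shift
```

The centered coding also makes the marker columns orthogonal to the mean, which is one reason it tends to help MCMC mixing.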

19.
On the holographic relationship between ecosystems and organisms   Total citations: 4; self-citations: 0; citations by others: 4
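This paper proposes that a holographic relationship exists between ecosystems and organisms, holding that both ecosystems and organisms have functions of protection, support, movement, assimilation, respiration, circulation, excretion, reproduction, and regulation, and that the secondary succession of ecosystems shares common features with the regenerative repair processes of organisms. Based on the arrested development (diapause) and simplifiability observed in holoembryo recapitulation, a new explanation can be offered for the multi-directional, multi-pathway nature of ecological succession.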

20.
iTRAQ (isobaric Tags for Relative and Absolute Quantitation) is a technique that allows simultaneous quantitation of proteins in multiple samples. In this paper, we describe a Bayesian hierarchical model-based method to infer the relative protein expression levels and hence to identify differentially expressed proteins from iTRAQ data. Our model assumes that the measured peptide intensities are affected by both protein expression levels and peptide specific effects. The values of these two effects across experiments are modeled as random effects. The nonrandom missingness of peptide data is modeled with a logistic regression which relates the missingness probability for a peptide with the expression level of the protein that produces this peptide. We propose a Markov chain Monte Carlo method for the inference of model parameters, including the relative expression levels across samples. Our simulation results suggest that the estimates of relative protein expression levels based on the MCMC samples have smaller bias than those estimated from ANOVA models or fold changes. We apply our method to an iTRAQ dataset studying the roles of Caveolae for postnatal cardiovascular function.
