Similar Documents
1.
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorized into marginal and conditional methods. The former analytically integrate out the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper, we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications.
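The retrospective idea can be illustrated on the stick-breaking representation of a Dirichlet process: stick weights are generated only when a uniform draw falls beyond the weights instantiated so far, so no truncation level has to be fixed in advance. The sketch below assumes the standard stick-breaking construction with Beta(1, α) breaks and covers only the prior allocation of observations to components; it is not the authors' MCMC algorithm, which works with the posterior and involves further steps not reproduced here. The function name `retrospective_allocate` is hypothetical.

```python
import random


def retrospective_allocate(alpha, n, seed=0):
    """Allocate n observations to components of a stick-breaking prior with
    Beta(1, alpha) breaks, simulating stick weights only when needed."""
    rng = random.Random(seed)
    weights = []       # w_1, w_2, ... instantiated so far
    remaining = 1.0    # length of the stick not yet broken off
    labels = []
    for _ in range(n):
        u = rng.random()
        cumulative = 0.0
        k = 0
        while True:
            if k == len(weights):
                # Retrospective step: break a new piece only because u needs it.
                v = rng.betavariate(1.0, alpha)
                weights.append(remaining * v)
                remaining *= 1.0 - v
            cumulative += weights[k]
            if u < cumulative:
                labels.append(k)
                break
            k += 1
    return labels, weights


labels, weights = retrospective_allocate(alpha=2.0, n=100, seed=42)
print(len(weights), "components instantiated,", len(set(labels)), "occupied")
```

Because weights are only created on demand, the number of instantiated components always equals the largest label used plus one, yet the allocations have exactly the distribution they would have under the infinite construction.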

2.
Single copies of four different phenolate ion mutants of the green fluorescent protein (GFP) exhibit a complex blinking and fluctuating behavior, a phenomenon that is hidden in measurements on large ensembles. Both total internal reflection microscopy and scanning confocal microscopy can be used to study the blinking dynamics, and autocorrelation analysis yields histograms of the correlation times for many individual molecules. While the total internal reflection method can follow several single molecules simultaneously, the confocal method offers higher time resolution at the expense of parallelism. We compare and contrast the two methods in terms of the ability to follow the complex dynamics of this system.
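Autocorrelation analysis of a single-molecule intensity trace can be sketched as follows. This assumes the common normalized form G(τ) = ⟨δI(t)δI(t+τ)⟩/⟨I⟩², with δI = I − ⟨I⟩, computed on a binned photon-count trace; the exact estimator used in the study is not stated in the abstract, and the function name and the simulated two-state trace are illustrative only.

```python
import random


def normalized_autocorrelation(trace, max_lag):
    """G(tau) = <dI(t) dI(t + tau)> / <I>^2 for tau = 0 .. max_lag - 1,
    where dI = I - <I> and <.> is a time average over the trace."""
    n = len(trace)
    mean = sum(trace) / n
    d = [value - mean for value in trace]
    g = []
    for tau in range(max_lag):
        pairs = n - tau
        g.append(sum(d[i] * d[i + tau] for i in range(pairs)) / pairs / mean ** 2)
    return g


# Toy blinking trace: a molecule switching between "on" and "off" states.
rng = random.Random(0)
state, trace = 1, []
for _ in range(20000):
    if rng.random() < 0.05:          # hypothetical switching probability per bin
        state = 1 - state
    trace.append(100.0 * state + rng.gauss(0.0, 5.0))   # counts per bin plus noise
g = normalized_autocorrelation(trace, 60)
```

For this symmetric toy process G(τ) should decay roughly like 0.9^τ, so fitting an exponential to G gives a correlation time of about ten bins; repeating such fits over many molecules is what produces a histogram of correlation times.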

3.
Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may indicate functional or structural core regions. However, existing methods often have difficulty aligning multiple proteins when sequence residue identities are low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed that involves transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study and then tested on two real datasets: a group of helix-turn-helix proteins and a set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach worked better than a typical Gibbs sampling approach based only on an amino acid model.

4.

Background  

The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method that predicts the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy.

5.
6.
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be computed through several approaches. Gaussian approximations derived from the central limit theorem (CLT) are among the most popular. Unfortunately, to find a pattern of interest, these methods have to deal with events in the tail of the distribution, where the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results for empirical mean (level 1) as well as empirical distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results, focusing on numerical issues. LD-SPatt is the name of the GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that large deviations are more reliable than Gaussian approximations, both in absolute values and in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss some further possible improvements and applications of this new method.
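The level-1 (empirical mean) large deviation result for Markov chains can be made concrete on a two-state chain. The sketch assumes the textbook Gärtner-Ellis construction: Λ(θ) is the log of the Perron root of the tilted matrix P(i,j)·e^{θ f(j)}, and the rate function is its Legendre transform I(x) = sup_θ [θx − Λ(θ)], here approximated by a grid search. It is not the LD-SPatt implementation; the function names, the toy matrix, and the choice of f (an indicator of state 1, standing in for a pattern count) are assumptions.

```python
import math


def log_perron_root(P, theta):
    """Lambda(theta) = log of the largest eigenvalue of the tilted matrix
    P_theta[i][j] = P[i][j] * exp(theta * f(j)), where f(j) = 1 if j == 1 else 0,
    i.e. f counts visits to state 1 of a two-state chain."""
    m = [[P[i][j] * math.exp(theta * (1.0 if j == 1 else 0.0)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    # Discriminant written as a sum of non-negative terms for a positive matrix.
    disc = (m[0][0] - m[1][1]) ** 2 + 4.0 * m[0][1] * m[1][0]
    return math.log((trace + math.sqrt(disc)) / 2.0)


def rate_function(P, x, theta_max=20.0, steps=4000):
    """Level-1 rate function I(x) = sup_theta [theta * x - Lambda(theta)],
    approximated by a grid search over theta in [-theta_max, theta_max]."""
    best = -math.inf
    for k in range(-steps, steps + 1):
        theta = theta_max * k / steps
        best = max(best, theta * x - log_perron_root(P, theta))
    return best


P = [[0.8, 0.2], [0.3, 0.7]]    # toy chain; stationary frequency of state 1 is 0.4
n = 100                          # hypothetical sequence length
for x in (0.4, 0.6, 0.9):
    # On the log scale, P(mean_n >= x) is approximately exp(-n * I(x)) for x > 0.4.
    print(x, rate_function(P, x), math.exp(-n * rate_function(P, x)))
```

I(x) is zero at the stationary frequency and grows convexly away from it; the tail estimate exp(−nI(x)) is exactly the regime where a Gaussian approximation centred on 0.4 is least trustworthy.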

7.
HCPM is a tool for clustering protein structures from comparative modeling, ab initio structure prediction, and similar sources. A hierarchical clustering algorithm is designed and tested, and a heuristic is provided for optimal cluster selection. The method was successfully tested during the CASP6 experiment.
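A generic hierarchical clustering of predicted models can be sketched as below. The abstract does not specify HCPM's linkage criterion, distance measure, or selection heuristic, so this assumes average linkage on a precomputed pairwise distance matrix (for example RMSD) with a distance cutoff, and a hypothetical selection rule that returns the most central member of the largest cluster.

```python
def average_linkage(dist, cutoff):
    """Agglomerative (average-linkage) clustering of models from a symmetric
    distance matrix, e.g. pairwise RMSD; merging stops once the closest pair
    of clusters is farther apart than `cutoff`."""
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > cutoff:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]          # q > p, so index p is unaffected
    return clusters


def representative(dist, cluster):
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def select_model(dist, cutoff):
    """Hypothetical selection heuristic: representative of the largest cluster."""
    clusters = average_linkage(dist, cutoff)
    return representative(dist, max(clusters, key=len))


# Toy RMSD-like matrix: models 0/1 and 2/3 form two tight groups.
dist = [[0.0, 1.0, 8.0, 8.0],
        [1.0, 0.0, 8.0, 8.0],
        [8.0, 8.0, 0.0, 1.5],
        [8.0, 8.0, 1.5, 0.0]]
print(average_linkage(dist, 3.0))  # [[0, 1], [2, 3]]
```

The cutoff controls where the hierarchy is cut; choosing the largest, most densely populated cluster reflects the common assumption that near-native conformations are sampled most often.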

8.
An age-structured population is considered in which the birth and death rates of an individual of age a are functions of the density of individuals older and/or younger than a. An existence/uniqueness theorem is proved for the McKendrick equation that governs the dynamics of the age distribution function. The proof shows how a decoupled ordinary differential equation for the total population size can be derived. This result makes a study of the population's asymptotic dynamics (indeed, often its global asymptotic dynamics) mathematically tractable. Several applications to models of intra-specific competition and predation are given.
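For concreteness, one standard way to write such a model is sketched below; the symbols (ρ, β, μ, P and the cumulative densities y₋, y₊) are assumed notation rather than the paper's own. Integrating the McKendrick equation over all ages, and assuming ρ(t,a) → 0 as a → ∞, yields a balance law for the total population. Turning this into a closed ODE for P alone requires further conditions on β and μ that the paper establishes and that are not reproduced here.

```latex
% rho(t,a): density of individuals of age a at time t
% y_-(t,a) = \int_0^a \rho(t,s)\,ds  (younger than a),  y_+(t,a) = \int_a^\infty \rho(t,s)\,ds  (older than a)
% P(t) = \int_0^\infty \rho(t,a)\,da  (total population size)
\frac{\partial \rho}{\partial t} + \frac{\partial \rho}{\partial a}
  = -\mu\big(a, y_-, y_+\big)\,\rho(t,a),
\qquad
\rho(t,0) = \int_0^\infty \beta\big(a, y_-, y_+\big)\,\rho(t,a)\,da .
% Integrate the PDE over a in [0, \infty):
\frac{dP}{dt}
  = \rho(t,0) - \int_0^\infty \mu\,\rho(t,a)\,da
  = \int_0^\infty \big(\beta - \mu\big)\,\rho(t,a)\,da .
```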

9.
10.
11.
We propose a model that explains the hierarchical organization of proteins in fold families. The model, which is based on the evolutionary selection of proteins by their native state stability, reproduces patterns of amino acids conserved across protein families. Due to its dynamic nature, the model sheds light on the evolutionary time-scales. By studying the relaxation of the correlation function between consecutive mutations at a given position in proteins, we observe separation of the evolutionary time-scales: at short time intervals families of proteins with similar sequences and structures are formed, while at long time intervals the families of structurally similar proteins that have low sequence similarity are formed. We discuss the evolutionary implications of our model. We provide a "profile" solution to our model and find agreement between predicted patterns of conserved amino acids and those actually observed in nature.

12.
We report simplified methods for large-scale enzymatic synthesis of oligoribonucleotides using polynucleotide phosphorylase. The main features of the method are use of RPC-5 chromatography, including chromatography at two pH values to deal with the problem of primer phosphorolysis, rapid dialysis for large-scale desalting, simplified methods for enzyme removal, and high-resolution ¹H and ³¹P NMR for product identification and demonstration of purity. The capacity of the method is adequate to allow beginning with grams of material in the first polymerization step, so that product yields of several milligrams, sufficient for many physical studies, are possible after as many as three separate polymerization reactions.

13.
MOTIVATION: Short sequence patterns frequently define regions of biological interest (binding sites, immune epitopes, primers, etc.), yet a large fraction of this information exists only within the scientific literature and is thus difficult to locate via conventional means (e.g. keyword queries or manual searches). We describe herein a system to accurately identify and classify sequence patterns from within large corpora using an n-gram Markov model (MM). RESULTS: As expected, on test sets we found that identification of sequences with limited alphabets and/or regular structures such as nucleic acids (non-ambiguous) and peptide abbreviations (3-letter) was highly accurate, whereas classification of symbolic (1-letter) peptide strings with more complex alphabets was more problematic. The MM was used to analyze two very large, sequence-containing corpora: over 7.75 million Medline abstracts and 9000 full-text articles from Journal of Virology. Performance was benchmarked by comparing the results with Journal of Virology entries in two existing manually curated databases: VirOligo and the HLA Ligand Database. Performance estimates were 98 ± 2% precision / 84% recall for primer identification and classification and 67 ± 6% precision / 85% recall for peptide epitopes. We also find a dramatic difference between the amounts of sequence-related data reported in abstracts versus full text. Our results suggest that automated extraction and classification of sequence elements is a promising, low-cost means of sequence database curation and annotation. AVAILABILITY: MM routine and datasets are available upon request.
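The n-gram Markov model idea can be sketched with the simplest case, a first-order (bigram) character model per token class with add-one smoothing, classifying a token by whichever class model gives it the highest log-likelihood. The order of the model, the smoothing scheme, the shared alphabet, and the toy training strings are all assumptions for illustration, not details from the paper.

```python
import math
from collections import defaultdict

# Shared symbol set for all models (an assumption for this sketch).
ALPHABET = "ACGT" + "abcdefghijklmnopqrstuvwxyz"


class BigramMarkovModel:
    """First-order (bigram) character Markov model with add-one smoothing."""

    def __init__(self, alphabet):
        self.size = len(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[prev][cur]
        self.totals = defaultdict(int)                       # transitions out of prev

    def train(self, strings):
        for s in strings:
            for prev, cur in zip(s, s[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def log_likelihood(self, token):
        # Sum of log P(cur | prev) with Laplace smoothing over the alphabet.
        ll = 0.0
        for prev, cur in zip(token, token[1:]):
            num = self.counts[prev][cur] + 1
            den = self.totals[prev] + self.size
            ll += math.log(num / den)
        return ll


def classify(token, models):
    """Return the label of the model assigning the token the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(token))


# Usage with toy training data (made up for illustration, not the paper's corpora).
dna = BigramMarkovModel(ALPHABET)
dna.train(["ACGTACGTTAGC", "GGCATTACGA"])
text = BigramMarkovModel(ALPHABET)
text.train(["protein", "binding", "sequence", "epitope"])
models = {"dna": dna, "text": text}
print(classify("ACGTTGCA", models))  # dna
print(classify("protein", models))   # text
```

This also hints at why restricted alphabets are easy: a four-letter nucleotide model concentrates its probability mass on few transitions, whereas one-letter peptide codes overlap heavily with ordinary English letters.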

14.
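Statistical dynamics analysis of the fractal hierarchical structure of complex ecosystems   Total citations: 2, self-citations: 0, citations by others: 2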
WANG Hui, CHAI Li-he. Chinese Journal of Applied Ecology, 2007, 18(7): 1560-1567
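Ecosystems exhibit complex characteristics such as heterogeneity, nonlinearity, and multiple hierarchical levels. To address this complexity, we take a statistical dynamics perspective, grounded in the microscopic dynamics of interactions among ecological components, to make a preliminary exploration of the origin of the fractal hierarchical structure of ecosystems and of the dynamical mechanisms by which it forms. We also analyze the factors and mechanisms that influence the fractal dimension of ecosystems. The theoretical results are compared with real examples, and possible practical applications are discussed.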

15.
16.
Recent advances in ab initio direct methods have enabled the solution of crystal structures of small proteins from native X-ray data alone, that is, without the use of fragments of known structure or the need to prepare heavy-atom or selenomethionine derivatives, provided that the data are available to atomic resolution. These methods are also proving to be useful for locating the selenium atoms or other anomalous scatterers in the multiple wavelength anomalous diffraction phasing of larger proteins at lower resolution.

17.
This review outlines recent advances in the application of molecular biological techniques to the study of protein structure and function. The chapter is divided into four main sections: methods for oligonucleotide-directed mutagenesis; mutational strategies for identifying functional residues and domains; systems for expression; and future developments. Few new methods were reported in 1990; however, a number of the papers that appeared represent refinements of previously reported strategies. This review is also published in Current Opinion in Structural Biology 1991, 1:605-610.

18.
A new method, weighted-ensemble Brownian dynamics, is proposed for the simulation of protein-association reactions and other events whose frequencies of outcomes are constricted by free energy barriers. The method features a weighted ensemble of trajectories in configuration space with energy levels dictating the proper correspondence between "particles" and probability. Instead of waiting a very long time for an unlikely event to occur, the probability packets are split, and small packets of probability are allowed to diffuse almost immediately into regions of configuration space that are less likely to be sampled. The method has been applied to the Northrup and Erickson (1992) model of docking-type diffusion-limited reactions and yields reaction rate constants in agreement with those obtained by direct Brownian simulation, but at a fraction of the CPU time (10⁻⁴ to 10⁻³, depending on the model). Because the method is essentially a variant of standard Brownian dynamics algorithms, it is anticipated that weighted-ensemble Brownian dynamics, in conjunction with biophysical force models, can be applied to a large class of association reactions of interest to the biophysics community.
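The splitting of probability packets can be sketched for free one-dimensional Brownian motion. Each walker carries a position and a probability weight; after a Brownian move, walkers are grouped into spatial bins, and each occupied bin is split or merged to a fixed number of walkers while conserving its total weight. The bin edges, target count, the split-heaviest/merge-lightest rule, and the omission of forces and reaction criteria are all assumptions of this sketch, not the paper's exact scheme.

```python
import bisect
import math
import random
from collections import defaultdict


def resample_bin(walkers, target, rng):
    """Split/merge the (position, weight) walkers of one bin until exactly
    `target` remain, conserving the bin's total probability weight."""
    walkers = list(walkers)
    # Split: replace the heaviest walker by two copies carrying half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda pw: pw[1])
        x, w = walkers.pop()
        walkers.extend([(x, w / 2.0), (x, w / 2.0)])
    # Merge: combine the two lightest walkers; the survivor's position is
    # chosen with probability proportional to each walker's weight.
    while len(walkers) > target:
        walkers.sort(key=lambda pw: pw[1])
        (x1, w1), (x2, w2) = walkers[0], walkers[1]
        keep = x1 if rng.random() < w1 / (w1 + w2) else x2
        walkers = walkers[2:] + [(keep, w1 + w2)]
    return walkers


def we_step(walkers, edges, target, dt, diffusion, rng):
    """One weighted-ensemble step for free 1-D Brownian motion:
    move every walker, bin by position, then resample each occupied bin."""
    groups = defaultdict(list)
    sigma = math.sqrt(2.0 * diffusion * dt)
    for x, w in walkers:
        x_new = x + sigma * rng.gauss(0.0, 1.0)
        groups[bisect.bisect(edges, x_new)].append((x_new, w))
    new_walkers = []
    for members in groups.values():
        new_walkers.extend(resample_bin(members, target, rng))
    return new_walkers


rng = random.Random(1)
walkers = [(0.0, 1.0)]                  # all probability starts at x = 0
edges = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical bin boundaries
for _ in range(50):
    walkers = we_step(walkers, edges, 4, 0.01, 1.0, rng)
print(round(sum(w for _, w in walkers), 12))  # total weight stays 1.0
```

Because every bin keeps the same number of walkers regardless of how little probability it holds, rarely visited regions are sampled as densely as common ones; the weights record how much each walker actually counts when estimating rates.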

19.
Markov chain Monte Carlo methods for switching diffusion models   Total citations: 1, self-citations: 0, citations by others: 1

20.
Comparison of methods for searching protein sequence databases.   Total citations: 10, self-citations: 2, citations by others: 10
We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performs better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference with FASTA was not significant. ln()-scaling performed better than normalization based on other simple functions of library sequence length. ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.
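ln()-scaling corrects raw similarity scores for the tendency of longer library sequences to score higher by chance. A minimal sketch, assuming the correction is an ordinary least-squares regression of raw score on the natural log of library sequence length, with the residual used as the scaled score; FASTA's actual implementation may differ in detail (for example by also standardizing the residuals), and `ln_scaled_scores` is a hypothetical name.

```python
import math


def ln_scaled_scores(scores, lengths):
    """Fit score ~ a + b * ln(length) over all library hits by ordinary least
    squares and return the residuals as length-corrected scores."""
    xs = [math.log(length) for length in lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx            # requires at least two distinct lengths
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, scores)]


# Toy library: raw scores grow with ln(length); hit 2 is a genuine outlier.
lengths = [50, 100, 200, 400, 800]
scores = [10.0 + 5.0 * math.log(length) for length in lengths]
scores[2] += 20.0
print([round(r, 6) for r in ln_scaled_scores(scores, lengths)])
# [-4.0, -4.0, 16.0, -4.0, -4.0]
```

After scaling, the outlier stands out by a fixed margin regardless of how long the other library sequences are, which is the property that makes length-scaled scores comparable across a database.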
