Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
RNA structure formation is hierarchical and, therefore, secondary structure, the sum of canonical base-pairs, can generally be predicted without knowledge of the three-dimensional structure. Secondary structure prediction algorithms evolved from predicting a single, lowest free energy structure to their current state where statistics can be determined from the thermodynamic ensemble. This article reviews the free energy minimization technique and the salient revolutions in the dynamic programming algorithm methods for secondary structure prediction. Emphasis is placed on highlighting the recently developed method, which statistically samples structures from the complete Boltzmann ensemble.

2.
3.
H Resat, M Mezei. Biophysical Journal, 1996, 71(3): 1179-1190
The grand canonical ensemble Monte Carlo molecular simulation method is used to investigate hydration patterns in the crystal hydrate structure of the dCpG/proflavine intercalated complex. The objective of this study is to show by example that the recently advocated grand canonical ensemble simulation is a computationally efficient method for determining the positions of the hydrating water molecules in protein and nucleic acid structures. A detailed molecular simulation convergence analysis and an analogous comparison of the theoretical results with experiments clearly show that the grand ensemble simulations can be far more advantageous than the comparable canonical ensemble simulations.

4.
Inferential structure determination uses Bayesian theory to combine experimental data with prior structural knowledge into a posterior probability distribution over protein conformational space. The posterior distribution encodes everything one can say objectively about the native structure in the light of the available data and additional prior assumptions and can be searched for structural representatives. Here an analogy is drawn between the posterior distribution and the canonical ensemble of statistical physics. A statistical mechanics analysis assesses the complexity of a structure calculation globally in terms of ensemble properties. Analogs of the free energy and density of states are introduced; partition functions evaluate the consistency of prior assumptions with data. Critical behavior is observed with dwindling restraint density, which impairs structure determination with too sparse data. However, prior distributions with improved realism ameliorate the situation by lowering the critical number of observations. An in-depth analysis of various experimentally accessible structural parameters and force field terms will facilitate a statistical approach to protein structure determination with sparse data that avoids bias as much as possible.

5.
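New statistical methods and the availability of multi-source, multi-scale spatial data have driven the rapid development of species distribution models. These models differ in how they apply ecological theory and in their underlying assumptions, and the choice of modeling method and input data introduces uncertainty into the predictions. Comparing and combining multiple species distribution models, while using several sets of input data, can reduce this uncertainty and improve the accuracy of simulated distributions. Taking Tsuga chinensis, a species endemic to China, as an example, this study used the R-based BioMod package to compare how well nine species distribution models simulate its distribution. The results of the nine models were then combined, weighted by the area under the ROC curve (AUC), to produce and select the best map of the potential distribution of Tsuga chinensis. The random forest model (RF) performed best, followed by multivariate adaptive regression splines (MARS) and the generalized additive model (GAM); the surface range envelope model (SRE) performed worst. The ensemble result shows that the areas most suitable for Tsuga chinensis are concentrated in southwestern China and around the Sichuan Basin, with scattered patches in parts of southern China and Taiwan. This agrees well with earlier descriptions and studies of the natural distribution of the species. The study further shows that combining models effectively reduces the uncertainty attributable to any single model and thereby improves the accuracy and quality of the simulation.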
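
The AUC-weighted combination described in this abstract can be illustrated with a minimal Python sketch. It is not the BioMod implementation: the function name `weighted_ensemble`, the model subset, and all numbers are hypothetical, and the weights are simply each model's AUC divided by the sum of AUCs.

```python
def weighted_ensemble(predictions, auc_scores):
    """Combine per-model suitability scores, using each model's AUC as its weight.

    predictions: dict mapping model name -> list of suitability scores (one per site)
    auc_scores:  dict mapping model name -> AUC of that model
    """
    total_weight = sum(auc_scores[name] for name in predictions)
    n_sites = len(next(iter(predictions.values())))
    combined = [0.0] * n_sites
    for name, scores in predictions.items():
        weight = auc_scores[name] / total_weight  # normalized weight of this model
        for i, score in enumerate(scores):
            combined[i] += weight * score
    return combined


# Hypothetical scores for three models at two grid cells (illustration only).
predictions = {"RF": [0.9, 0.2], "MARS": [0.8, 0.3], "SRE": [0.4, 0.6]}
auc_scores = {"RF": 0.95, "MARS": 0.90, "SRE": 0.65}
ensemble = weighted_ensemble(predictions, auc_scores)
```

Because the weights are normalized, each combined score stays within the range spanned by the individual models, and a poorly performing model such as SRE in this toy input contributes less than the others.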

6.
Bio3D is a family of R packages for the analysis of biomolecular sequence, structure, and dynamics. Major functionality includes biomolecular database searching and retrieval, sequence and structure conservation analysis, ensemble normal mode analysis, protein structure and correlation network analysis, principal component analysis, and related multivariate analysis methods. Here, we review recent package developments, including a new underlying segregation into separate packages for distinct analyses, and introduce a new method for structure analysis named ensemble difference distance matrix analysis (eDDM). The eDDM approach calculates and compares atomic distance matrices across large sets of homologous atomic structures to help identify the residue-wise determinants underlying specific functional processes. An eDDM workflow is detailed along with an example application to a large protein family. As a new member of the Bio3D family, the Bio3D-eddm package supports both experimental and theoretical simulation-generated structures, is integrated with other methods for dissecting sequence-structure-function relationships, and can be used in a highly automated and reproducible manner. Bio3D is distributed as an integrated set of platform-independent, open-source R packages available from: http://thegrantlab.org/bio3d/
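
The idea behind a difference distance matrix can be shown with a small Python sketch for just two structures. This is an assumption-laden illustration, not the Bio3D-eddm code (which is an R package and works across large ensembles): the function names and the three-atom coordinates below are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference of two distance matrices (structure A minus B).

    Both structures must list the same atoms in the same order.
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(da)
    return [[da[i][j] - db[i][j] for j in range(n)] for i in range(n)]


# Hypothetical three-atom "structures"; atom 2 moves away from atom 1 in B.
structure_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 8.0, 0.0)]
ddm = difference_distance_matrix(structure_a, structure_b)
```

Distance matrices do not depend on how each structure is oriented in space, so no superposition step is needed before comparing them; nonzero entries point directly at atom pairs whose separation differs between the two structures.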

7.
Linking experiments with the atomistic resolution provided by molecular dynamics simulations can shed light on the structure and dynamics of disordered states of proteins. The sampling limitations of classical molecular dynamics can be overcome using metadynamics, which is based on the introduction of a history-dependent bias on a small number of suitably chosen collective variables. Even if such bias distorts the probability distribution of the other degrees of freedom, the equilibrium Boltzmann distribution can be reconstructed using a recently developed reweighting algorithm. Quantitative comparison with experimental data is thus possible. Here we show the potential of this combined approach by characterizing the conformational ensemble explored by a 13-residue helix-forming peptide by means of a well-tempered metadynamics/parallel tempering approach and comparing the reconstructed nuclear magnetic resonance scalar couplings with experimental data.

8.
An RNA molecule, particularly a long-chain mRNA, may exist as a population of structures. Furthermore, multiple structures have been demonstrated to play important functional roles. Thus, a representation of the ensemble of probable structures is of interest. We present a statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures. The forward step of the algorithm computes the equilibrium partition functions of RNA secondary structures with recent thermodynamic parameters. Using conditional probabilities computed with the partition functions in a recursive sampling process, the backward step of the algorithm quickly generates a statistically representative sample of structures. With cubic run time for the forward step, quadratic run time in the worst case for the sampling step, and quadratic storage, the algorithm is efficient for broad applicability. We demonstrate that, by classifying sampled structures, the algorithm enables a statistical delineation and representation of the Boltzmann ensemble. Applications of the algorithm show that alternative biological structures are revealed through sampling. Statistical sampling provides a means to estimate the probability of any structural motif, with or without constraints. For example, the algorithm enables probability profiling of single-stranded regions in RNA secondary structure. Probability profiling for specific loop types is also illustrated. By overlaying probability profiles, a mutual accessibility plot can be displayed for predicting RNA:RNA interactions. Boltzmann probability-weighted density of states and free energy distributions of sampled structures can be readily computed. We show that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics.
Our applications suggest that the sampling algorithm may be well suited to prediction of mRNA structure and target accessibility. The algorithm is applicable to the rational design of small interfering RNAs (siRNAs), antisense oligonucleotides, and trans-cleaving ribozymes in gene knock-down studies.

9.
MOTIVATION: Microarray experiments are expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools, which can deal with a large number of highly correlated input variables, perform feature selection and provide class probability estimates that serve as a quantification of the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting into a novel algorithm called BagBoosting. RESULTS: When bagging is used as a module in boosting, the resulting classifier consistently improves the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained by simply making a bigger computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data. AVAILABILITY: Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under GNU public license at http://stat.ethz.ch/~dettling/bagboost.html.

10.
Mfuzz: a software package for soft clustering of microarray data   [Total citations: 1; self-citations: 0; citations by others: 1]
For the analysis of microarray data, clustering techniques are frequently used. Most of such methods are based on hard clustering of data wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient Tcl/Tk-based graphical user interface. AVAILABILITY: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to GPL version 2 license.
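
To make the contrast between hard and soft assignment concrete, the following Python sketch computes membership values for one expression profile using the fuzzy c-means membership formula, one common soft clustering scheme. It is only an illustration, not code from the Mfuzz package; the function name, the fuzzifier default `m=2.0`, and the toy centers are assumptions.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    m > 1 is the fuzzifier: values close to 1 approach hard clustering,
    larger values spread membership more evenly across clusters.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Hypothetical two-dimensional profile and two cluster centers.
centers = [(0.0, 0.0), (2.0, 0.0)]
memberships = fuzzy_memberships((0.5, 0.0), centers)
```

The memberships always sum to 1, so a gene lying between two clusters is shared between them rather than forced into one, which is the behavior the abstract contrasts with hard clustering.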

11.
Laughton CA, Orozco M, Vranken W. Proteins, 2009, 75(1): 206-216
NMR structures are typically deposited in databases such as the PDB in the form of an ensemble of structures. Generally, each of the models in such an ensemble satisfies the experimental data and is equally valid. No unique solution can be calculated because the experimental NMR data is insufficient, in part because it reflects the conformational variability and dynamical behavior of the molecule in solution. Even for relatively rigid molecules, the limited number of structures that are typically deposited cannot completely encompass the structural diversity allowed by the observed NMR data, but they can be chosen to try to maximize its representation. We describe here the adaptation and application of techniques more commonly used to examine large ensembles from molecular dynamics simulations, to the analysis of NMR ensembles. The approach, which is based on principal component analysis, we call COCO ("Complementary Coordinates"). The COCO approach analyses the distribution of an NMR ensemble in conformational space, and generates a new ensemble that fills "gaps" in the distribution. The method is very rapid, and analysis of a 25-member ensemble and generation of a new 25-member ensemble typically takes 1-2 min on a conventional workstation. Applied to the 545 structures in the RECOORD database, we find that COCO generates new ensembles that are as structurally diverse, both from each other and from the original ensemble, as are the structures within the original ensemble. The COCO approach does not explicitly take into account the NMR restraint data, yet in tests on selected structures from the RECOORD database, the COCO ensembles are frequently good matches to this data, and certainly are structures that can be rapidly refined against the restraints to yield high-quality, novel solutions.
COCO should therefore be a useful aid in NMR structure refinement and in other situations where a richer representation of conformational variability is desired, for example in docking studies. COCO is freely accessible via the website www.ccpb.ac.uk/COCO.

12.
In the integrative analyses of omics data, it is often of interest to extract a data representation from one data type that best reflects its relations with another data type. This task is traditionally fulfilled by linear methods such as canonical correlation analysis (CCA) and partial least squares (PLS). However, information contained in one data type pertaining to the other data type may be complex and in nonlinear form. Deep learning provides a convenient alternative to extract low-dimensional nonlinear data embedding. In addition, the deep learning setup can naturally incorporate the effects of clinical confounding factors into the integrative analysis. Here we report a deep learning setup, named Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract data representation for omics data integrative analysis. The method can adjust for confounder variables, achieve informative data embedding, rank features in terms of their contributions, and find pairs of features from the two data types that are related to each other through the data embedding. In simulation studies, the method was highly effective in the extraction of major contributing features between data types. Using two real microRNA-gene expression datasets, one with confounder variables and one without, we show that AIME excluded the influence of confounders, and extracted biologically plausible novel information. The R package based on Keras and the TensorFlow backend is available at https://github.com/tianwei-yu/AIME.

13.
An R package for analysis of whole-genome association studies   [Total citations: 3; self-citations: 0; citations by others: 3]
OBJECTIVE: To provide data classes and methods to facilitate the analysis of whole genome association studies in the R language for statistical computing. METHODS: We have implemented data classes in which each genotype call is stored as a single byte. At this density, data for single chromosomes derived from large studies and new high-throughput gene chip platforms can be handled in memory. We use the object-oriented programming model introduced with version 4 of the S-plus package, usually termed 'S4 methods'. RESULTS: At the current state of development the package only supports population-based studies, although we would hope to provide support for family-based studies soon. Both quantitative and qualitative phenotypes may be analysed. Flexible association testing functions are provided which can carry out single SNP tests which control for potential confounding by quantitative and qualitative covariates. Tests involving several SNPs taken together as 'tags' are also supported. Efficient calculation of pair-wise linkage disequilibrium measures is implemented and data input functions include a function which can download data directly from the international HapMap project website.

14.
Here we present a dynamic programming algorithm which is capable of calculating arbitrary moments of the Boltzmann distribution for RNA secondary structures. We have implemented the algorithm in a program called RNA-VARIANCE and investigate the difference between the Boltzmann distribution of biological and random RNA sequences. We find that the minimum free energy structure of biological sequences has a higher probability in the Boltzmann distribution than that of random sequences. Moreover, we show that the free energies of biological sequences have a smaller variance than those of random sequences and that the minimum free energy of biological sequences is closer to the expected free energy of the rest of the structures than that of random sequences. These results suggest that biologically functional RNA sequences not only require a thermodynamically stable minimum free energy structure, but also an ensemble of structures whose free energies are close to the minimum free energy.
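
The quantities discussed in this abstract (probability of the minimum free energy structure, mean and variance of the free energy) can be illustrated by brute force over an explicit list of structure energies. The Python sketch below is not the dynamic programming algorithm of RNA-VARIANCE, which avoids enumerating structures; the function names and the five energies are hypothetical.

```python
import math

# Thermal energy RT in kcal/mol at 37 degrees C (R = 1.9872e-3 kcal/(mol K), T = 310.15 K).
RT = 0.6163


def boltzmann_probabilities(energies, rt=RT):
    """Probability of each structure: exp(-E/RT) divided by the partition function."""
    e_min = min(energies)  # shift energies to avoid overflow in exp()
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def energy_moment(energies, k, rt=RT):
    """k-th raw moment of the free energy under the Boltzmann distribution."""
    probs = boltzmann_probabilities(energies, rt)
    return sum(p * e ** k for p, e in zip(probs, energies))


# Hypothetical free energies (kcal/mol) of five alternative structures.
energies = [-12.3, -11.8, -10.5, -9.9, -7.2]
mean = energy_moment(energies, 1)
variance = energy_moment(energies, 2) - mean ** 2
mfe_probability = boltzmann_probabilities(energies)[0]
```

Enumeration like this is only feasible for toy inputs, because the number of possible secondary structures grows exponentially with sequence length; that is the motivation for computing the moments by dynamic programming instead.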

15.
beadarray: R classes and methods for Illumina bead-based data   [Total citations: 2; self-citations: 0; citations by others: 2]
The R/Bioconductor package beadarray allows raw data from Illumina experiments to be read and stored in convenient R classes. Users are free to choose between various methods of image processing, background correction and normalization in their analysis rather than using the defaults in Illumina's proprietary software. The package also allows quality assessment to be carried out on the raw data. The data can then be summarized and stored in a format which can be used by other R/Bioconductor packages to perform downstream analyses. Summarized data processed by Illumina's BeadStudio software can also be read and analysed in the same manner. Availability: The beadarray package is available from the Bioconductor web page at www.bioconductor.org. A user's guide and example data sets are provided with the package.

16.
SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html.

17.
ModEco: an integrated software package for ecological niche modeling   [Total citations: 2; self-citations: 0; citations by others: 2]
Qinghua Guo, Yu Liu. Ecography, 2010, 33(4): 637-642
ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user-friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence-only data, and abundance data; 2) it provides a range of models when dealing with presence-only data, such as presence-only models, pseudo-absence models, background vs presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment.

18.
Virtual and solution conformations of oligosaccharides   [Total citations: 3; self-citations: 0; citations by others: 3]
D A Cumming, J P Carver. Biochemistry, 1987, 26(21): 6664-6676
The possibility that observed nuclear Overhauser enhancements and bulk longitudinal relaxation times, parameters measured by 1H NMR and often employed in determining the preferred solution conformation of biologically important molecules, are the result of averaging over many conformational states is quantitatively evaluated. Of particular interest was to ascertain whether certain 1H NMR determined conformations are "virtual" in nature; i.e., the fraction of the population of molecules actually found at any time within the subset of conformational space defined as the "solution conformation" is vanishingly small. A statistical mechanics approach was utilized to calculate an ensemble average relaxation matrix from which NOE and T1 values are calculated. Model glycosidic linkages in four oligosaccharides were studied. The solution conformation at any glycosidic linkage is properly represented by a normalized Boltzmann distribution of conformers generated from an appropriate potential energy surface. The nature of the resultant population distributions is such that 50% of the molecular population is found within 1% of available microstates, while 99% of the molecular population occupies about 10% of the ensemble microstates, a number roughly equal to that sterically allowed. From this analysis we conclude that in many cases quantitative interpretation of NMR relaxation data, which attempts to define a single set of allowable torsion angle values consistent with the observed data, will lead to solution conformations that are either virtual or reflect torsion angle values possessed by a minority of the molecular population. On the other hand, calculation of ensemble average NMR relaxation data yields values in agreement with experimental results. Observed values of NMR relaxation data are the result of the complex interdependence of the population distribution and NOE (or T1) surfaces in conformational space.
In conformational analyses, NMR data can therefore be used to test different population distributions calculated from empirical potential energy functions.
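
The statement that a given share of the population occupies a small share of microstates can be checked for any energy surface with a short calculation. The Python sketch below does this for a toy two-torsion surface; the cosine-shaped energy function, the 10-degree grid, and the function name are assumptions for illustration and do not reproduce the potential energy surfaces used in the paper.

```python
import math


def fraction_of_states_for_population(energies, target, rt):
    """Smallest fraction of microstates that together hold `target` of the population.

    States are filled in order of decreasing Boltzmann weight.
    """
    e_min = min(energies)
    weights = sorted((math.exp(-(e - e_min) / rt) for e in energies), reverse=True)
    z = sum(weights)
    cumulative = 0.0
    for count, weight in enumerate(weights, start=1):
        cumulative += weight / z
        if cumulative >= target:
            return count / len(weights)
    return 1.0  # reached only if rounding keeps the running sum just below target


# Toy surface: two torsion angles on a 10-degree grid, one minimum at (0, 0).
rt = 0.593  # kcal/mol, roughly RT at 298 K
angles = [math.radians(a) for a in range(0, 360, 10)]
energies = [
    3.0 * (1.0 - math.cos(phi)) + 3.0 * (1.0 - math.cos(psi))
    for phi in angles
    for psi in angles
]
half = fraction_of_states_for_population(energies, 0.50, rt)
most = fraction_of_states_for_population(energies, 0.99, rt)
```

Comparing `half` and `most` for a realistic surface is one way to judge how far a single "solution conformation" can stand in for the full distribution.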

19.
MOTIVATION: Conventional Monte Carlo and molecular dynamics simulations of proteins in the canonical ensemble are of little use, because they tend to get trapped in local energy minima at low temperatures. One way to surmount this difficulty is to use a non-Boltzmann sampling method in which conformations are sampled according to a general weighting function instead of the conventional Boltzmann weighting function. The multiensemble sampling (MES) method is a non-Boltzmann sampling method that was originally developed to estimate free energy differences between systems with different potential energies and/or at different thermodynamic states. The method has not yet been applied to studies of complex molecular systems such as proteins. RESULTS: MES Monte Carlo simulations of small proteins have been carried out using a united-residue force field. The proteins at several temperatures from the unfolded to the folded states were simulated in a single MC run at a time and their equilibrium thermodynamic properties were calculated correctly. The distributions of sampled conformations clearly indicate that, when going through local energy minima, the MES simulation did not get trapped in them but escaped from them so quickly that all the relevant parts of conformation space could be sampled properly. A two-step folding process consisting of a collapse transition followed by a folding transition is observed. This study demonstrates that the use of MES alleviates the multiple-minima problem greatly. AVAILABILITY: Available on request from the authors.

20.