Similar Articles
20 similar articles found.
1.
H L Gordon  R L Somorjai 《Proteins》1992,14(2):249-264
We propose fuzzy clustering as a method to analyze molecular dynamics (MD) trajectories, especially of proteins and polypeptides. A fuzzy cluster analysis locates classes of similar three-dimensional conformations explored during a molecular dynamics simulation. The method can be readily applied to results from both equilibrium and nonequilibrium simulations, with clustering on either global or local structural parameters. The potential of this technique is illustrated by results from fuzzy cluster analyses of trajectories from MD simulations of various fragments of human parathyroid hormone (PTH). For large molecules, it is more efficient to analyze the clustering of root-mean-square distances between conformations comprising the trajectory. We found that the results of the clustering analysis were unambiguous, in terms of the optimal number of clusters of conformations, for the majority of the trajectories examined. The conformation closest to the cluster center can be chosen as being representative of the class of structures making up the cluster, and can be further analyzed, for example, in terms of its secondary structure. The CPU time used by the cluster analysis was negligible compared to the MD simulation time.
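The abstract does not give the clustering equations. As a minimal sketch, assuming the standard fuzzy c-means algorithm (Bezdek) with fuzzifier m = 2 applied to per-frame feature vectors (for example, a few structural parameters per MD snapshot), the code below computes fuzzy memberships and picks the conformation closest to each cluster center as its representative. The function names `fuzzy_c_means` and `representatives` are hypothetical, not taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. points: list of equal-length numeric tuples.
    Returns (centers, u) where u[k][i] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(max_iter):
        # Centers are weighted means, with weights u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            wsum = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / wsum
                                 for d in range(dim)))
        # Memberships: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a center: give it crisp membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[i] / dists[j]) ** (2.0 / (m - 1.0))
                                        for j in range(c)) for i in range(c)])
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def representatives(points, centers):
    """Index of the point (conformation) closest to each cluster center."""
    return [min(range(len(points)), key=lambda k: math.dist(points[k], ctr))
            for ctr in centers]
```

Each membership row sums to 1, so a frame lying between two conformational classes is shared between them rather than forced into one; taking the largest membership per row recovers a crisp assignment.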

2.
The folding mechanism of the villin headpiece (HP36) is studied by means of a novel approach which entails an initial coarse-grained Monte Carlo (MC) scheme followed by all-atom molecular dynamics (MD) simulations in explicit solvent. The MC evolution occurs in a simplified free-energy landscape and allows an efficient selection of marginally compact structures, which are taken as viable initial conformations for the MD. The coarse-grained MC structural representation is connected to the one with atomic resolution through a "fine-graining" reconstruction algorithm. This two-stage strategy is used to select and follow the dynamics of seven different, unrelated conformations of HP36. In a notable case the MD trajectory rapidly evolves towards the folded state, yielding a typical root-mean-square deviation (RMSD) of the core region of only 2.4 Å from the closest NMR model (the typical RMSD over the whole structure being 4.0 Å). The analysis of the various MC-MD trajectories provides valuable insight into the details of the folding and misfolding mechanisms, and particularly into the delicate influence of local and nonlocal interactions in steering the folding process.
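RMSD values like those quoted above (a core-region value versus a whole-structure value) follow the usual definition RMSD = sqrt((1/N) Σ_i |a_i − b_i|²) over N matched atoms. A minimal sketch, assuming the two conformations have already been optimally superimposed (for example with the Kabsch algorithm, which is not implemented here) and that atom i in one list corresponds to atom i in the other; `rmsd` and `core_rmsd` are hypothetical names.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two conformations given as equal-length lists of (x, y, z)
    atom coordinates. Assumes the structures are already optimally
    superimposed; no alignment is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must have the same, non-zero number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def core_rmsd(coords_a, coords_b, core_indices):
    """RMSD restricted to a subset of atoms, e.g. a 'core region'."""
    return rmsd([coords_a[i] for i in core_indices],
                [coords_b[i] for i in core_indices])
```

The result is in the same units as the coordinates, so Å-valued coordinates give an RMSD in Å.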

3.
In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations sampled by molecular dynamics simulations performed on different ligand-bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand-induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand-bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment-induced redistribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic-level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods to simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at longer time scales, and to problems related to protein folding for any protein or protein-protein/RNA/DNA complex of interest with a known structure.

4.
Eric Johnson 《Proteins》2012,80(12):2645-2651
The separability between overall and internal motions is evaluated over multiple folding trajectories of the villin headpiece subdomain. The analysis, which relies on the Prompers‐Brüschweiler separability index, offers a potentially useful perspective on protein folding. The protein is considered folded in this study, not when it reaches some static target, but rather when it tumbles as a dynamically constrained object. The analysis also demonstrates how the separability index, when applied to protein folding simulations, can facilitate the analysis of NMR relaxation data.

5.
Clustering time-course gene expression data (gene trajectories) is an important step towards solving the complex problem of gene regulatory network modeling and discovery, as it significantly reduces the dimensionality of the gene space required for analysis. Traditional clustering methods that perform hill-climbing from randomly initialized cluster centers are prone to produce inconsistent and sub-optimal cluster solutions over different runs. This paper introduces a novel method that hybridizes a genetic algorithm (GA) and the expectation-maximization (EM) algorithm for clustering gene trajectories with mixtures of multiple linear regression models (MLRs), with the objective of improving the global optimality and consistency of the clustering performance. The proposed method is applied to cluster human fibroblast and yeast time-course gene expression data based on their trajectory similarities. It significantly outperforms the standard EM method in terms of both clustering accuracy and consistency. The biological implications of the improved clustering performance are demonstrated.
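As a hedged illustration of the EM half of this hybrid only, the sketch below fits a mixture of straight-line regressions (expression = a + b·t + Gaussian noise) to whole gene trajectories, so every time point of a gene shares that gene's cluster responsibility. It assumes a single predictor (time) per component, and it substitutes a handful of random restarts, keeping the run with the highest log-likelihood, for the paper's genetic algorithm. `em_mixture_of_lines` and its parameters are hypothetical.

```python
import math
import random


def _weighted_line_fit(ts, genes, weights):
    """Weighted least squares for y = a + b*t; every point of gene g gets weight weights[g]."""
    sw = swt = swtt = swy = swty = 0.0
    for w, ys in zip(weights, genes):
        for t, y in zip(ts, ys):
            sw += w
            swt += w * t
            swtt += w * t * t
            swy += w * y
            swty += w * t * y
    b = (sw * swty - swt * swy) / (sw * swtt - swt * swt)
    a = (swy - b * swt) / sw
    return a, b


def em_mixture_of_lines(ts, genes, k, n_iter=100, restarts=5, seed=0):
    """EM for a k-component mixture of linear regressions over whole trajectories.
    Returns (log_likelihood, params, resp); params[j] = (weight, a, b, variance)."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Random soft assignment of genes to components.
        resp = []
        for _ in genes:
            row = [rng.random() for _ in range(k)]
            total = sum(row)
            resp.append([x / total for x in row])
        for _ in range(n_iter):
            # M-step: weighted line fit, residual variance and mixing weight per component.
            params = []
            for j in range(k):
                w = [max(r[j], 1e-12) for r in resp]  # floor avoids empty components
                a, b = _weighted_line_fit(ts, genes, w)
                ssr = sum(wg * (y - a - b * t) ** 2
                          for wg, ys in zip(w, genes) for t, y in zip(ts, ys))
                var = max(ssr / (sum(w) * len(ts)), 1e-6)
                params.append((sum(w) / len(genes), a, b, var))
            # E-step: posterior responsibilities, computed in log space.
            loglik, resp = 0.0, []
            for ys in genes:
                logs = [math.log(weight)
                        + sum(-0.5 * math.log(2 * math.pi * var) - (y - a - b * t) ** 2 / (2 * var)
                              for t, y in zip(ts, ys))
                        for weight, a, b, var in params]
                top = max(logs)
                norm = top + math.log(sum(math.exp(v - top) for v in logs))
                loglik += norm
                resp.append([math.exp(v - norm) for v in logs])
        if best is None or loglik > best[0]:
            best = (loglik, params, resp)
    return best
```

Hard cluster labels follow from each gene's largest responsibility. Rerunning with different seeds and comparing the labels is a cheap way to observe the run-to-run inconsistency the abstract describes.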

6.
Molecular dynamics (MD) simulation is an important tool for understanding biomolecules at microscopic temporal and spatial scales. Besides the need to improve simulation techniques to approach experimental scales, it is becoming increasingly crucial to develop robust methodology for precisely and objectively interpreting massive MD simulation data. In our previous work [J Phys Chem B 114, 10266 (2010)], the trajectory mapping (TM) method was presented to analyze simulation trajectories and then construct a kinetic transition network of metastable states. In this work, we further present a top-down implementation of TM to systematically detect complicated features of conformational space. We first look at longer MD trajectory pieces to get a coarse picture of the transition network at larger time scales, and then gradually cut the trajectory into shorter pieces for more detail. A robust clustering algorithm is designed to identify the metastable states and transition events more effectively. We applied this TM method to detect the hierarchical structure in the conformational space of an alanine dodecapeptide from microsecond to nanosecond time scales. The results show a downhill folding process of the peptide through multiple pathways. Even in this simple system, we found that a single commonly used order parameter is not sufficient either to distinguish the metastable states or to predict the transition kinetics among these states.

7.
8.
9.
Two-stage folding of HP-35 from ab initio simulations   Times cited: 1 (self-citations: 0; citations by others: 1)

10.
MOTIVATION: Clustering techniques are used to find groups of genes that show similar expression patterns under multiple experimental conditions. Nonetheless, the results obtained by cluster analysis are influenced by missing values, which commonly arise in microarray experiments. Because a clustering method requires a complete data matrix as input, previous studies have estimated the missing values using an imputation method in the preprocessing step of clustering. However, a common limitation of these conventional approaches is that once the estimates of missing values are fixed in the preprocessing step, they are not changed during the subsequent clustering process; badly estimated missing values obtained in data preprocessing are likely to degrade the quality and reliability of clustering results. Thus, a new clustering method is needed that improves the estimates of missing values during the iterative clustering process. RESULTS: We present a method for Clustering Incomplete data using Alternating Optimization (CIAO) in which prior imputation is not required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach to find better estimates during the iterative clustering process. This method improves the estimates of missing values by exploiting cluster information, such as cluster centroids, and all available non-missing values in each iteration. To test the performance of CIAO, we applied CIAO and conventional imputation-based clustering methods, e.g. k-means based on KNNimpute, to two incomplete yeast data sets, and compared the clustering results of each method using the Saccharomyces Genome Database annotations. The clustering results of the CIAO method are more significantly relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
AVAILABILITY: The software was developed in Java and can be executed on any platform that runs a JVM (Java Virtual Machine). It is available from the authors upon request.
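A simplified sketch in the spirit of CIAO's alternating optimization, assuming plain k-means with missing entries marked as None: rows are assigned using observed values only, and every missing entry is re-estimated from its cluster centroid on each pass instead of being fixed once in preprocessing. This is not the authors' Java implementation; the function name, the column-mean initial fill, and the naive first-k-rows seeding are all assumptions.

```python
def _observed_distance(row, centroid):
    """Mean squared difference over the observed (non-None) coordinates only."""
    diffs = [(v - centroid[d]) ** 2 for d, v in enumerate(row) if v is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


def kmeans_with_missing(rows, k, max_iter=200, tol=1e-9):
    """k-means on rows that may contain None. Alternates assignment, centroid
    update, and re-imputation of missing entries from the assigned centroid."""
    dim = len(rows[0])
    # Initial fill: column means of the observed values.
    col_means = []
    for d in range(dim):
        observed = [r[d] for r in rows if r[d] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    filled = [[col_means[d] if v is None else v for d, v in enumerate(r)] for r in rows]
    centroids = [list(filled[i]) for i in range(k)]  # naive seeding: first k rows
    labels = None
    for _ in range(max_iter):
        # 1) Assignment using observed coordinates only.
        new_labels = [min(range(k), key=lambda j: _observed_distance(r, centroids[j]))
                      for r in rows]
        # 2) Centroid update from the currently filled matrix.
        for j in range(k):
            members = [filled[i] for i, lab in enumerate(new_labels) if lab == j]
            if members:
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # 3) Re-impute missing entries from the assigned centroid.
        change = 0.0
        for i, r in enumerate(rows):
            for d, v in enumerate(r):
                if v is None:
                    new_value = centroids[new_labels[i]][d]
                    change = max(change, abs(new_value - filled[i][d]))
                    filled[i][d] = new_value
        if new_labels == labels and change < tol:
            break
        labels = new_labels
    return new_labels, filled
```

The loop stops only when both the assignments and the imputed values have settled, which is the point of contrast with a one-shot imputation such as KNNimpute followed by ordinary k-means.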

11.
Principal component analysis for clustering gene expression data   Times cited: 15 (self-citations: 0; citations by others: 15)
MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has a different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.
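To make the comparison concrete, here is a minimal way to obtain PC scores that could be fed to a clustering algorithm in place of the original variables. It is a sketch, not the study's pipeline: it computes only the first component, by power iteration on the sample covariance matrix, and assumes the all-ones start vector is not orthogonal to the leading eigenvector and that the data are not constant (which would make the norm zero). Further components would need deflation. Function names are hypothetical.

```python
import math


def first_principal_component(data, n_iter=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration.
    data: list of equal-length rows (e.g. genes x conditions)."""
    n, dim = len(data), len(data[0])
    means = [sum(row[d] for row in data) / n for d in range(dim)]
    centered = [[row[d] - means[d] for d in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def pc_scores(data, means, v):
    """Project each centered row onto the component v (its PC1 score)."""
    return [sum((row[d] - means[d]) * v[d] for d in range(len(v))) for row in data]
```

Clustering on `pc_scores` keeps the direction of largest variance, which, as the abstract warns, need not be the direction that separates the clusters.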

12.
13.
Besides the problem of searching for effective methods for data analysis, there are additional problems with handling data of high uncertainty. Uncertainty problems often arise in the analysis of ecological data, e.g. in the cluster analysis of ecological data. Conventional clustering methods based on Boolean logic ignore the continuous nature of ecological variables and the uncertainty of ecological data. That can result in misclassification or misinterpretation of the data structure. Clusters with fuzzy boundaries better reflect the continuous character of ecological features. The problem, however, is that common clustering methods (like the fuzzy c-means method) are designed only for crisp data; that is, they provide a fuzzy partition only for crisp data (e.g. exact measurement data). This paper presents the extension and implementation of the method of fuzzy clustering of fuzzy data proposed by Yang and Liu [Yang, M.-S. and Liu, H.-H., 1999. Fuzzy clustering procedures for conical fuzzy vector data. Fuzzy Sets and Systems, 106, 189-200.]. The imprecise data can be defined as multidimensional fuzzy sets without sharply defined boundaries (in the form of so-called conical fuzzy vectors). They can then be used for fuzzy clustering together with crisp data. That can be particularly useful when no information is available about the variances that describe the accuracy of the data, so that probabilistic approaches are impossible. The method proposed by Yang and Liu has been extended and implemented for the Fuzzy Clustering System EcoFucs developed at the University of Kiel. As an example, the paper presents the fuzzy cluster analysis of chemicals according to their ecotoxicological properties. The uncertainty and imprecision of ecotoxicological data are very high because of the use of various data sources and investigation tests, and the difficulty of comparing these data.
The implemented method can be very helpful in searching for an adequate partition of ecological data into clusters with similar properties.

14.
Small autonomously folding proteins are of interest as model systems to study protein folding, as the same molecule can be used for both experimental and computational approaches. The question remains as to how well these minimized peptide model systems represent larger native proteins. For example, is the core of a minimized protein as tolerant to mutation as the cores of larger proteins? Also, do minimized proteins use special strategies for specifying and stabilizing their folded structure? Here we examine these questions in the 35‐residue autonomously folding villin headpiece subdomain (VHP subdomain). Specifically, we focus on a cluster of three conserved phenylalanine (F) residues, F47, F51, and F58, that form most of the hydrophobic core. These three residues are oriented such that they may provide stabilizing aromatic–aromatic interactions that could be critical for specifying the fold. Circular dichroism and 1D‐NMR spectroscopy show that point mutants in which any one of these three residues is individually replaced with leucine were destabilized, but retained the native VHP subdomain fold. In pair‐wise replacements, the double mutant that retains F58 can adopt the native fold, while the two double mutants that lack F58 cannot. The folding of the double mutant that retains F58 demonstrates that aromatic–aromatic interactions within the aromatic cluster are not essential for specifying the VHP subdomain fold. The ability of the VHP subdomain to tolerate mutations within its hydrophobic core indicates that the information specifying the three-dimensional structure is distributed throughout the sequence, as observed in larger proteins. Thus, the VHP subdomain is a legitimate model for larger, native proteins.

15.
Despite its small size, chicken villin headpiece subdomain HP36 folds into the native structure with a stable hydrophobic core within several microseconds. How such a small protein maintains its conformational stability and fast folding in solution is an important issue for understanding the molecular mechanisms of protein folding. In this study, we performed multicanonical replica-exchange simulations of HP36 in explicit water, starting from a fully extended conformation. We observed at least five events of HP36 folding into nativelike conformations. The smallest backbone root-mean-square deviation from the crystal structure was 1.1 Å. In the nativelike conformations, the stably formed hydrophobic core was fully dehydrated. Statistical analyses of the simulation trajectories show the following sequential events in folding of HP36: (1) Helix 3 is formed at the earliest stage; (2) the backbone and the side chains near the loop between Helices 2 and 3 take nativelike conformations; and (3) the side-chain packing at the hydrophobic core and the dehydration of the core side chains take place simultaneously at the later stage of folding. This sequence suggests that the initial folding nucleus is not necessarily the same as the hydrophobic core, consistent with a recent experimental ϕ-value analysis.

16.
17.
Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the design and development of PaCE and its evaluation using Arabidopsis ESTs. The novel features of our approach include: (i) design of memory-efficient algorithms that reduce the memory required to linear in the size of the input, (ii) a combination of algorithmic techniques to reduce the computational work without sacrificing the quality of clustering, and (iii) use of parallel processing to reduce run-time and facilitate clustering of larger data sets. Using a combination of these techniques, we report the clustering of 168 200 Arabidopsis ESTs in 15 min on an IBM xSeries cluster with 30 dual-processor nodes. We also clustered 327 632 rat ESTs in 47 min and 420 694 Triticum aestivum ESTs in 3 h and 15 min. We demonstrate the quality of our software using benchmark Arabidopsis EST data, and by comparing it with CAP3, a program widely used for EST assembly. Our software allows clustering of much larger EST data sets than is possible with current software. Because of its speed, it also facilitates multiple runs with different parameters, providing biologists a tool to better analyze EST sequence data. Using PaCE, we clustered EST data from 23 plant species and the results are available at the PlantGDB website.
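PaCE itself is considerably more elaborate than anything the abstract spells out (memory-efficient algorithms and parallel processing). The underlying idea of transitive EST clustering can nonetheless be sketched, assuming a much cruder overlap test: two ESTs are linked whenever they share an exact k-mer, and linked ESTs are merged with a union-find structure, so memory grows with the total sequence length. This ignores reverse complements, sequencing errors, and alignment scores; `shared_kmer_clusters` and its default k are hypothetical.

```python
def shared_kmer_clusters(seqs, k=8):
    """Group sequences that (transitively) share at least one exact k-mer.
    Returns a sorted list of clusters, each a sorted list of sequence indices."""
    parent = list(range(len(seqs)))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # k-mer -> index of the first sequence containing it
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            kmer = s[p:p + k]
            if kmer in first_seen:
                union(first_seen[kmer], i)
            else:
                first_seen[kmer] = i

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because linking is transitive, sequences 0 and 3 can end up together through sequence 1 even if they share no k-mer directly, which mirrors how overlapping ESTs from one transcript chain into a single cluster.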

18.
Assessing reliability of gene clusters from gene expression data   Times cited: 5 (self-citations: 0; citations by others: 5)
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule consensus tree can be obtained, showing only those clusters that occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering methods, they can be applied to other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.
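The resampling idea can be sketched as follows, assuming average-linkage hierarchical clustering with Euclidean distance and resampling of experiments (columns) with replacement. These choices and the function names are assumptions for illustration, not the article's exact procedure. The support of a node is the fraction of bootstrap trees containing exactly the same set of genes.

```python
import math
import random


def _clusters(data, cols):
    """All clusters (frozensets of row indices) created by average-linkage
    agglomerative clustering on the given columns, using Euclidean distance."""
    def d(i, j):
        return math.sqrt(sum((data[i][c] - data[j][c]) ** 2 for c in cols))

    active = [frozenset([i]) for i in range(len(data))]
    found = set()
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                dist = (sum(d(i, j) for i in active[a] for j in active[b])
                        / (len(active[a]) * len(active[b])))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = active[a] | active[b]
        found.add(merged)
        active = [c for idx, c in enumerate(active) if idx not in (a, b)] + [merged]
    return found


def bootstrap_support(data, n_boot=200, seed=0):
    """Fraction of bootstrap trees (columns resampled with replacement)
    that contain each node of the tree built on the original data."""
    rng = random.Random(seed)
    n_cols = len(data[0])
    observed = _clusters(data, range(n_cols))
    counts = dict.fromkeys(observed, 0)
    for _ in range(n_boot):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        replicate = _clusters(data, cols)
        for node in observed:
            if node in replicate:
                counts[node] += 1
    return {node: counts[node] / n_boot for node in observed}
```

The nodes of a majority-rule consensus tree are then those with support above 0.5, e.g. `[node for node, s in support.items() if s > 0.5]`. The naive linkage loop recomputes distances at every merge, which is fine for a toy example but not for thousands of genes.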

19.
Lei H  Su Y  Jin L  Duan Y 《Biophysical journal》2010,99(10):3374-3384
Protein folding is a complex multidimensional process that is difficult to illustrate by the traditional analyses based on one- or two-dimensional profiles. Analyses based on transition networks have become an alternative approach that has the potential to reveal detailed features of protein folding dynamics. However, due to the lack of successful reversible folding of proteins from conventional molecular-dynamics simulations, this approach has rarely been utilized. Here, we analyzed the folding network from several 10 μs conventional molecular-dynamics reversible folding trajectories of villin headpiece subdomain (HP35). The folding network revealed more complexity than the traditional two-dimensional map and demonstrated a variety of conformations in the unfolded state, intermediate states, and the native state. Of note, deep enthalpic traps at the unfolded state were observed on the folding landscape. Furthermore, in contrast to the clear separation of the native state and the primary intermediate state shown on the two-dimensional map, the two states were mingled on the folding network, and prevalent interstate transitions were observed between these two states. A more complete picture of the folding mechanism of HP35 emerged when the traditional and network analyses were considered together.
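Building such a network starts from trajectories that have already been discretized into conformational states (for example, by clustering snapshots). As a minimal sketch under that assumption, the code below estimates row-normalised transition probabilities at a fixed lag from state sequences and lists the inter-state edges. The state labels and function names are hypothetical, and this is not the authors' network construction.

```python
from collections import Counter, defaultdict


def transition_matrix(trajectories, lag=1):
    """Row-normalised transition probabilities P[a][b] estimated from
    discretised trajectories (each a sequence of state labels)."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[lag:]):
            counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def network_edges(probs):
    """Inter-state edges (a, b, p) of the transition network, excluding self-loops."""
    return sorted((a, b, p) for a, row in probs.items() for b, p in row.items() if a != b)
```

For example, with states "U" (unfolded), "I" (intermediate) and "N" (native), two short trajectories already yield edges in both directions between "I" and "N", the kind of interstate exchange the abstract reports between the native and primary intermediate states.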

20.
MOTIVATION: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. RESULTS: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root-mean-square deviation by first mapping structures to Gauss integral vectors, which were introduced by Røgen and co-workers, and subsequently performing K-means clustering. CONCLUSIONS: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low-energy structures generated in a typical folding study, which is on the order of 50,000 structures, can be clustered within seconds to minutes.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号