Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
Mojie Duan, Minghai Li, Li Han, Shuanghong Huo. Proteins, 2014, 82(10): 2585-2596
Dimensionality reduction is widely used in searching for the intrinsic reaction coordinates of protein conformational changes. We find that dimensionality-reduction methods using the pairwise root-mean-square deviation (RMSD) as the local distance metric face a challenge. We use Isomap as an example to illustrate the problem. We believe that there is an implied assumption for the dimensionality-reduction approaches that aim to preserve the geometric relations between the objects: both the original space and the reduced space have the same kind of geometry, such as Euclidean geometry vs. Euclidean geometry or spherical geometry vs. spherical geometry. When the protein free energy landscape is mapped onto a 2D plane or 3D space, the reduced space is Euclidean, thus the original space should also be Euclidean. For a protein with N atoms, its conformation space is a subset of the 3N-dimensional Euclidean space R3N. We formally define the protein conformation space as the quotient space of R3N by the equivalence relation of rigid motions. Whether the quotient space is Euclidean or not depends on how it is parameterized. When the pairwise RMSD is employed as the local distance metric, implicit representations are used for the protein conformation space, leading to no direct correspondence to a Euclidean set. We have demonstrated that an explicit Euclidean-based representation of protein conformation space and the local distance metric associated with it improve the quality of dimensionality reduction in the tetra-peptide and β-hairpin systems. Proteins 2014; 82:2585-2596. © 2014 Wiley Periodicals, Inc.
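To make the setup concrete, below is a minimal sketch of the RMSD-plus-Isomap pipeline that this abstract critiques: pairwise best-fit RMSD (via a Kabsch-style superposition) is used as the precomputed local distance for Isomap. The conformations are placeholder random coordinates, and the sketch assumes a scikit-learn version that supports `metric="precomputed"`; it illustrates the implicit-representation approach, not the authors' explicit Euclidean-based representation.

```python
import numpy as np
from sklearn.manifold import Isomap

def pairwise_rmsd(confs):
    """Pairwise best-fit RMSD matrix via Kabsch-style superposition."""
    n = len(confs)
    centered = [c - c.mean(axis=0) for c in confs]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            P, Q = centered[i], centered[j]
            V, S, Wt = np.linalg.svd(P.T @ Q)        # SVD of the covariance matrix
            S[-1] *= np.sign(np.linalg.det(V @ Wt))  # enforce a proper rotation
            E0 = (P ** 2).sum() + (Q ** 2).sum()
            D[i, j] = D[j, i] = np.sqrt(max(E0 - 2.0 * S.sum(), 0.0) / len(P))
    return D

# Placeholder "conformations": 50 frames of a 10-atom system
confs = [np.random.randn(10, 3) for _ in range(50)]
D = pairwise_rmsd(confs)
embedding = Isomap(n_neighbors=8, n_components=2, metric="precomputed")
coords_2d = embedding.fit_transform(D)   # 2-D map built from RMSD-based geodesics
```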

2.

Background  

Nonlinear methods provide a direct way of estimating the complexity of one-dimensional sampled signals through calculation of Higuchi's fractal dimension (1 < FD < 2). In most cases the signal is treated as being characterized by one value of FD and consequently analyzed as one epoch or, if divided into more epochs, often only the mean and standard deviation of epoch FD are calculated. If its complexity variation (or running fractal dimension), FD(t), is to be extracted, a moving-window (epoch) approach is needed. However, due to the low-pass filtering properties of moving windows, short epochs are preferred. Since Higuchi's method is based on consecutive reduction of the signal sampling frequency, it is not suitable for estimating FD of very short epochs (N < 100 samples).
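For reference, here is a minimal sketch of Higuchi's estimator and of a moving-window running FD(t), written from the standard published formulation; the window length and kmax are illustrative values, not the settings used in this study.

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a 1-D signal (minimal sketch)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    lk = []
    for k in range(1, kmax + 1):
        Lm = []
        for m in range(k):
            idx = np.arange(m, N, k)              # sub-sampled series x_m, x_{m+k}, ...
            if len(idx) < 2:
                continue
            length = np.abs(np.diff(x[idx])).sum()
            norm = (N - 1) / ((len(idx) - 1) * k)  # Higuchi normalization factor
            Lm.append(length * norm / k)
        lk.append(np.mean(Lm))
    k_vals = np.arange(1, kmax + 1)
    # FD is the slope of log L(k) versus log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / k_vals), np.log(lk), 1)
    return slope

def running_fd(signal, win=100, step=10, kmax=8):
    """FD(t) from a short moving window (epoch) slid along the signal."""
    return [higuchi_fd(signal[i:i + win], kmax)
            for i in range(0, len(signal) - win + 1, step)]
```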

3.
Cover is the most frequently used measure of abundance in vegetation surveys of grasslands, and various qualitative and semi-quantitative methods have been developed for visual estimation of this metric. Field surveys are usually made with a point-grid plate. The frequency distributions of cover derived from point-grid counts follow a beta distribution. By combining point-grid counts from a field survey with the beta distribution in a statistical analysis, we developed an effort-saving cover-measurement method. Cover is measured with a transparent plastic plate on which, for example, 10 × 10 = 100 points are arranged in a lattice with 1-cm grid spacing (thus, one point count represents 1 cm² of cover). N quadrats are set out at randomly dispersed sites in a grassland, and, in each, the plastic plate is used for making counts. The number of grid points located above a given species is counted in every quadrat until the number of counted points reaches a given value c, which is determined in advance. If the number of counted points reaches c in a quadrat, the count is stopped and the quadrat is classified in the category ">c". In quadrats where c is not attained, full point counts above the plants of that species are made. Let g be the number of observed quadrats whose cover is ≤c. Using these g cover measurements and the number of quadrats (N − g) with cover >c, we can quantitatively estimate the cover of each species and the spatial pattern index value based on the maximum likelihood method. In trial counts using this method, the time savings varied between 5% and 41%, depending on the shape of the cover frequency distribution. The mean cover estimates agreed well with conventional measures without a stopping point (i.e., based on full counts of all points in each quadrat).
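The censored-count idea can be sketched as a maximum-likelihood fit of a beta distribution in which quadrats that reached the stopping value c contribute only a survival-probability term. The code below is a simplified illustration: it treats per-quadrat cover proportions as continuous beta variates rather than using the paper's exact count-based likelihood, and the data, c, and quadrat numbers are made up.

```python
import numpy as np
from scipy import stats, optimize

def censored_beta_mle(props, n_censored, threshold):
    """ML fit of a beta distribution to cover proportions when counting
    stops at `threshold` (quadrats above it are only known as '> c')."""
    def neg_loglik(params):
        a, b = np.exp(params)                                 # keep a, b positive
        ll = stats.beta.logpdf(props, a, b).sum()             # fully counted quadrats
        ll += n_censored * stats.beta.logsf(threshold, a, b)  # censored quadrats
        return -ll
    res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    a, b = np.exp(res.x)
    return a, b, a / (a + b)   # beta parameters and mean-cover estimate

# Hypothetical data: c = 30 of 100 grid points; 8 quadrats fully counted, 4 censored
c, n_points = 30, 100
observed_counts = np.array([2, 5, 11, 0, 18, 24, 7, 3])
props = np.clip(observed_counts / n_points, 1e-6, 1 - 1e-6)   # avoid log(0)
a, b, mean_cover = censored_beta_mle(props, n_censored=4, threshold=c / n_points)
```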

4.
1. Aquatic plants are a key component of spatial heterogeneity in a waterscape, contributing to habitat complexity and helping determine diversity at various spatial scales. Theoretically, the more complex a habitat, the higher the number of species present. 2. Few empirical data are available to test the hypothesis that complexity increases diversity in aquatic communities (e.g. Jeffries, 1993). Fractal dimension has become widely applied in ecology as a tool to quantify the degree of complexity at different scales. 3. We investigated the hypothesis that the complexity of vegetated habitat in two tropical lagoons mediates littoral invertebrate number of taxa (S) and density (N). Aquatic macrophyte habitat complexity was defined using a fractal dimension and a gradient of natural plant complexities. We also considered plant area, plant identity and, for S only, invertebrate density as additional explanatory variables. 4. Our results indicate that the habitat complexity provided by the different architectures of aquatic plants significantly affects both S and total N. However, the number of individuals (as a result of passive sampling) also helps to account for S and, together with plant identity and area, contributes to the determination of N. We suggest that measurements of structural complexity, obtained through fractal geometry, should be included in studies aimed at explaining attributes of attached invertebrates at small (e.g. plant or leaf) scales.
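As a rough illustration of how a fractal dimension can quantify plant architectural complexity, a box-counting estimate on a binary silhouette is a common choice; the box sizes and input below are hypothetical, and this is not necessarily the protocol used in the study.

```python
import numpy as np

def box_counting_fd(mask, sizes=(2, 4, 8, 16, 32)):
    """Box-counting fractal dimension of a binary plant silhouette `mask`
    (2-D boolean array, larger than the biggest box); a generic sketch."""
    counts = []
    for s in sizes:
        h = (mask.shape[0] // s) * s
        w = (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())   # occupied s-by-s boxes
    # FD is the slope of log N(s) versus log(1/s)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope
```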

5.
In this paper, a number of existing and novel techniques are considered for ordering cloned extracts from the genome of an organism based on fingerprinting data. A metric is defined for comparing the quality of the clone order for each technique. Simulated annealing is used in combination with several different objective functions. Empirical results with many simulated data sets for which the correct solution is known indicate that a simple greedy algorithm with some subsequent stochastic shuffling provides the best solution. Other techniques that attempt to weight comparisons between nonadjacent clones bias the ordering and give worse results. We show that this finding is not surprising since, without detailed attempts to reconcile the data into a detailed map, only approximate maps can be obtained. Making N² pieces of data from measurements of N clones cannot improve the situation.
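A minimal sketch of the "greedy ordering plus stochastic shuffling" idea: clones are chained by nearest-neighbour fingerprint distance, and the order is then perturbed, keeping only improving moves. The segment-reversal shuffle move and the adjacency-distance objective are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def order_clones(D, n_shuffles=10000, rng=None):
    """Greedy nearest-neighbour clone ordering from a fingerprint distance
    matrix D, followed by stochastic shuffling that keeps improving orders."""
    rng = rng or np.random.default_rng()
    n = len(D)
    # Greedy phase: always append the closest unused clone
    order, unused = [0], set(range(1, n))
    while unused:
        nxt = min(unused, key=lambda j: D[order[-1], j])
        order.append(nxt)
        unused.remove(nxt)

    def cost(o):   # sum of distances between adjacent clones in the order
        return sum(D[o[i], o[i + 1]] for i in range(len(o) - 1))

    best = cost(order)
    for _ in range(n_shuffles):
        i, j = sorted(rng.integers(0, n, size=2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # reverse a segment
        c = cost(cand)
        if c < best:                 # accept only improvements
            order, best = cand, c
    return order, best

# Toy symmetric "fingerprint distance" matrix for 30 clones
rng = np.random.default_rng(1)
D = rng.random((30, 30)); D = (D + D.T) / 2; np.fill_diagonal(D, 0.0)
order, score = order_clones(D, rng=rng)
```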

6.
Myelodysplastic syndromes (MDS) are a group of heterogeneous myeloid clonal disorders characterized by ineffective hematopoiesis. Accumulating evidence has shown that macrophages (MΦs) are important components in the regulation of tumor progression and hematopoietic stem cells (HSCs). However, the roles of bone marrow (BM) MΦs in regulating normal and malignant hematopoiesis in different clinical stages of MDS are largely unknown. Age-paired patients with lower-risk MDS (N = 15), higher-risk MDS (N = 15), de novo acute myeloid leukemia (AML) (N = 15), and healthy donors (HDs) (N = 15) were enrolled. Flow cytometry analysis showed increased pro-inflammatory monocyte subsets and a decreased ratio of classically activated (M1) MΦs to alternatively activated (M2) MΦs in the BM of patients with higher-risk MDS compared to lower-risk MDS. BM MΦs from patients with higher-risk MDS and AML showed impaired phagocytic activity but increased migration compared with the lower-risk MDS group. AML BM MΦs showed markedly higher S100A8/A9 levels than lower-risk MDS BM MΦs. More importantly, coculture experiments suggested that the HSC-supporting abilities of BM MΦs from patients with higher-risk MDS decreased, whereas their malignant-cell-supporting abilities increased, compared with lower-risk MDS. Gene Ontology enrichment analysis comparing BM MΦs from lower-risk and higher-risk MDS showed that the differentially regulated genes were involved in hematopoiesis- and immunity-related pathways. Our results suggest that BM MΦs are involved in the ineffective hematopoiesis of patients with MDS, which indicates that repairing aberrant BM MΦs may represent a promising therapeutic approach for these patients.

7.
High-dimensional data increase the dimensionality of the feature space and consequently the computational complexity, and they result in lower generalization. Microarray data classification is one such problem. Microarrays contain genetic and biological data that can be used to diagnose diseases, including various types of cancers and tumors. Because of their intractable dimensionality, a dimension-reduction step is necessary for these data. The main goal of this paper is to provide a method for dimension reduction and classification of genetic data sets. The proposed approach includes several stages. In the first stage, several feature-ranking methods are fused to enhance the robustness and stability of the feature-selection process. A wrapper method is combined with the proposed hybrid ranking method to capture the interactions between genes. Afterwards, classification is performed using a support vector machine (SVM). Before the data are fed to the SVM classifier, the problem of imbalanced classes in the training phase has to be overcome. The experimental results of the proposed approach on five microarray databases show that the robustness metric of the feature-selection process is in the interval [0.70, 0.88] and the classification accuracy is in the range [91%, 96%].
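A hedged sketch of the pipeline's shape using scikit-learn: two univariate rankers stand in for the paper's unspecified ranking methods, their rank positions are fused by averaging, and `class_weight="balanced"` stands in for whatever imbalance handling the authors actually used.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fused_ranking(X, y):
    """Fuse several univariate gene rankings by averaging rank positions
    (placeholder rankers; 0 = best gene)."""
    scores = [f_classif(X, y)[0], mutual_info_classif(X, y)]
    ranks = [np.argsort(np.argsort(-s)) for s in scores]
    return np.mean(ranks, axis=0)

def select_and_classify(X, y, n_genes=50):
    """Select the top fused-rank genes, then fit a linear SVM on them."""
    top = np.argsort(fused_ranking(X, y))[:n_genes]
    clf = Pipeline([
        ("scale", StandardScaler()),
        # class_weight='balanced' approximates handling of imbalanced classes
        ("svm", SVC(kernel="linear", class_weight="balanced")),
    ])
    return top, clf.fit(X[:, top], y)
```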

8.

Background  

The set of extreme pathways (ExPa), {p_i}, defines the convex basis vectors used for the mathematical characterization of the null space of the stoichiometric matrix for biochemical reaction networks. ExPa analysis has been used in a number of studies to determine properties of metabolic networks as well as to obtain insight into their physiological and functional states in silico. However, the number of ExPas, p = |{p_i}|, grows with the size and complexity of the network being studied, and this poses a computational challenge. For this study, we investigated the relationship between the number of extreme pathways and simple network properties.
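For orientation, the linear null space of a toy stoichiometric matrix can be computed directly, as sketched below. Note that extreme pathways are the convex, non-negative generating vectors of this space and require a dedicated enumeration algorithm (e.g. double description); this sketch only shows the linear subspace whose dimension grows with network size.

```python
import numpy as np
from scipy.linalg import null_space

# Toy stoichiometric matrix S (rows: metabolites, columns: reactions); Sv = 0 at steady state
S = np.array([
    [-1,  1,  0,  0],
    [ 1, -1, -1,  1],
    [ 0,  0,  1, -1],
])
K = null_space(S)     # orthonormal basis of the null space
print(K.shape)        # dimension = number of reactions - rank(S)
```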

9.
Motivation: Finding a good network null model for protein–protein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric random graphs. We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operator Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in a network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N²) where N is the number of proteins. Results: The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links. Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher-dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure. Availability: MATLAB code implementing the algorithm is available upon request. Contact: natasha{at}ics.uci.edu
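A compact sketch of the embedding idea: shortest-path lengths are square-rooted and fed to an MDS routine as precomputed dissimilarities. scikit-learn's stress-based MDS is used here as a stand-in for the paper's sparse, classical-scaling MATLAB implementation, and the handling of disconnected pairs is an assumption.

```python
import numpy as np
import networkx as nx
from sklearn.manifold import MDS

def embed_ppi(G, dim=2):
    """Embed a PPI graph into dim-dimensional Euclidean space using
    sqrt(shortest-path length) as the dissimilarity (illustrative sketch)."""
    nodes = list(G)
    n = len(nodes)
    sp = dict(nx.all_pairs_shortest_path_length(G))
    D = np.zeros((n, n))
    for i, u in enumerate(nodes):
        for j, v in enumerate(nodes):
            D[i, j] = np.sqrt(sp[u].get(v, n))   # disconnected pairs get a large distance
    mds = MDS(n_components=dim, dissimilarity="precomputed")
    return nodes, mds.fit_transform(D)

# Placeholder graph standing in for a PPI network
G = nx.gnm_random_graph(60, 150, seed=0)
nodes, xy = embed_ppi(G)
```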

10.
11.
12.
Abstract

We propose a concept for a homogeneous computational model for carrying out cross-scale numerical experiments on liquids. The model employs the particle paradigm and comprises three types of simulation techniques: molecular dynamics (MD), dissipative particle dynamics (DPD) and smoothed particle hydrodynamics (SPH). Depending on the definition of the collision operator, this model may work in different hierarchical spatial and time scales: MD in the atomistic scale, DPD in the mesoscale and SPH in the macroscale. The optimal computational efficiency of the three types of cross-scale experiments is estimated as a function of the system size N, where N is the number of particles, and the number of processors P employed for the computer simulation. For the three-stage hierarchy embodied in the MD-DPD-SPH model, the efficiency is proportional to N^(8/7), but its dependence on P is different for each of the three types of cross-scale experiments. The problem of matching the different scales is discussed.

13.
Cryptic relatedness is a confounding factor in genetic diversity and genetic association studies. The development of strategies to reduce cryptic relatedness in a sample is a crucial step for downstream genetic analyses. This study uses a node selection algorithm, based on network degree centrality, to evaluate its applicability and impact on the evaluation of genetic diversity and population stratification. 1,036 Guzerá (Bos indicus) females were genotyped using the Illumina Bovine SNP50 v2 BeadChip. Four strategies were compared. The first and second strategies consist of the iterative exclusion of the most related individuals based on the PLINK kinship coefficient (φij) and VanRaden's φij, respectively. The third and fourth strategies were based on a node selection algorithm. The fourth strategy, the Network G matrix, preserved the largest number of individuals, with better diversity and representation of the initial sample. Determining the most probable number of populations was directly affected by the kinship metric. The Network G matrix was the best strategy for reducing relatedness because it produced a larger sample with more distant individuals, showed a distribution more similar to that of the full data set in the MDS plots, and kept a better representation of the population structure. Resampling strategies using VanRaden's φij as the relationship metric were better for inferring the relationships among individuals. Moreover, the resampling strategies directly impact the genomic inflation values in genome-wide association studies. The use of the node selection algorithm also allows a better selection of the most central individuals to be removed, providing a more representative sample.
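A minimal sketch of degree-based node selection on a relatedness graph: individuals whose kinship exceeds a threshold are connected, and the most connected individual is removed iteratively until no related pair remains. The threshold, the input format and the tie-breaking are assumptions for illustration, not the study's exact algorithm.

```python
import networkx as nx

def prune_related(kinship, threshold=0.125):
    """Iteratively remove the most connected individuals from a relatedness
    graph until no pair exceeds `threshold`; `kinship` is {(i, j): phi_ij}."""
    G = nx.Graph()
    G.add_edges_from((i, j) for (i, j), phi in kinship.items() if phi > threshold)
    removed = []
    while G.number_of_edges() > 0:
        node, _ = max(G.degree, key=lambda kv: kv[1])   # highest-degree individual
        G.remove_node(node)
        removed.append(node)
    return removed

# Example with made-up kinship coefficients
kin = {("A", "B"): 0.30, ("B", "C"): 0.20, ("C", "D"): 0.05, ("A", "C"): 0.15}
to_drop = prune_related(kin)   # removes the most central of the related individuals
```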

14.
15.

Background  

The study of biological systems demands computational support. When targeting a biological problem, reusing existing computational models can save time and effort. Deciding on potentially suitable models, however, becomes more challenging with the increasing number of computational models available, and even more so when considering the models' growing complexity. Firstly, among a set of potential model candidates it is difficult to decide on the model that best suits one's needs. Secondly, it is hard to grasp the nature of an unknown model listed in a search result set, and to judge how well it fits the particular problem one has in mind.

16.
Sequence comparison with concave weighting functions
We consider efficient methods for computing a difference metric between two sequences of symbols, where the cost of an operation to insert or delete a block of symbols is a concave function of the block's length. Alternatively, sequences can be optimally aligned when gap penalties are a concave function of the gap length. Two algorithms based on the ‘candidate list paradigm’ first used by Waterman (1984) are presented. The first computes significantly more parsimonious candidate lists than Waterman's method. The second method refines the first to the point of guaranteeing O(N² lg N) worst-case time complexity, and under certain conditions O(N²). Experimental data show how various properties of the comparison problem affect the methods' relative performance. A number of extensions are discussed, among them a technique for constructing optimal alignments in O(N) space in expectation. This variation gives a practical method for comparing long amino acid sequences on a small computer. This work was supported in part by NSF Grant DCR-8511455.
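To show what a concave gap weight and the underlying recurrence look like, here is a naive general-gap dynamic program (Waterman-style, roughly cubic time); the logarithmic gap form and its parameters are assumptions, and the paper's candidate-list algorithms are precisely what reduce this cost to O(N² lg N).

```python
import numpy as np

def concave_gap(k, a=2.0, b=1.0):
    """Concave gap weight w(k) = a + b*log(k) for a gap of length k (assumed form)."""
    return a + b * np.log(k)

def align_general_gaps(s, t, match=0, mismatch=1, gap=concave_gap):
    """Naive dynamic program for a difference metric with an arbitrary
    gap-length penalty; the candidate-list methods avoid the inner minima."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for j in range(1, m + 1):
        D[0, j] = gap(j)
    for i in range(1, n + 1):
        D[i, 0] = gap(i)
        for j in range(1, m + 1):
            sub = D[i - 1, j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            dele = min(D[i - k, j] + gap(k) for k in range(1, i + 1))
            ins = min(D[i, j - k] + gap(k) for k in range(1, j + 1))
            D[i, j] = min(sub, dele, ins)
    return D[n, m]

print(align_general_gaps("ACGTACGT", "ACTACG"))
```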

17.

Background  

Many high-throughput genomic experiments, such as Synthetic Genetic Array and yeast two-hybrid screens, use colony growth on solid media as a screen metric. These experiments routinely generate over 100,000 data points, making data analysis a time-consuming and painstaking process. Here we describe ScreenMill, a new software suite that automates image analysis and simplifies data review and analysis for high-throughput biological experiments.

18.
Krzanowski WJ. Biometrics, 2006, 62(1): 239-244
Assessing the sensitivity or sampling variability of multivariate ordination methods is essential if inferences are to be drawn from the analysis, but such assessment has to date been notably absent from many applications of multidimensional scaling (MDS). The only available technique seems to be the one by DeLeeuw and Meulman, who proposed a special jackknife in a general MDS setting, but this method does not appear to have been widely used to date. A possible reason is that it is perceived to be computationally daunting. However, if attention is focused on classical metric scaling (principal coordinate analysis), then known analytical results can be used and the apparent computational complexity disappears. The purpose of this article is to set out these results, to indicate their use in the more general analysis of distance, and to illustrate the methodology on some biometric examples.
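For readers who want the computation spelled out: classical metric scaling reduces to double-centring the squared distance matrix and taking the leading eigenvectors, and a brute-force leave-one-out jackknife simply repeats that with each observation removed (the analytical results in the article avoid this repetition). The sketch below is generic, not the author's implementation.

```python
import numpy as np

def pcoa(D, k=2):
    """Classical metric scaling (principal coordinate analysis) of a distance matrix."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]             # largest eigenvalues first
    pos = np.clip(vals[idx], 0, None)
    return vecs[:, idx] * np.sqrt(pos)           # principal coordinates

def jackknife_coords(D, k=2):
    """Brute-force leave-one-out jackknife: one configuration per deleted observation."""
    n = len(D)
    keep = lambda i: [j for j in range(n) if j != i]
    return [pcoa(D[np.ix_(keep(i), keep(i))], k) for i in range(n)]

# Toy distance matrix from random points
X = np.random.default_rng(2).random((15, 4))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
coords = pcoa(D)
replicates = jackknife_coords(D)
```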

19.

Background  

The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in the postnatal mouse brain for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, the ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the rich underlying information in this large data set is yet to be explored with novel computational tools. In this proof-of-concept study, we tested the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions.
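The hypothesis can be probed with something as simple as a voxel-wise correlation between co-registered expression volumes; the sketch below uses made-up grids and Pearson correlation as a stand-in for whatever similarity measure the study actually employed.

```python
import numpy as np

def expression_similarity(vol_a, vol_b):
    """Pearson correlation between two co-registered 3-D expression volumes
    (flattened voxel grids); a toy proxy for 'similar spatial expression profiles'."""
    a, b = np.ravel(vol_a), np.ravel(vol_b)
    mask = np.isfinite(a) & np.isfinite(b)      # ignore voxels missing in either grid
    return np.corrcoef(a[mask], b[mask])[0, 1]

# Hypothetical expression volumes on a common (made-up) grid
rng = np.random.default_rng(0)
gene1, gene2 = rng.random((40, 30, 20)), rng.random((40, 30, 20))
print(expression_similarity(gene1, gene2))
```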

20.