Similar articles
 20 similar articles found (search time: 15 ms)
1.
We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the alpha-carbon backbone structures of the proteins in order to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allows for similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of alpha-carbons. We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has several advantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoids the quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
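The orientation-vector representation described in this abstract can be illustrated with a minimal sketch. It assumes the alpha-carbon coordinates are given as an N x 3 array and that the two chains are already in residue-by-residue correspondence; the function names `unit_vectors` and `orientation_rms` are hypothetical, and the search for the rigid motion that best superimposes the structures is not shown.

```python
import numpy as np

def unit_vectors(ca_coords):
    """Encode a backbone as unit vectors between adjacent alpha-carbons."""
    ca = np.asarray(ca_coords, dtype=float)
    steps = np.diff(ca, axis=0)  # vector from each C-alpha to the next
    return steps / np.linalg.norm(steps, axis=1, keepdims=True)

def orientation_rms(u, v):
    """RMS distance between corresponding orientation vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sqrt(np.mean(np.sum((u - v) ** 2, axis=1))))

# Toy chain (assumed coordinates, not real protein data).
chain = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
shifted = chain + np.array([5.0, -2.0, 3.0])
print(orientation_rms(unit_vectors(chain), unit_vectors(shifted)))
```

Because the representation uses only differences between adjacent positions, a pure translation leaves the unit vectors unchanged, so only the rotational part of a rigid motion affects this measure.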

2.
The article suggests a measure to evaluate the thermodynamic maturity of industrial systems at the level of single process units. The measure can be quantified with reasonable confidence on the basis of entropy production as defined by irreversible thermodynamics theory. It quantifies, for one process unit, the distance between its actual state of operation and its state with minimum entropy production or optimum exergy efficiency, when the two states are constrained with a fixed production capacity of the process unit. We suggest that the minimum entropy production state is a mature state, or that processes that operate at this state are mature. We propose to call the measure "the thermodynamic maturity indicator" (π), and we define it as the ratio between the minimum entropy production and the actual entropy production. We calculated π on the basis of literature data for some examples of industrial process units in the chemical and process industry (i.e., heat exchanger, chemical reactor, distillation column, and paper drying machine). The proposed thermodynamic measure should be of interest for industrial ecology because it emerges from the entropy production rate, a dynamic function that can be optimized and used to understand the thermodynamic limit to improving the exergy efficiency of industrial processes. Although not a tool for replacing one process with another or comparing one technology to another, π may be used to assess actual operation states of single process units in industrial ecology.
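The verbal definition of π in this abstract can be written compactly. The notation below is an assumption introduced for illustration (the abstract gives no symbols other than π): dS_irr/dt denotes the entropy production rate of the process unit, and both states are taken at the same fixed production capacity.

```latex
\pi = \frac{\left(\mathrm{d}S_{\mathrm{irr}}/\mathrm{d}t\right)_{\min}}
           {\left(\mathrm{d}S_{\mathrm{irr}}/\mathrm{d}t\right)_{\mathrm{actual}}},
\qquad 0 < \pi \le 1
```

Under this reading, and assuming a strictly positive minimum entropy production, π = 1 would correspond to a unit already operating at its minimum entropy production state, and smaller values to operation further from it.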

3.
4.
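Recently, an entropy-based index was proposed for linkage disequilibrium mapping of loci underlying complex human traits. This entropy index compares the entropy and conditional entropy of marker allele frequencies between affected and normal individuals, or between extreme samples. Building on entropy theory, this paper proposes an alternative index, which compares the entropy and conditional entropy of marker genotype frequencies between affected and normal individuals. Computer simulations show that the new index parallels the earlier entropy index, and an analysis of hereditary haemochromatosis (HH) data shows that the new index can effectively fine-map loci for complex human traits.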

5.
In this paper, we introduce a probabilistic measure for computing the similarity between two biological sequences without alignment. The computation of the similarity measure is based on the Kullback-Leibler divergence of two constructed Markov models. We first validate the method by clustering nine chromosomes from three species. Second, we give the results of a similarity search based on our new method. Finally, we apply the measure to the construction of a phylogenetic tree of 48 HEV genome sequences. Our results indicate that the weighted relative entropy is an efficient and powerful alignment-free measure for the analysis of sequences at the genomic scale.
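To make the idea of comparing sequences through Markov models concrete, here is a minimal sketch. It is not the authors' exact weighted relative entropy, whose weighting the abstract does not specify. It assumes a first-order Markov model over the DNA alphabet with add-one pseudocounts, and weights each row of the relative entropy by the state frequencies of the first sequence; `markov_model` and `markov_kl` are hypothetical names.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def markov_model(seq, pseudo=1.0):
    """Estimate state frequencies and first-order transition probabilities."""
    pairs = Counter(zip(seq, seq[1:]))
    trans = {}
    for a in ALPHABET:
        row_total = sum(pairs[(a, b)] for b in ALPHABET) + pseudo * len(ALPHABET)
        trans[a] = {b: (pairs[(a, b)] + pseudo) / row_total for b in ALPHABET}
    starts = Counter(seq[:-1])  # states that have a successor
    total = sum(starts[a] for a in ALPHABET)
    freq = {a: starts[a] / total for a in ALPHABET}
    return freq, trans

def markov_kl(seq_p, seq_q, pseudo=1.0):
    """Relative entropy (bits) between the transition models of two sequences."""
    freq_p, trans_p = markov_model(seq_p, pseudo)
    _, trans_q = markov_model(seq_q, pseudo)
    return sum(
        freq_p[a] * trans_p[a][b] * math.log2(trans_p[a][b] / trans_q[a][b])
        for a in ALPHABET
        for b in ALPHABET
    )

print(markov_kl("ACGTACGTACGT", "AAAACCCCGGGGTTTT"))
```

The quantity is asymmetric, so a symmetric dissimilarity for clustering could be obtained by averaging `markov_kl(p, q)` and `markov_kl(q, p)`; that choice is also an assumption rather than something stated in the abstract.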

6.
We propose a parametric class of phylogenetic diversity (PD) measures that are sensitive to both species abundance and species taxonomic or phylogenetic distances. This work extends the conventional parametric species-neutral approach (based on 'effective number of species' or Hill numbers) to take into account species relatedness, and also generalizes the traditional phylogenetic approach (based on 'total phylogenetic length') to incorporate species abundances. The proposed measure quantifies 'the mean effective number of species' over any time interval of interest, or the 'effective number of maximally distinct lineages' over that time interval. The product of the measure and the interval length quantifies the 'branch diversity' of the phylogenetic tree during that interval. The new measures generalize and unify many existing measures and lead to a natural definition of taxonomic diversity as a special case. The replication principle (or doubling property), an important requirement for species-neutral diversity, is generalized to PD. The widely used Rao's quadratic entropy and the phylogenetic entropy do not satisfy this essential property, but a simple transformation converts each to our measures, which do satisfy the property. The proposed approach is applied to forest data for interpreting the effects of thinning.
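The species-neutral starting point mentioned in this abstract, the Hill number (effective number of species) of order q, and the doubling property can be sketched as follows. This shows only the standard species-neutral Hill number, not the phylogenetic generalization the authors propose; the function name `hill_number` and the abundance values are illustrative assumptions.

```python
import math

def hill_number(abundances, q):
    """Effective number of species of order q for a list of abundances."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # The limit q -> 1 is the exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

community = [5, 3, 2]
pooled = community + community  # two equally large, distinct copies
for q in (0, 1, 2):
    print(q, hill_number(community, q), hill_number(pooled, q))
```

Pooling two equally abundant communities with the same abundance profile and no shared species doubles the Hill number for every q, which is the replication principle the abstract refers to.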

7.
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endow a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the only exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks and for aligning a pair of tree-child phylogenetic networks, are provided. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.

8.
OBJECTIVE: To identify extracellular matrix deposition on combined Masson elastin stains from cross-sectional, fixed vein grafts. STUDY DESIGN: Source vectors from RGB components of color images are transformed into new vectors with most of the energy concentrated in fewer coefficients based on the eigenvalues and eigenvectors of their co-variance matrix so their dimension can be reduced for efficient computation and analysis. The vectors are distributed in a triangular shape in which most vectors are located in a long, narrow strip that can be approximated by a straight line while a separate group of vectors from collagen areas form a loose cluster away from the line. An iterative procedure has been developed for the representative vectors in the 2 centroids for linear and circular clusters. The linear centroid consists of all vectors in a straight line, and the centroid of the circular cluster is a single vector. Vector classification is based on the measure of its distance to each of the 2 centroids. RESULTS: The automatic segmentation of the collagen content pixels in green-blue matches the image background color. CONCLUSION: The procedure automatically quantifies and characterizes the neointimal deposition after surgical vein grafting in mice.
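The transform described in the study design, which projects RGB vectors onto the eigenvectors of their covariance matrix so that most of the energy falls in fewer coefficients, corresponds to a principal component analysis. A minimal sketch follows, assuming the pixels are supplied as an N x 3 array; `pca_reduce` is a hypothetical name, and the iterative clustering into linear and circular centroids is not shown.

```python
import numpy as np

def pca_reduce(pixels, n_components=2):
    """Project RGB vectors onto the leading eigenvectors of their covariance."""
    x = np.asarray(pixels, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order], eigvals[order]

# Synthetic pixels lying along one colour direction (assumed data).
t = np.linspace(0.0, 1.0, 50)[:, None]
pixels = 10.0 + t * np.array([[100.0, 200.0, 200.0]])
scores, variances = pca_reduce(pixels)
print(scores.shape, variances)
```

For pixels that lie close to a straight line in colour space, as in the narrow strip the abstract describes, nearly all of the variance ends up in the first component.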

9.
Sudden cardiac death (SCD) is a leading cause of mortality with an incidence of 3 million cases per year worldwide. Therapies for patients who have survived an SCD episode or have a high risk of developing lethal ventricular arrhythmia are well established and depend mainly on risk stratification. In this study we investigated the suitability of the non-linear measure compression entropy (Hc) for improved risk prediction in cardiac patients. We recorded 24-h Holter ECG for 300 patients with congestive heart failure (CHF). During a mean follow-up period of 12 months, 32 patients died due to a cardiac event. Hc depends on the compression parameters window length w and buffer length b, which were optimised by analysing a subgroup of patients. Compression entropies based on the beat-to-beat interval (BBI) were subsequently calculated and compared with standard heart rate variability (HRV) parameters. Statistical analysis revealed significant differences between high- and low-risk CHF patients in standard HRV measures, as well as compression entropy based on the BBI (cardiac death, p = 0.005; SCD, p = 0.02). In conclusion, the implementation of non-linear compression entropy analysis in multivariate analysis seems to be useful for enhanced risk stratification of cardiac death, especially SCD, in ischaemic cardiomyopathy patients.

10.
We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures whose range of values is independent of protein size. This new dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size showing two characteristic lengths of 4.7 and 37 residues. These lengths mark the separation between phases of different structural order between native protein fragments. The implications for threading are discussed.

11.
Allometric relationships among morphological traits underlie important patterns in ecology. These relationships are often phylogenetically shared; thus quantifying allometric relationships may allow for estimating difficult-to-measure traits across species. One such trait, proboscis length in bees, is assumed to be important in structuring bee communities and plant-pollinator networks. However, it is difficult to measure and thus rarely included in ecological analyses. We measured intertegular distance (as a measure of body size) and proboscis length (glossa and prementum, both individually and combined) of 786 individual bees of 100 species across 5 of the 7 extant bee families (Hymenoptera: Apoidea: Anthophila). Using linear models and model selection, we determined which parameters provided the best estimate of proboscis length. We then used coefficients to estimate the relationship between intertegular distance and proboscis length, while also considering family. Using allometric equations with an estimation for a scaling coefficient between intertegular distance and proboscis length and coefficients for each family, we explain 91% of the variance in species-level means for bee proboscis length among bee species. However, within species, individual-level intertegular distance was a poor predictor of individual proboscis length. To make our findings easy to use, we created an R package that allows estimation of proboscis length for individual bee species by inputting only family and intertegular distance. The R package also calculates foraging distance and body mass based on previously published equations. Thus by considering both taxonomy and intertegular distance we enable accurate estimation of an ecologically and evolutionarily important trait.

12.
We develop a metric for probability distributions with applications to biological sequence analysis. Our distance metric is obtained by minimizing a functional defined on the class of paths over probability measures on N categories. The underlying mathematical theory is connected to a constrained problem in the calculus of variations. The solution presented is a numerical solution, which approximates the true solution in a set of cases called rich paths where none of the components of the path is zero. The functional to be minimized is motivated by entropy considerations, reflecting the idea that nature might efficiently carry out mutations of genome sequences in such a way that the increase in entropy involved in transformation is as small as possible. We characterize sequences by frequency profiles or probability vectors; in the case of DNA, N is 4 and the components of the probability vector are the frequencies of occurrence of each of the bases A, C, G and T. Given two probability vectors a and b, we define a distance function as the infimum of path integrals of the entropy function H(p) over all admissible paths p(t), 0 ≤ t ≤ 1, with p(t) a probability vector such that p(0) = a and p(1) = b. If the probability paths p(t) are parameterized as y(s) in terms of arc length s and the optimal path is smooth with arc length L, then smooth and rich optimal probability paths may be numerically estimated by a hybrid method of iterating Newton's method on solutions of a two-point boundary value problem, with unknown distance L between the abscissas, for the Euler–Lagrange equations resulting from a multiplier rule for the constrained optimization problem, together with linear regression to improve the arc length estimate L. Matlab code for these numerical methods is provided, which works only for rich optimal probability vectors.
These methods motivate a definition of an elementary distance function which is easier and faster to calculate, works on non-rich vectors, does not involve variational theory and does not involve differential equations, but is a better approximation of the minimal entropy path distance than the distance ‖b − a‖₂. We compute minimal entropy distance matrices for examples of DNA myostatin genes and amino-acid sequences across several species. Output tree dendrograms for our minimal entropy metric are compared with dendrograms based on BLAST and BLAST identity scores. Mathematics Subject Classification (2000): 92B05, 92D20

13.
Dispersal is a central life‐history trait for most animals and plants: it allows organisms to colonize new habitats, escape from competition or avoid inbreeding. Yet, not all species are mobile enough to perform sufficient dispersal. Such passive dispersers may use more mobile animals as dispersal vectors. If multiple potential vectors are available, an active choice can allow the disperser to optimize the dispersal process and to determine the distribution of dispersal distances, i.e. an optimal dispersal kernel. We explore dispersal and vector choice in the neotropical flower mite Spadiseius calyptrogynae using a dual approach which combines experiments with an individual‐based simulation model. Spadiseius calyptrogynae is found in lowland rainforests in Costa Rica. It inhabits inflorescences of the understorey palm Calyptrogyne ghiesbreghtiana and is phoretic on a number of flower visitors including bats, beetles and stingless bees. We hypothesised that the mites should optimise their dispersal kernel by actively choosing a specific mix of potential phoretic vectors. In a simple olfactometer setup we showed that the flower mites do indeed discriminate between potential vectors. Subsequently we used an individual‐based model to analyse the evolutionary forces responsible for the observed patterns of vector choice. The mites combine vectors exhibiting long‐distance dispersal with those allowing for more localized dispersal. This results in a fat‐tailed dispersal kernel that guarantees the occasional colonization of new host plant patches (long distance) while optimizing the exploitation of clumped resources (local dispersal). Additionally, kin competition results in a preference for small vectors that transport only a few individuals at a time. At the same time, these vectors lead to directed dispersal towards suitable habitat, which increases the stability of this very specialized interaction.
Our findings can be applied to other phoretic systems but also to vector‐based seed dispersal, for example.

14.
M. Kimmel, R. Chakraborty, D. N. Stivers, R. Deka. Genetics, 1996, 143(1): 549-555
Suggested molecular mechanisms for the generation of new tandem repeats of simple sequences indicate that the microsatellite loci evolve via some form of forward-backward mutation. We provide a mathematical basis for suggesting a measure of genetic distance between populations based on microsatellite variation. Our results indicate that such a genetic distance measure can remain proportional to the divergence time of populations even when the forward-backward mutations produce variable and/or directionally biased allele size changes. If the population size and the rate of mutation remain constant, then the measure will be proportional to the time of divergence of populations. This genetic distance is expressed in terms of a ratio of components of variance of allele sizes, based on expressions developed for studying population dynamics of quantitative traits. Application of this measure to data on 18 microsatellite loci in nine human populations leads to evolutionary trees consistent with the known ethnohistory of the populations.

15.
The flexibility of surface loops plays an important role in protein–protein and protein–peptide recognition; it is commonly studied by Molecular Dynamics or Monte Carlo simulations. We propose to measure the relative backbone flexibility of loops by the difference in their backbone conformational entropies, which are calculated here with the local states (LS) method of Meirovitch. Thus, one can compare the entropies of loops of the same protein or, under certain simulation conditions, of different proteins. These loops should be equal in size but can differ in their sequence of amino acid residues. This methodology is applied successfully to three segments of 10 residues of a Ras protein simulated by the stochastic boundary molecular dynamics procedure. For the first time estimates of backbone entropy differences are obtained, and their correlation with B factors is pointed out; for example, the segments which consist of residues 60–65 and 112–117 have average B factors of 67 and 18 Å², respectively, and entropy difference T ΔS = 5.4 ± 0.1 kcal/mol at T = 300 K. In a large number of recent publications the entropy due to the fast motions (on the ps-ns time scale) of N–H and C–H vectors has been obtained from their order parameter, measured in nuclear magnetic resonance spin relaxation experiments. This enables one to estimate differences in the entropy of protein segments due to folding–unfolding transitions, for example. However, the vectors are assumed to be independent, and the effect of the neglected correlations is unknown; our method is expected to become an important tool for assessing this approximation. The present calculations, obtained with the LS method, suggest that the errors involved in experimental entropy differences might not be large; however, this should be verified in each case. Potential applications of entropy calculations to rational drug design are discussed. Proteins 29:127–140, 1997. © 1997 Wiley-Liss, Inc.

16.
Kostal L, Lánský P. Bio Systems, 2007, 89(1-3): 44-49
The patterns of neuronal activity can be different even if the mean firing rate is fixed. Investigating the variability of the firing may not be sufficient, and we suggest taking into account the notion of randomness. The randomness is related to the entropy of the firing, which is bounded from above by the entropy of the Poisson process (given the mean interspike interval). Thus, we propose the Kullback-Leibler distance with respect to the Poisson process as a measure of randomness in stationary neuronal activity. Under the condition of equal mean values the KL distance does not depend on the time scale and therefore can be compared to the coefficient of variation employed to measure the variability. Furthermore, this measure can be extended to account for correlated neuronal firing. Finally, we analyze the variability and randomness for three common ISI distributions in detail: gamma, lognormal and inverse Gaussian.
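The proposed randomness measure can be sketched under one stated assumption: the interspike intervals (ISIs) of a Poisson process are exponentially distributed, so for an ISI density f with mean m and differential entropy h(f) in nats, the KL distance to the exponential density with the same mean reduces to ln m + 1 − h(f). The function names below are hypothetical, and only the exponential and lognormal cases are shown.

```python
import math

def kl_from_poisson(mean_isi, differential_entropy):
    """KL distance (nats) from an ISI density to the exponential with equal mean."""
    return math.log(mean_isi) + 1.0 - differential_entropy

def exponential_isi(mean_isi):
    """Mean and differential entropy of an exponential ISI distribution."""
    return mean_isi, 1.0 + math.log(mean_isi)

def lognormal_isi(mu, sigma):
    """Mean and differential entropy of a lognormal ISI distribution."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    entropy = mu + 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
    return mean, entropy

print(kl_from_poisson(*exponential_isi(2.0)))
print(kl_from_poisson(*lognormal_isi(0.0, 1.0)))
print(kl_from_poisson(*lognormal_isi(3.0, 1.0)))
```

In this sketch the lognormal result depends only on sigma and not on mu, which is consistent with the abstract's statement that the KL distance does not depend on the time scale.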

17.
We are developing a program to calculate optimal RNA secondary structures. The model uses di-nucleotide pairing energies as with most traditional approaches. However, for long-range entropy interactions, the approach uses an entropy-loss model based on the accumulated sum of the entropy of bonding between each base-pair weighted inversely by the correlation of the RNA sequence (the Kuhn length). Stiff RNA forms very different structures from flexible RNA. The results demonstrate that the long-range folding is largely governed by this entropy and the Kuhn length.

18.
Recent large scale studies of senescence in animals and humans have revealed mortality rates that levelled off at advanced ages. These empirical findings are now known to be inconsistent with evolutionary theories of senescence based on the Malthusian parameter as a measure of fitness. This article analyses the incidence of mortality plateaus in terms of directionality theory, a new class of models based on evolutionary entropy as a measure of fitness. We show that the intensity of selection, in the context of directionality theory, is a convex function of age, and we invoke this property to predict that in populations evolving under bounded growth constraints, evolutionarily stable mortality patterns will be described by rates which abate with age at extreme ages. The explanatory power of directionality theory, in contrast with the limitations of the Malthusian model, accords with the claim that evolutionary entropy, rather than the Malthusian parameter, constitutes the operationally valid measure of Darwinian fitness.

19.
Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of uncertainty in a set of information. In this study, for each gene and its set of exons, the entropy was calculated in orders one to four. Based on the relative entropy of genes and exons, the Kullback-Leibler divergence was calculated. After obtaining the Kullback-Leibler distance for the gene and exon sets, the results were entered as input into 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. To aggregate the results of clustering, the AdaBoost algorithm was used. Finally, the results of the AdaBoost algorithm were investigated with the GeneMANIA prediction server to explore the results from a gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Examining the metabolic pathways of the clustered genes on the basis of their annotations revealed that our proposed clustering method yielded correct, logical, and fast results. At the same time, the method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
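One plausible reading of "entropy in orders one to four" is the Shannon entropy of the overlapping k-mer distribution for k = 1 to 4. The sketch below follows that reading, which is an assumption; the abstract does not give the exact definition, and `block_entropy` is a hypothetical name. The study itself used MATLAB; Python is used here only for illustration.

```python
import math
from collections import Counter

def block_entropy(seq, k):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

gene = "ACGT" * 10  # toy sequence, not real data
for order in range(1, 5):
    print(order, block_entropy(gene, order))
```

Because the k-mer counts are computed directly from each sequence, no alignment is needed and sequences of different lengths can be compared through their entropy profiles.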

20.

Background

The etiology of complex diseases is due to the combination of genetic and environmental factors, usually many of them, and each with a small effect. The identification of these small-effect contributing factors is still a demanding task. Clearly, there is a need for more powerful tests of genetic association, and especially for the identification of rare effects.

Results

We introduce a new genetic association test based on symbolic dynamics and symbolic entropy. Using freely available software, we have applied this entropy test, and a conventional test, to simulated and real datasets, to illustrate the method and estimate type I error and power. We have also compared this new entropy test to the Fisher exact test for assessment of association with low-frequency SNPs. The entropy test is generally more powerful than the conventional test, and can be significantly more powerful when the genotypic test is applied to low allele-frequency markers. We have also shown that both the Fisher and entropy methods are optimal to test for association with low-frequency SNPs (MAF around 1-5%), and both are conservative for very rare SNPs (MAF<1%).

Conclusions

We have developed a new, simple, consistent and powerful test to detect genetic association of biallelic/SNP markers in case-control data, by using symbolic dynamics and symbolic entropy as a measure of gene dependence. We also provide a standard asymptotic distribution of this test statistic. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. The entropy test is generally as good as or even more powerful than the conventional and Fisher tests. Furthermore, the entropy test is more computationally efficient than the Fisher exact test, especially for a large number of markers. Therefore, this entropy-based test has the advantage of being optimal for most SNPs, regardless of their allele frequency (Minor Allele Frequency (MAF) between 1-50%). This property is quite beneficial, since many researchers tend to discard low allele-frequency SNPs from their analysis. Now they can apply the same statistical test of association to all SNPs in a single analysis, which can be especially helpful to detect rare effects.
