Similar articles
Found 20 similar articles.
1.
Predicting functionally important residues from sequence conservation   (Total citations: 2; self-citations: 1; citations by others: 1)
MOTIVATION: Not all residues in a protein are equally important. Some are essential for the proper structure and function of the protein, whereas others can be readily replaced. Conservation analysis is one of the most widely used methods for predicting these functionally important residues in protein sequences. RESULTS: We introduce an information-theoretic approach for estimating sequence conservation based on Jensen-Shannon divergence. We also develop a general heuristic that considers the estimated conservation of sequentially neighboring sites. In large-scale testing, we demonstrate that our combined approach outperforms previous conservation-based measures in identifying functionally important residues; in particular, it is significantly better than the commonly used Shannon entropy measure. We find that considering conservation at sequential neighbors improves the performance of all methods tested. Our analysis also reveals that many existing methods that attempt to incorporate the relationships between amino acids do not lead to better identification of functionally important sites. Finally, we find that while conservation is highly predictive in identifying catalytic sites and residues near bound ligands, it is much less effective in identifying residues in protein-protein interfaces. AVAILABILITY: Data sets and code for all conservation measures evaluated are available at http://compbio.cs.princeton.edu/conservation/
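As a rough illustration of the idea in this abstract, the sketch below scores one alignment column by the Jensen-Shannon divergence between its amino-acid distribution and a background distribution, then mixes each score with those of its sequence neighbors. The uniform background, the pseudocount, the window size and the mixing weight `lam` are assumptions made for this example, and the function names are hypothetical; the authors' published code may differ in all of these.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_distribution(column, pseudocount=1e-6):
    """Estimate amino-acid frequencies for one alignment column, ignoring gaps."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def window_smooth(scores, window=3, lam=0.5):
    """Mix each site's score with the mean score of its sequence neighbors.

    The window size and weight are illustrative assumptions.
    """
    smoothed = []
    for i, score in enumerate(scores):
        neighbors = scores[max(0, i - window):i] + scores[i + 1:i + window + 1]
        if neighbors:
            mean = sum(neighbors) / len(neighbors)
            smoothed.append(lam * score + (1 - lam) * mean)
        else:
            smoothed.append(score)
    return smoothed

# Assumed uniform background; a real analysis would use observed frequencies.
BACKGROUND = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
conserved = js_divergence(column_distribution("AAAAAAAA"), BACKGROUND)
variable = js_divergence(column_distribution("ACDEFGHI"), BACKGROUND)
```

In this toy case the fully conserved column receives a higher score than the variable one, which is the behavior a conservation measure should have.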

2.
IRBM, 2021, 42(6): 400-406
1) Objective: Pulmonary optical endomicroscopy (POE) is a real-time imaging technology. It allows pulmonary alveoli to be examined at a microscopic level. Acquired in clinical settings, a POE image sequence can have as much as 25% of the sequence being uninformative frames (i.e. pure-noise and motion artifacts). For future data analysis, these uninformative frames must first be removed from the sequence. Therefore, the objective of our work is to develop an automatic method for detecting uninformative images in endomicroscopy image sequences. 2) Material and methods: We propose to treat the detection problem as a classification problem. Considering the advantages of deep learning methods, a classifier based on a CNN (Convolutional Neural Network) is designed with a new loss function based on Havrda-Charvat entropy, which is a parametric generalization of the Shannon entropy. We propose to use this formula to get a better hold on all sorts of data, since it provides a more stable model than the Shannon entropy. 3) Results: Our method is tested on one POE dataset including 3895 distinct images; it shows better results than Shannon entropy and behaves better with regard to overfitting. We obtain 70% accuracy with Shannon entropy versus 77 to 79% with Havrda-Charvat. 4) Conclusion: We can conclude that Havrda-Charvat entropy is better suited to restricted and/or noisy datasets due to its generalized nature. It is also more suitable for classification in endomicroscopy datasets.
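To make the phrase "parametric generalization of the Shannon entropy" concrete, here is a minimal sketch of a Havrda-Charvat style entropy of order `alpha`. It assumes the normalization 1/(alpha - 1), one common convention under which the measure tends to the Shannon entropy (in nats) as alpha approaches 1. The abstract does not give the exact loss function used in the paper, so this shows only the entropy itself, not the authors' loss.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def havrda_charvat_entropy(p, alpha):
    """Entropy of order alpha, assuming the 1/(alpha - 1) normalization.

    Falls back to the Shannon entropy at alpha = 1, which is the limit of
    the general expression.
    """
    if abs(alpha - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (1.0 - sum(pi ** alpha for pi in p if pi > 0)) / (alpha - 1.0)

uniform4 = [0.25, 0.25, 0.25, 0.25]
```

For the uniform distribution over four classes, alpha = 2 gives 1 - 4(1/16) = 0.75, while values of alpha close to 1 approach ln 4.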

3.
Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. One way to incorporate amino acid similarity into conservation measures is to group amino acids into disjoint sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define the relative entropy of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 perform better than many existing conservation measures in identifying catalytic residues, and in particular better than the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, two new conservation measures are obtained by combining them with the classical JSD. Experimental results suggest that these combined measures achieve excellent performance in identifying catalytic residues.

4.
5.
We have investigated the registration of mammograms based on the Tsallis entropy using a mutual information measure. Tsallis entropy has one additional parameter, ‘q’, whose value determines the quality of the registration. Existing Tsallis entropy based algorithms are not as automatic as they are claimed to be. In this article, an automatic affine image registration method based on Tsallis entropy is proposed, and its performance is analyzed for the global registration of clinically acquired mammograms. The accuracy is compared with the traditionally used mutual information and normalized mutual information based on Shannon entropy. Our algorithm shows promising results, with increased accuracy and a reduced number of evaluations. Further, the need for pre-registration in mammograms is discussed in detail. Through this experiment, it is found that the proposed algorithm is effective enough to replace Shannon and existing Tsallis entropy based affine registration schemes.

6.
The Shannon information entropy of protein sequences.   (Total citations: 6; self-citations: 1; citations by others: 5)
A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman" gambler is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As in the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
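The "letter" analysis described above can be sketched as follows: estimate the Shannon entropy of overlapping k-tuples, then take the difference between successive block entropies as the entropy of the next residue given the preceding k - 1. The function names are hypothetical, and this plug-in estimator is an assumption for illustration rather than the paper's procedure; it is biased for short sequences and can even go slightly negative because of edge effects.

```python
import math
from collections import Counter

def block_entropy(seq, k):
    """Shannon entropy (bits) of the empirical distribution of overlapping k-tuples."""
    if k == 0:
        return 0.0
    tuples = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(tuples)
    n = len(tuples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(seq, k):
    """Entropy of a residue given the k - 1 preceding residues, as H_k - H_(k-1)."""
    return block_entropy(seq, k) - block_entropy(seq, k - 1)

# With k = 1 this reduces to the composition entropy; for 20 equally
# frequent amino acids it equals log2(20), about 4.32 bits.
uniform_composition = block_entropy("ACDEFGHIKLMNPQRSTVWY", 1)
```

The 4.18 bits/amino acid quoted in the abstract sits just below this log2(20) ceiling because natural amino-acid composition is not uniform.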

7.

Background  

In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces.
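For readers unfamiliar with the basic technique, the following is a minimal sketch of the plain von Neumann rejection sampler on a one-dimensional toy density. It is not the auto-validating, interval-analysis version that the abstract introduces. The target, the proposal and the envelope constant are assumptions chosen for the example, and the argument names are hypothetical.

```python
import random

def rejection_sample(target_density, proposal_sample, proposal_density,
                     envelope_constant, rng):
    """Draw one sample from target_density (which may be unnormalized).

    Requires target_density(x) <= envelope_constant * proposal_density(x)
    for every x the proposal can produce.
    """
    while True:
        x = proposal_sample(rng)
        u = rng.random()
        # Accept x with probability target / (envelope_constant * proposal).
        if u * envelope_constant * proposal_density(x) <= target_density(x):
            return x

# Toy target: density 2x on [0, 1], whose mean is 2/3.
rng = random.Random(0)
samples = [
    rejection_sample(lambda x: 2.0 * x, lambda r: r.random(), lambda x: 1.0, 2.0, rng)
    for _ in range(20000)
]
sample_mean = sum(samples) / len(samples)
```

The sampler is exact only if the envelope really bounds the target everywhere. The auto-validating version described in the abstract uses interval analysis to draw its samples rigorously.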

8.
Detection of homologous proteins with low-sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profile. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low-sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program.

9.
Imai T, Fujita N. Proteins, 2004, 56(4): 650-660
G-protein-coupled receptors (GPCRs) play a crucial role in signal transduction and receive a wide variety of ligands. GPCRs are a major target in drug design, as nearly 50% of all contemporary medicines act on GPCRs. GPCRs are membrane proteins possessing a common structural feature, seven transmembrane helices. In order to design an effective drug to act on a GPCR, knowledge of the three-dimensional (3D) structure of the target GPCR is indispensable. However, as GPCRs are membrane bound, their 3D structures are difficult to obtain. Thus we conducted statistical sequence analyses to find information about 3D structure and ligand binding using the receptors' primary sequences. We present statistical sequence analyses of 270 human GPCRs with regard to entropy (Shannon entropy in sequence alignment), hydrophobicity and volume, which are associated with the alpha-helical periodicity of the accessibility to the surrounding lipid. We found periodicity such that the phase changes once in the middle of each transmembrane region, both in the entropy plot and in the hydrophobicity plot. The phase shift in the entropy plot reflects the variety of ligands and the generality of the mechanism of signal transduction. The two periodic regions in the hydrophobicity plot indicate the regions facing the hydrophobic lipid chain and the polar phospholipid headgroup. We also found a simple periodicity in the plot of volume deviation, which suggests conservation of the stable structural packing among the transmembrane helices.

10.
In biology, the theory of information has been used to study the degree of order of many living systems. Different concepts of entropy have been applied to the analysis of phyllotaxis. In the present paper we will determine the degree of order of disorganized patterns by using informational entropy concepts deduced from the work of Brillouin, Shannon, and Yagil. As case studies, we will apply these concepts of entropy to the disorganized patterns found in mutants of Arabidopsis. The calculation of entropy gives a precise idea of the degree of order of a phyllotactic system.

11.
Information has an entropic character which can be analyzed within the framework of the Statistical Theory in molecular systems. R. Landauer and C.H. Bennett showed that a logical copy can be carried out in the limit of no dissipation if the computation is performed sufficiently slowly. Structural and recent single-molecule assays have provided dynamic details of polymerase machinery with insight into information processing. Here, we introduce a rigorous characterization of Shannon Information in biomolecular systems and apply it to DNA replication in the limit of no dissipation. Specifically, we devise an equilibrium pathway in DNA replication to determine the entropy generated in copying the information from a DNA template in the absence of friction. Both the initial state, the free nucleotides randomly distributed in certain concentrations, and the final state, a polymerized strand, are mesoscopic equilibrium states for the nucleotide distribution. We use empirical stacking free energies to calculate the probabilities of incorporation of the nucleotides. The copied strand is, to first order of approximation, a state of independent and non-identically distributed random variables for which the nucleotide that is incorporated by the polymerase at each step is dictated by the template strand, and to second order of approximation, a state of non-uniformly distributed random variables with nearest-neighbor interactions for which the recognition of secondary structure by the polymerase in the resultant double-stranded polymer determines the entropy of the replicated strand. Two incorporation mechanisms arise naturally and their biological meanings are explained. It is known that replication occurs far from equilibrium and therefore the Shannon entropy derived here represents an upper bound for replication to take place. Likewise, this entropy sets a universal lower bound for the copying fidelity in replication.

12.
Beta diversity is among the most employed theoretical concepts in ecology and biodiversity conservation. To date, a self‐contained definition of it, with no reference to alpha and gamma diversity, has never been proposed. Using Kullback‐Leibler divergence, we present the explicit formula of Shannon's β entropy, a bias correction for its estimator and a confidence interval. We also provide the mathematical framework to decompose Shannon diversity into several hierarchical nested levels. From botanical inventories of tropical forest plots in French Guiana, we estimate Shannon diversity at the plot, forest and regional level. We believe this is a complete and useful toolbox for ecologists interested in partitioning biodiversity.
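One way to read the Kullback-Leibler formulation is sketched below: the β entropy is taken as the weighted mean divergence of each local community from the pooled meta-community, which makes Shannon γ entropy equal to α entropy plus β entropy. This is an illustrative plug-in calculation under that assumption. It omits the bias correction and the confidence interval mentioned in the abstract, and the function names and toy data are hypothetical.

```python
import math

def shannon(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def partition_shannon(communities, weights):
    """Return (alpha, beta, gamma) Shannon entropies.

    communities: one list of species relative abundances per plot.
    weights: plot weights summing to 1.
    """
    n_species = len(communities[0])
    pooled = [sum(w * c[s] for w, c in zip(weights, communities))
              for s in range(n_species)]
    alpha = sum(w * shannon(c) for w, c in zip(weights, communities))
    beta = sum(w * kl(c, pooled) for w, c in zip(weights, communities))
    gamma = shannon(pooled)
    return alpha, beta, gamma

# Two hypothetical plots sharing one of three species.
alpha, beta, gamma = partition_shannon([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]], [0.5, 0.5])
```

In this toy case α = ln 2, β = 0.5 ln 2 and γ = 1.5 ln 2, so the additive decomposition holds exactly; identical plots would give β = 0.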

13.
Rosen (Bull. Math Biophysics. 1959) has argued that a self-reproducing automaton of the type originally described by von Neumann is impossible because of a logical paradox inherent in its definition. The paradox is resolved by explicitly allowing errors (mutations) in the system and thus introducing evolution. There is no paradox in an automaton originating from a slightly different ancestor through mutation. The von Neumann model thus becomes realistic and useful for a discussion of biological phenomena.

14.
The diversity of a species assemblage has been studied extensively for many decades in relation to its possible connection with ecosystem functioning and organization. In this view most diversity measures, such as Shannon's entropy, rely upon information theory as a basis for the quantification of diversity. Also, traditional diversity measures are computed using species relative abundances and cannot account for the ecological differences between species. Rao first proposed a diversity index, termed quadratic diversity (Q), that incorporates both species relative abundances and pairwise distances between species. Quadratic diversity is traditionally defined as the expected distance between two randomly selected individuals. In this paper, we show that quadratic diversity can be interpreted as the expected conflict among the species of a given assemblage. From this unusual interpretation, it naturally follows that Rao's Q can be related to the Shannon entropy through a generalized version of the Tsallis parametric entropy.
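The definition of quadratic diversity as an expected distance translates directly into code. The sketch below computes Q as the sum over species pairs of d_ij p_i p_j, and checks the well-known special case in which all distinct species are at distance 1, where Q reduces to the Gini-Simpson index (the Tsallis entropy of order 2). The abundances and the distance matrix are made-up values for illustration.

```python
def rao_quadratic_entropy(abundances, distances):
    """Expected distance between two individuals drawn at random with replacement."""
    n = len(abundances)
    return sum(distances[i][j] * abundances[i] * abundances[j]
               for i in range(n) for j in range(n))

def gini_simpson(abundances):
    """Probability that two randomly drawn individuals belong to different species."""
    return 1.0 - sum(p * p for p in abundances)

# Hypothetical assemblage of three species with unit distances between species.
p = [0.5, 0.3, 0.2]
unit_distances = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
q_value = rao_quadratic_entropy(p, unit_distances)
```

Replacing the unit distances with functional or phylogenetic distances is what lets Q account for ecological differences between species, which abundance-only measures cannot do.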

15.
Spiking and bursting patterns of neurons are characterized by a high degree of variability. A single neuron can demonstrate endogenously various bursting patterns, changing in response to external disturbances due to synapses, or to intrinsic factors such as channel noise. We argue that in a model of the leech heart interneuron existing variations of bursting patterns are significantly enhanced by a small noise. In the absence of noise this model shows periodic bursting with fixed numbers of interspikes for most parameter values. As the parameter of activation kinetics of a slow potassium current is shifted to more hyperpolarized values of the membrane potential, the model undergoes a sequence of incremental spike adding transitions accumulating towards a periodic tonic spiking activity. Within a narrow parameter window around every spike adding transition, spike alteration of bursting is deterministically chaotic due to homoclinic bifurcations of a saddle periodic orbit. We have found that near these transitions the interneuron model becomes extremely sensitive to small random perturbations that cause a wide expansion and overlapping of the chaotic windows. The chaotic behavior is characterized by positive values of the largest Lyapunov exponent, and of the Shannon entropy of probability distribution of spike numbers per burst. The windows of chaotic dynamics resemble the Arnold tongues being plotted in the parameter plane, where the noise intensity serves as a second control parameter. We determine the critical noise intensities above which the interneuron model generates only irregular bursting within the overlapped windows.

16.
Prabhu NV, Lee AL, Wand AJ, Sharp KA. Biochemistry, 2003, 42(2): 562-570
All-atom, explicit water molecular dynamics simulations of calcium-loaded calmodulin complexed with a peptide corresponding to the smooth muscle myosin light chain kinase target were carried out at 295 and 346 K. Amide and side chain methyl angular generalized order parameters were calculated and analyzed in the context of the protein's structure and dynamics. The agreement between amide order parameters measured by NMR and those from the simulations was found to be good, especially at the higher temperature, indicating both better convergence for the latter and excellent transferability of the CHARMM parameters to the higher temperature. Subtle dynamical features such as helix fraying were reproduced. A large range of order parameters for the nine calmodulin methionines was observed in the NMR, and reproduced quite well in the simulations. The major determinant of the methionine order parameter was found to be the proximity to side chains of aromatic residues. An upper bound estimate of the difference in backbone entropy between loop and helical regions was extracted from the order parameters using a model of motion in an effective potential. Although loop regions are more flexible than helical regions, it was found that the entropy loss per residue upon folding was only approximately 20% less for loops than for helices. Pairwise correlated motions, which could significantly lower entropy estimates obtained from order parameter analysis alone, were found to be largely absent.

17.
Traditional diversity measures such as the Shannon entropy are generally computed from the species' relative abundance vector of a given community to the exclusion of species' absolute abundances. In this paper, I first mention some examples where the total information content associated with a given community may be more adequate than Shannon's average information content for a better understanding of ecosystem functioning. Next, I propose a parametric measure of statistical information that contains both Shannon's entropy and total information content as special cases of this more general function.

18.
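A Shannon information method for measuring gene variation in non-equilibrium populations   (Total citations: 14; self-citations: 2; citations by others: 14)
Based on Shannon information, the concepts of the relative information of population genotypes S′(G), the relative information of homozygotes S′J(G) and the relative information of heterozygotes S′H(G) are established for non-equilibrium populations and given genetic meaning. They are compared theoretically with gene identity J and gene diversity D. The results show that the two sets of measures agree well in their quantitative behavior, yet form relatively independent index systems, and that each relative information measure carries additional meaning. S′(G) characterizes gene variation and also reflects genetic variation at the genotype level; S′J(G) mainly reflects the genetic variation of homozygotes; S′H(G) mainly reflects the genetic variation of heterozygotes. Each relative information measure can reflect the degree of genetic variation of a population and can also be used to compare the degree of genetic variation among loci.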

19.
The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. 
We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances.

20.

Background

The quantification of species-richness and species-turnover is essential to effective monitoring of ecosystems. Wetland ecosystems are particularly in need of such monitoring due to their sensitivity to rainfall, water management and other external factors that affect hydrology, soil, and species patterns. A key challenge for environmental scientists is determining the linkage between natural and human stressors, and the effect of that linkage at the species level in space and time. We propose pixel intensity based Shannon entropy for estimating species-richness, and introduce a method based on statistical wavelet multiresolution texture analysis to quantitatively assess interseasonal and interannual species turnover.

Methodology/Principal Findings

We model satellite images of regions of interest as textures. We define a texture in an image as a spatial domain where the variations in pixel intensity across the image are both stochastic and multiscale. To compare two textures quantitatively, we first obtain a multiresolution wavelet decomposition of each. Either an appropriate probability density function (pdf) model for the coefficients at each subband is selected and its parameters estimated, or a non-parametric approach using histograms is adopted. We choose the former, where the wavelet coefficients of the multiresolution decomposition at each subband are modeled as samples from the generalized Gaussian pdf. We then obtain the joint pdf for the coefficients for all subbands, assuming independence across subbands; an approximation that simplifies the computational burden significantly without sacrificing the ability to statistically distinguish textures. We measure the difference between two textures' representative pdfs via the Kullback-Leibler divergence (KL). Species turnover, or β diversity, is estimated using both this KL divergence and the difference in Shannon entropy. Additionally, we predict species richness, or α diversity, based on the Shannon entropy of pixel intensity. To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight-year period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is near 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. 
Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season.
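The two quantities used above can be illustrated with a much simpler stand-in than the authors' pipeline. The sketch below computes the Shannon entropy of a pixel-intensity histogram and the Kullback-Leibler divergence between two such histograms. It works on raw intensities rather than on wavelet subband coefficients modeled with a generalized Gaussian pdf, so it corresponds at most to the non-parametric histogram alternative mentioned in the text. The 8-bit intensity range, the pseudocount and the toy "images" are assumptions for the example.

```python
import math
from collections import Counter

def intensity_histogram(pixels, levels=256, pseudocount=0.0):
    """Normalized histogram of integer pixel intensities in range(levels)."""
    counts = Counter(pixels)
    total = len(pixels) + pseudocount * levels
    return [(counts.get(v, 0) + pseudocount) / total for v in range(levels)]

def shannon_entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def kl_divergence(p, q):
    """D(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two flattened toy "images": one with two intensity levels, one uniform.
dry_season = [0] * 8 + [255] * 8
wet_season = [0] * 16
entropy_dry = shannon_entropy(intensity_histogram(dry_season))
entropy_wet = shannon_entropy(intensity_histogram(wet_season))

# A pseudocount keeps the divergence finite when a bin is empty in one image.
turnover_proxy = kl_divergence(
    intensity_histogram(dry_season, pseudocount=1.0),
    intensity_histogram(wet_season, pseudocount=1.0),
)
```

Here the two-level image has an entropy of 1 bit and the constant image 0 bits. The divergence between the two histograms is positive and would be zero for identical images.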

Conclusions/Significance

Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost effective inference and decision making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data.
