Similar articles
Found 20 similar articles; search took 501 ms.
1.

Background

Abel and Trevors have delineated three aspects of sequence complexity, Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC) observed in biosequences such as proteins. In this paper, we provide a method to measure functional sequence complexity.

Methods and Results

We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest value sites correlate with the binding domain.

Conclusion

For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as analyzing the internal structural and functional relationships within the 3-D structure of proteins.
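The per-site measure described in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes one plausible reading: each aligned column is scored as the drop in Shannon uncertainty from an equiprobable 20-letter null state to the observed column distribution. The function name `site_fits`, the null state, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter

def site_fits(alignment, alphabet_size=20):
    """Return log2(alphabet_size) - H(column) in bits for every aligned column.

    Assumption: the null (non-functional) state is an equiprobable alphabet;
    this is an illustrative choice, not necessarily the authors' definition.
    """
    ground = math.log2(alphabet_size)
    n_cols = len(alignment[0])
    fits = []
    for j in range(n_cols):
        column = [seq[j] for seq in alignment]
        counts = Counter(column)
        total = len(column)
        # Plug-in Shannon entropy of the observed residues in this column.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        fits.append(ground - h)
    return fits

# Hypothetical toy alignment: four sequences, three aligned sites.
toy_alignment = ["MKV", "MRV", "MKI", "MRL"]
fits = site_fits(toy_alignment)
```

A fully conserved column receives the maximum value, and more variable columns receive less, which is the sense in which high-value sites would point at functionally constrained positions.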

2.
Information content of protein sequences   Total citations: 1, self-citations: 0, citations by others: 1
The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides.
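The two estimators named in this abstract, Shannon entropy and compression, can be compared on toy data. The sketch below is an assumption-laden illustration: it uses single-letter (zeroth-order) entropy and the general-purpose `zlib` compressor, neither of which is confirmed as the authors' choice, and the sequences are synthetic.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(seq):
    """Plug-in single-letter Shannon entropy in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compressed_bits_per_symbol(seq):
    """Upper-bound style estimate: size of the zlib output per input symbol."""
    data = seq.encode("ascii")
    return 8 * len(zlib.compress(data, 9)) / len(seq)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
rng = random.Random(0)  # fixed seed so the surrogate is reproducible

# Hypothetical surrogates: one uniformly random string, one highly repetitive.
random_seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(5000))
repetitive_seq = "ACDE" * 1250
```

On data like this the repetitive string compresses far better than the random one, while a real protein set would, per the abstract, sit close to the random surrogate. Both estimators are subject to the finite-sample effects the abstract mentions.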

3.
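A fast sequence alignment algorithm based on dynamic programming   Total citations: 3, self-citations: 0, citations by others: 3
Sequence alignment is one of the important research directions in bioinformatics, and dynamic programming is the most effective and most fundamental approach to it. Because the basic dynamic programming method has high time and space complexity, it is poorly suited to practical biological sequence alignment. After analyzing and reviewing several related dynamic programming algorithms, this paper therefore proposes UKK_FA, a fast sequence alignment algorithm based on dynamic programming. Experimental results show that the algorithm effectively reduces time complexity and is of practical use.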
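For orientation, the baseline that such work improves on can be sketched as follows. This is the standard Needleman-Wunsch global alignment score, not UKK_FA, whose details the abstract does not give. The scoring values are arbitrary assumptions. The sketch keeps only two rows of the table, so memory is linear in the shorter dimension while time remains quadratic.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score using two rolling rows.

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

For example, aligning "ACGT" with "AGT" under these parameters yields three matches and one gap.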

4.
In this paper we propose a theoretical model of protein folding and protein evolution in which a polypeptide (sequence/structure) is assumed to behave as a Maxwell Demon or Information Gathering and Using System (IGUS) that performs measurements aiming at the construction of the native structure. Our model proposes that a physical meaning to Shannon information (H) and Chaitin's algorithmic information (K) parameters can be both defined and referred from the IGUS standpoint. Our hypothesis accounts for the interdependence of protein folding and protein evolution through mutual influencing relationships mediated by the IGUS. In brief, IGUS activity in protein folding determines long term tendencies that emerge at the evolutionary time-scale. Thus, protein evolution is a consequence of measurements executed by proteins at the cellular level, where the IGUS imposes a tendency to attain a highly unique stable native form that promotes the updating of the information content. The folding kinetics observed is, thus, the outcome of an evolutionary process where the polypeptide-IGUS drives the evolution of its linear sequence. Finally, we describe protein evolution as an entropic process that tends to increase the content of mutual algorithmic information between the sequence and the structure. This model enables one: 1. To comprehend that full determination of the three-dimensional structure by the linear sequence is a tendency where satisfaction is only possible at thermodynamic equilibrium. 2. To account for the observed randomness of the amino acid sequences. 3. To predict an alternation of periods of selection and neutral diffusion during protein evolutionary time.

5.
Lisewski AM. PLoS ONE 2008, 3(9): e3110
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.

6.
Shannon information is commonly assumed to be the wrong way in which to conceive of information in most biological contexts. Since the theory deals only in correlations between systems, the argument goes, it can apply to any and all causal interactions that affect a biological outcome. Since informational language is generally confined to only certain kinds of biological process, such as gene expression and hormone signalling, Shannon information is thought to be unable to account for this restriction. It is often concluded that a richer, teleosemantic sense of information is needed. I argue against this view, and show that a coherent and sufficiently restrictive theory of biological information can be constructed with Shannon information at its core. This can be done by paying due attention to some crucial distinctions: between information quantity and its fitness value, and between carrying information and having the function of doing so. From this I construct an account of how informational functions arise, and show that the “subject matter” of these functions can easily be seen as the natural information dealt with by Shannon’s theory.

7.
Comparative analysis of nonhuman animal communication systems and their complexity, particularly in comparison to human language, has been generally hampered by both a lack of sufficiently extensive data sets and appropriate analytic tools. Information theory measures provide an important quantitative tool for examining and comparing communication systems across species. In this paper we use the original application of information theory, that of statistical examination of a communication system's structure and organization. As an example of the utility of information theory to the analysis of animal communication systems, we applied a series of information theory statistics to a statistically categorized set of bottlenose dolphin (Tursiops truncatus) whistle vocalizations. First, we use the first-order entropic relation in a Zipf-type diagram (Zipf 1949 Human Behavior and the Principle of Least Effort) to illustrate the application of temporal statistics as comparative indicators of repertoire complexity, and as possible predictive indicators of acquisition/learning in animal vocal repertoires. Second, we illustrate the need for more extensive temporal data sets when examining the higher entropic orders, indicative of higher levels of internal informational structure, of such vocalizations, which could begin to allow the statistical reconstruction of repertoire organization. Third, we propose using 'communication capacity' as a measure of the degree of temporal structure and complexity of statistical correlation, represented by the values of entropic order, as an objective tool for interspecies comparison of communication complexity. In doing so, we introduce a new comparative measure, the slope of Shannon entropies, and illustrate how it potentially can be used to compare the organizational complexity of vocal repertoires across a diversity of species. Finally, we illustrate the nature and predictive application of these higher-order entropies using a preliminary sample of dolphin whistle vocalizations. The purpose of this preliminary report is to re-examine the original application of information theory to the field of animal communication, illustrate its potential utility as a comparative tool for examining the internal informational structure of animal vocal repertoires and their development, and discuss its relationship to behavioural ecology and evolutionary theory. Copyright 1999 The Association for the Study of Animal Behaviour.

8.
Shannon’s seminal approach to estimating information capacity is widely used to quantify information processing by biological systems. However, the Shannon information theory, which is based on power spectrum estimation, necessarily contains two sources of error: time delay bias error and random error. These errors are particularly important for systems with relatively large time delay values and for responses of limited duration, as is often the case in experimental work. The window function type and size chosen, as well as the values of inherent delays cause changes in both the delay bias and random errors, with possibly strong effect on the estimates of system properties. Here, we investigated the properties of these errors using white-noise simulations and analysis of experimental photoreceptor responses to naturalistic and white-noise light contrasts. Photoreceptors were used from several insect species, each characterized by different visual performance, behavior, and ecology. We show that the effect of random error on the spectral estimates of photoreceptor performance (gain, coherence, signal-to-noise ratio, Shannon information rate) is opposite to that of the time delay bias error: the former overestimates information rate, while the latter underestimates it. We propose a new algorithm for reducing the impact of time delay bias error and random error, based on discovering, and then using that size of window, at which the absolute values of these errors are equal and opposite, thus cancelling each other, allowing minimally biased measurement of neural coding.

9.
10.
11.
Here I systematically examine the information complexity of all primary sequences of natural proteins deposited in the Swiss-Prot database. The sequence complexity is assessed by determining the frequency of occurrence of each amino acid type on sequence windows of fixed length, calculating the Shannon entropy of the window and then averaging over all windows covering the sequence. The minimum value in information content obtained from the present-day record imposes a lower limit in the number of letters that a primeval amino acid alphabet must have had.
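The windowed procedure described in this abstract (count residue frequencies in a fixed-length window, take the Shannon entropy, average over windows) can be sketched as below. The abstract does not say whether windows overlap or what length is used, so the sliding step of one and the default window length are assumptions, as are the function names.

```python
import math
from collections import Counter

def window_entropy(window):
    """Shannon entropy (bits) of the residue frequencies inside one window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_window_entropy(seq, window=4):
    """Average window entropy over all overlapping windows (assumed step of 1)."""
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window length")
    values = [
        window_entropy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]
    return sum(values) / len(values)
```

A homopolymer such as "AAAAAAAA" averages to zero bits, while a sequence whose every window contains four distinct residues averages to two bits, the maximum for a window of length four.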

12.
The successive expression of neuronal transients is related to dynamic correlations and, as shown in this paper, to dynamic instability. Dynamic instability is a form of complexity, typical of neuronal systems, which may be crucial for adaptive brain function from two perspectives. The first is from the point of view of neuronal selection and self-organizing systems: if selective mechanisms underpin the emergence of adaptive neuronal responses then dynamic instability is, itself, necessarily adaptive. This is because dynamic instability is the source of diversity on which selection acts and is therefore subject to selective pressure. In short, the emergence of order, through selection, depends almost paradoxically on the instabilities that characterize the diversity of brain dynamics. The second perspective is provided by information theory.

13.
Sociality is primarily a coordination problem. However, the social (or communication) complexity hypothesis suggests that the kinds of information that can be acquired and processed may limit the size and/or complexity of social groups that a species can maintain. We use an agent-based model to test the hypothesis that the complexity of information processed influences the computational demands involved. We show that successive increases in the kinds of information processed allow organisms to break through the glass ceilings that otherwise limit the size of social groups: larger groups can only be achieved at the cost of more sophisticated kinds of information processing that are disadvantageous when optimal group size is small. These results simultaneously support both the social brain and the social complexity hypotheses.

14.
15.
Biologists rely heavily on the language of information, coding, and transmission that is commonplace in the field of information theory developed by Claude Shannon, but there is open debate about whether such language is anything more than facile metaphor. Philosophers of biology have argued that when biologists talk about information in genes and in evolution, they are not talking about the sort of information that Shannon’s theory addresses. First, philosophers have suggested that Shannon’s theory is only useful for developing a shallow notion of correlation, the so-called “causal sense” of information. Second, they typically argue that in genetics and evolutionary biology, information language is used in a “semantic sense,” whereas semantics are deliberately omitted from Shannon’s theory. Neither critique is well-founded. Here we propose an alternative to the causal and semantic senses of information: a transmission sense of information, in which an object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X. The transmission sense not only captures much of what biologists intend when they talk about information in genes, but also brings Shannon’s theory back to the fore. By taking the viewpoint of a communications engineer and focusing on the decision problem of how information is to be packaged for transport, this approach resolves several problems that have plagued the information concept in biology, and highlights a number of important features of the way that information is encoded, stored, and transmitted as genetic sequence.

16.

Background

Existing sequence alignment algorithms use heuristic scoring schemes based on biological expertise, which cannot be used as objective distance metrics. As a result one relies on crude measures, like the p- or log-det distances, or makes explicit, and often too simplistic, a priori assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI). MI is, in principle, an objective and model independent similarity measure, but it is not widely used in this context and no algorithm for extracting MI from a given alignment (without assuming an evolutionary model) is known. MI can be estimated without alignments, by concatenating and zipping sequences, but so far this has only produced estimates with uncontrolled errors, despite the fact that the normalized compression distance based on it has shown promising results.

Results

We describe a simple approach to get robust estimates of MI from global pairwise alignments. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. For animal mitochondrial DNA our approach uses the alignments made by popular global alignment algorithms to produce MI estimates that are strikingly close to estimates obtained from the alignment free methods mentioned above. We point out that, due to the fact that it is not additive, normalized compression distance is not an optimal metric for phylogenetics but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances.

Conclusions

Several versions of MI based distances outperform conventional distances in distance-based phylogeny. Even a simplified version based on single letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. It strongly suggests that information theory concepts can be exploited further in sequence analysis.
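The "simplified version based on single letter Shannon entropies" mentioned in this abstract is not spelled out, but a plug-in mutual information over the columns of a pairwise alignment gives the general flavor. The sketch below assumes I(X;Y) = H(X) + H(Y) - H(X,Y) computed from column-wise symbol pairs, with any gap character counted as an ordinary symbol. These choices and the function names are hypothetical, not the authors' estimator.

```python
import math
from collections import Counter

def entropy(symbols):
    """Plug-in Shannon entropy (bits) of any iterable of hashable symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def alignment_mutual_information(row_a, row_b):
    """Estimate I(X;Y) in bits per column from two aligned rows.

    Assumption: gap characters, if present, are treated as one more letter.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(row_a, row_b))  # joint symbols, one per alignment column
    return entropy(row_a) + entropy(row_b) - entropy(pairs)
```

Identical rows give the full single-letter entropy as mutual information, and rows whose columns pair up independently give zero. On short alignments the plug-in estimate is biased upward, so a practical distance would need a finite-sample correction.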

17.
Daniel R. Brooks and E. O. Wiley have proposed a theory of evolution in which fitness is merely a rate determining factor. Evolution is driven by non-equilibrium processes which increase the entropy and information content of species together. Evolution can occur without environmental selection, since increased complexity and organization result from the likely capture at the species level of random variations produced at the chemical level. Speciation can occur as the result of variation within the species which decreases the probability of sharing genetic information. Critics of the Brooks-Wiley theory argue that they have abused terminology from information theory and thermodynamics. In this paper I review the essentials of the theory, and give an account of hierarchical physical information systems within which the theory can be interpreted. I then show how the major conceptual objections can be answered.

18.
Multiple sequence alignment using partial order graphs   Total citations: 14, self-citations: 0, citations by others: 14
MOTIVATION: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism. AVAILABILITY: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.

19.
What is complexity?   Total citations: 1, self-citations: 0, citations by others: 1
Arguments for or against a trend in the evolution of complexity are weakened by the lack of an unambiguous definition of complexity. Such definitions abound for both dynamical systems and biological organisms, but have drawbacks of either a conceptual or a practical nature. Physical complexity, a measure based on automata theory and information theory, is a simple and intuitive measure of the amount of information that an organism stores, in its genome, about the environment in which it evolves. It is argued that physical complexity must increase in molecular evolution of asexual organisms in a single niche if the environment does not change, due to natural selection. It is possible that complexity decreases in co-evolving systems as well as at high mutation rates, in sexual populations, and in time-dependent landscapes. However, it is reasoned that these factors usually help, rather than hinder, the evolution of complexity, and that a theory of physical complexity for co-evolving species will reveal an overall trend towards higher complexity in biological evolution.

20.
Pino S, Costanzo G, Giorgi A, Di Mauro E. Biochemistry 2011, 50(14): 2994-3003
We report two reactions of RNA G:C sequences occurring nonenzymatically in water in the absence of any added cofactor or metal ion: (a) sequence complementarity-driven terminal ligation and (b) complementary sequence adaptor-driven multiple tandemization. The two abiotic reactions increase the chemical complexity of the resulting pool of RNA molecules and change the Shannon information of the initial population of sequences.
