Similar literature
 20 similar articles found (search time: 15 ms)
1.
A genome of a living organism consists of a long string of symbols over a finite alphabet carrying critical information for the organism. This includes its ability to control postnatal growth, homeostasis, adaptation to changes in the surrounding environment, or to biochemically respond at the cellular level to various specific regulatory signals. In this sense, a genome represents a symbolic encoding of a highly organized system of information whose functioning may be revealed as a natural multilayer structure in terms of complexity and prominence. In this paper we use the mathematical theory of symbolic extensions as a framework to shed light onto how this multilayer organization is reflected in the symbolic coding of the genome. The distribution of data in an element of a standard symbolic extension of a dynamical system has a specific form: the symbolic sequence is divided into several subsequences (which we call layers) encoding the dynamics on various “scales”. We propose that a similar structure resides within the genomes, building our analogy on some of the most recent findings in the field of regulation of genomic DNA functioning.

2.
The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. Grouping the 20 amino acids into a smaller number of representative residues with similar features simplifies the sequence world. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (five residues long) called Protein Blocks (PBs). This alphabet makes it possible to translate the 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs), e.g., Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap. Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We highlight interesting alternative associations, which show the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to the environment) while mainly preserving the 3D fold.
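The idea of a reduced amino acid alphabet in the abstract above can be sketched as a simple lookup table. This is only an illustration: the Ile/Val and Trp/Phe/Tyr groups are the associations named in the abstract, while every other group, the choice of representative letters, and the names `REDUCED_GROUPS`, `build_lookup` and `reduce_sequence` are assumptions made for this sketch, not the authors' reduced sets.

```python
# Hypothetical 11-letter reduction of the 20 amino acids.
# Only the Ile/Val and Trp/Phe/Tyr groups come from the abstract;
# all other groups are illustrative placeholders.
REDUCED_GROUPS = {
    "I": "IV",    # Ile with Val (from the abstract)
    "F": "FWY",   # aromatic Trp with Phe and Tyr (from the abstract)
    "L": "LM",    # assumed
    "A": "AG",    # assumed
    "S": "ST",    # assumed
    "C": "C",     # assumed singleton
    "P": "P",     # assumed singleton
    "H": "H",     # assumed singleton
    "D": "DE",    # assumed
    "N": "NQ",    # assumed
    "K": "KR",    # assumed
}


def build_lookup(groups):
    """Map each residue to its group representative, rejecting overlapping groups."""
    lookup = {}
    for representative, members in groups.items():
        for residue in members:
            if residue in lookup:
                raise ValueError(f"{residue} appears in more than one group")
            lookup[residue] = representative
    return lookup


LOOKUP = build_lookup(REDUCED_GROUPS)


def reduce_sequence(sequence, lookup=LOOKUP, unknown="X"):
    """Rewrite a one-letter protein sequence in the reduced alphabet."""
    return "".join(lookup.get(residue, unknown) for residue in sequence.upper())


if __name__ == "__main__":
    print(reduce_sequence("MVWTE"))
```

Representing the reduction as groups rather than a flat dictionary keeps the clustering readable and lets the overlap check catch a residue assigned twice.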

3.
Consider the scenario of common gene clusters of closely related species where the cluster sizes could be as large as 400 from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (of the order of the number of species in this case) and the alphabet set of the elements is relatively large. We present a model where we study the structure of the clusters in terms of smaller nested (or otherwise) sub-clusters contained within the cluster. We give a probability estimation based on the expected cluster structure for such clusters (rather than some form of the product of individual probabilities of the elements). We also give an exact probability computation based on a dynamic programming algorithm, which runs in polynomial time.

4.
Braille reading is a complex process involving intricate finger-motion patterns and finger-rubbing actions across Braille letters for the stimulation of appropriate nerves. Although Braille reading is performed by smoothly moving the finger from left to right, research shows that even fluent reading requires right-to-left movements of the finger, known as “reversals”. Reversals are crucial as they not only enhance stimulation of nerves for correctly reading the letters, but they also allow one to re-read the letters that were missed in the first pass. Moreover, it is known that reversals can be performed as often as in every sentence and can start at any location in a sentence. Here, we report experimental results on the feasibility of an algorithm that enables a machine to automatically adapt to reversal gestures of one’s finger. Through Braille-reading-analogous tasks, the algorithm is tested with thirty sighted subjects who volunteered for the study. We find that the finger motion adaptive algorithm (FMAA) is useful in achieving cooperation between the human finger and the machine. In the presence of FMAA, subjects’ performance metrics associated with the tasks improved significantly, as supported by statistical analysis. In light of these encouraging results, preliminary experiments were carried out with five blind subjects to put the algorithm to the test. Results obtained from carefully designed experiments showed that subjects’ Braille reading accuracy in the presence of FMAA was more favorable than when FMAA was turned off. Utilization of FMAA in future-generation Braille reading devices thus holds strong promise.

5.

Background  

We investigate automated and generic alphabet reduction techniques for protein structure prediction datasets. Reducing alphabet cardinality without losing key biochemical information opens the door to potentially faster machine learning, data mining and optimization applications in structural bioinformatics. Furthermore, reduced but informative alphabets often result in, e.g., more compact and human-friendly classification/clustering rules. In this paper we propose a robust and sophisticated alphabet reduction protocol based on mutual information and state-of-the-art optimization techniques.
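The abstract above scores alphabet reductions by mutual information. As a rough sketch of that scoring step only (not the authors' protocol or their optimizer), the function below estimates the mutual information, in bits, between a residue sequence and a per-residue label sequence from empirical counts. The function name and the toy data in the usage note are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two aligned sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), joint in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        total += (joint / n) * log2(joint * n / (count_x[x] * count_y[y]))
    return total


if __name__ == "__main__":
    residues = "ILDE"          # toy residues (hypothetical data)
    labels = "HHEE"            # toy per-residue structural labels (hypothetical data)
    merged = "hhpp"            # I,L -> h and D,E -> p: an assumed two-letter reduction
    print(mutual_information(residues, labels))
    print(mutual_information(merged, labels))
```

In the toy data, merging I with L and D with E leaves the mutual information with the labels unchanged, which is the property a good reduction is meant to have: fewer letters, little or no loss of information about the target.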

6.
Experiments have shown that the canonical AUCG genetic alphabet is not the only possible nucleotide alphabet. In this work we address the question "is the canonical alphabet optimal?" We make the assumption that the genetic alphabet was determined in the RNA world. Computational tools are used to infer the RNA secondary structure (shape) from a given RNA sequence, and statistics from RNA shapes are gathered with respect to alphabet size. Then, simulations based upon the replication and selection of fixed-sized RNA populations are used to investigate the effect of alternative alphabets upon RNA's ability to step through a fitness landscape. These results show that for a low copy fidelity the canonical alphabet is fitter than two-, six- and eight-letter alphabets. In higher copy-fidelity experiments, six-letter alphabets outperform the four-letter alphabets, suggesting that the canonical alphabet is indeed a relic of the RNA world.

7.
This paper is divided into two parts. Part I focuses on the manner in which the components of the face recognition system work together so that a perceiver, within several hundred milliseconds after seeing a familiar face, is able to both identify the face of the perceived and recall elements of the history of past encounters with the perceived. Face recognition plays a crucial role in enabling both human and nonhuman primates to interact in collaborative social groups. This critical function is accomplished through the unidirectional coded transfer of informational elements from one component to another. Although these informational elements themselves are not meaningful to the perceiving agent, they do nevertheless contain essential bits of information that are necessary for the final formation of the meaningful message. The structural components of the system are identified and the manner in which informational elements are coded and transferred sequentially from component to component in the brain of the perceiver is described. The independent, physically separated components in the face recognition system are bridged by an additional component, an “adaptor”, that mediates the transfer of informational elements from one component to another. The nature of the independent systems, and the manner by which the bridging or adaptor apparatus enables coded information transfer from one system to another is discussed. Part II focuses on the analysis of recognition in human-designed sign systems such as Braille and Morse code. Recognition in human-designed sign systems is notable for the stability of the link between sign and meaning. Face recognition is characterized as being subjective, indicating that the meaning of a sign (face) to a perceiver is variable and dependent on context, whereas human-devised sign recognition is characterized as being objective, indicating that the meaning of a sign is context independent and invariant. 
Human-designed sign systems require the presence in the brain of a referent world. An example of a referent world is the set of letters of the alphabet. Representations of this set are installed in the brain through socially mediated learning. Human-designed sets of signs (e.g., Braille and written text) are created to correspond, via a code-enabling adaptor structure, to referent worlds in the brain. Human-designed sign systems are the foundations for literacy, a capability found only in humans.

8.
9.
10.
Dokholyan NV. Proteins 2004, 54(4): 622-628
Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.

11.
We describe immune-proteome structures using libraries of protein fragments that define a structural immunological alphabet. We propose and validate such an alphabet as i) composed of letters of five consecutive amino acids, pentapeptide units being sufficient minimal antigenic determinants in a protein, and ii) characterized by low similarity to human proteins, so representing structures unknown to the host and potentially able to evoke an immune response. In this context, we have thoroughly sifted through the entire human proteome searching for non-redundant protein motifs. Here, for the first time, a complete sequence redundancy dissection of the human proteome has been conducted. The non-redundant peptide islands in the human proteome have been quantified and catalogued according to the amino acid length. The library of uniquely occurring n-peptide sequences that was obtained is characterized by a logarithmic decrease of the number of non-redundant peptides as a function of the peptide length. This library represents a highly specific catalogue of molecular protein signatures, the possible use of which in cancer/autoimmunity research is discussed, with a major focus on non-redundant dodecamer sequences.

12.
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanisms and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28‐letter alphabet that determines folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28‐letter alphabet. Many other reduced alphabets are not significantly correlated with folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. Proteins 2015; 83:631–639. © 2015 Wiley Periodicals, Inc.

13.
Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10-12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.

14.

Background

The question of how the brain encodes letter position in written words has attracted increasing attention in recent years. A number of models have recently been proposed to accommodate the fact that transposed-letter stimuli like jugde or caniso are perceptually very close to their base words.

Methodology

Here we examined how letter position coding is attained in the tactile modality via Braille reading. The idea is that Braille word recognition may provide more serial processing than the visual modality, and this may produce differences in the input coding schemes employed to encode letters in written words. To that end, we conducted a lexical decision experiment with adult Braille readers in which the pseudowords were created by transposing/replacing two letters.

Principal Findings

We found a word-frequency effect for words. In addition, unlike parallel experiments in the visual modality, we failed to find any clear signs of transposed-letter confusability effects. This dissociation highlights the differences between modalities.

Conclusions

The present data argue against models of letter position coding that assume that transposed-letter effects (in the visual modality) occur at a relatively late, abstract locus.
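The pseudoword construction mentioned in the Methodology of the abstract above (transposing or replacing two letters of a base word) can be sketched as follows. The function names and the replacement letters are assumptions for illustration; the abstract does not say how positions or replacement letters were chosen.

```python
def transpose_letters(word, i, j):
    """Return the word with the letters at positions i and j swapped."""
    if not (0 <= i < len(word) and 0 <= j < len(word)) or i == j:
        raise ValueError("i and j must be two distinct valid positions")
    chars = list(word)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def replace_letters(word, i, j, new_i, new_j):
    """Return the word with the letters at positions i and j replaced by new letters."""
    if not (0 <= i < len(word) and 0 <= j < len(word)) or i == j:
        raise ValueError("i and j must be two distinct valid positions")
    chars = list(word)
    chars[i], chars[j] = new_i, new_j
    return "".join(chars)


if __name__ == "__main__":
    # The two transposed-letter examples quoted in the Background
    print(transpose_letters("judge", 2, 3))
    print(transpose_letters("casino", 2, 4))
    # A replaced-letter control with arbitrarily chosen letters (hypothetical)
    print(replace_letters("judge", 2, 3, "p", "t"))
```

The replaced-letter item serves as the control condition: a transposed-letter confusability effect shows up when transposed pseudowords are harder to reject than replaced ones.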

15.
In this study, we have calculated distances between genomes based on our previously developed compositional spectra (CS) analysis. The study was conducted using genomes of 39 species of Eukarya, Eubacteria, and Archaea. Based on CS distances, we produced two different consensus dendrograms for four- and two-letter (purine-pyrimidine) alphabets. A comparison of the structure obtained using the purine-pyrimidine alphabet with the standard three-kingdom (3K) scheme reveals substantial similarity. Surprisingly, this is not the case when the same procedure is based on the four-letter alphabet. In this situation, we also found three main clusters, but different from those in the 3K scheme. In particular, one of the clusters includes Eukarya, thermophilic bacteria, and a part of the considered Archaea species. We speculate that the key factor in the last classification (based on the A-T-G-C alphabet) is related to ecology: two ecological parameters, temperature and oxygen, distinctly explain the clustering revealed by compositional spectra in the four-letter alphabet. Therefore, we assume that this result reflects two interdependent processes: evolutionary divergence and superimposed ecological convergence of the genomes, although another process, horizontal transfer, cannot be excluded as an important contributing factor.
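The two-letter (purine-pyrimidine) recoding in the abstract above is easy to illustrate. The sketch below recodes DNA as R/Y and compares sequences by their k-mer frequency vectors. Note that this plain k-mer profile and the total-variation distance are stand-ins chosen for this example; they are not the compositional spectra (CS) method of the abstract, whose details are not given here.

```python
from collections import Counter
from itertools import product

# A and G are purines (R); C and T are pyrimidines (Y)
PURINE_PYRIMIDINE = str.maketrans("AGCT", "RRYY")


def to_two_letter(dna):
    """Recode a DNA string in the two-letter purine/pyrimidine alphabet."""
    return dna.upper().translate(PURINE_PYRIMIDINE)


def kmer_frequencies(sequence, k, alphabet):
    """Relative frequencies of all k-mers over `alphabet`, in a fixed order."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def profile_distance(a, b):
    """Total-variation distance between two frequency vectors (0 to 1)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / 2


if __name__ == "__main__":
    s1 = to_two_letter("ACGT")
    s2 = to_two_letter("AGAG")
    d = profile_distance(kmer_frequencies(s1, 2, "RY"), kmer_frequencies(s2, 2, "RY"))
    print(s1, s2, d)
```

Running the same profile on the original sequences with the alphabet `"ACGT"` gives the four-letter counterpart, so the two alphabets can be compared with identical code.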

16.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate the native structures of proteins well, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier. 
We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.

17.
18.
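An unavoidable step in prediction and analysis methods based on protein structural alphabets is discretizing the target protein into a sequence of structural letters. Based on an exhaustive analysis of the space of protein structural-letter sequences and of how its minimum root-mean-square deviation varies, this paper proposes a new optimization algorithm for protein structural-letter sequences: the global greedy algorithm. The global greedy algorithm avoids the drawbacks of the basic greedy algorithm, namely excessive dependence on the size of the candidate set, excessive computational cost, and premature convergence to a local minimum. Experimental analysis shows that the global greedy algorithm outperforms both the basic greedy algorithm and the local optimum method.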

19.
Armando D. Solis. Proteins 2015, 83(12): 2198-2216
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20‐letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long‐range (contact) interactions among amino acids in natively‐folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well‐defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well‐known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long‐range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches (including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs) fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. Proteins 2015; 83:2198–2216. © 2015 Wiley Periodicals, Inc.

20.