共查询到20条相似文献,搜索用时 0 毫秒
1.
Diana Ekman 《Journal of molecular biology》2010,396(2):396-405
For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic sequences. However, it is believed that de novo creation of coding sequences is rare in comparison to mechanisms such as domain shuffling and gene duplication; hence, most sequences should have homologs in other genomes.To investigate this, the sequences of 19 complete fungi genomes were compared. By using the phylogenetic relationship between these genomes, we could identify potentially de novo created orphans in Saccharomyces cerevisiae. We found that only a small fraction, < 2%, of the S. cerevisiae proteome is orphan, which confirms that de novo creation of coding sequences is indeed rare. Furthermore, we found it necessary to compare the most closely related species to distinguish between de novo created sequences and rapidly evolving sequences where homologs are present but cannot be detected.Next, the orphan proteins (OPs) and orphan domains (ODs) were characterized. First, it was observed that both OPs and ODs are short. In addition, at least some of the OPs have been shown to be functional in experimental assays, showing that they are not pseudogenes. Furthermore, in contrast to what has been reported before and what is seen for older orphans, S. cerevisiae specific ODs and proteins are not more disordered than other proteins. This might indicate that many of the older, and earlier classified, orphans indeed are fast-evolving sequences. Finally, > 90% of the detected ODs are located at the protein termini, which suggests that these orphans could have been created by mutations that have affected the start or stop codons. 相似文献
2.
Maria A. Korotkova Eugene V. Korotkov Valentina M. Rudenko 《Journal of molecular modeling》1999,5(6):103-115
This article is in the area of protein sequence investigation. It studies protein sequence periodicity. The notion of latent periodicity is introduced. A mathematical method for searching for latent periodicity in protein sequences is developed. Implementation of the method developed for known cases of perfect and imperfect periodicity is demonstrated. Latent periodicity of many protein sequences from the SWISS-PROT data bank is revealed by the method and examples of latent periodicity of amino acid sequences are demonstrated for: the translation initiation factor EIF-2B (epsilon subunit) of Saccharomyces cerevisiae from the E2BE_YEAST sequence; the E.coli ferrienterochelin receptor from the FEPA_ECOLI sequence; the lysozyme of Bacteriophage SF6 from the LY_BPSF6 sequence; lipoamide dehydrogenase of Azotobacter vinelandii from the DLDH_AZOVI sequence. These protein sequences have latent periods equal to six, two, seven and 19 amino acids, respectively. We propose that a possible purpose of the amino acid sequence latent periodicity is to determine certain protein structures. 相似文献
3.
Abstract A new approach using a 3-D Cartesian coordinate system to represent protein sequences has been derived. By the 3-D Graphical representation we make a comparison of sequences belonging to nine different proteins. 相似文献
4.
Abstract TATA-box binding protein (TBP) in a monomelic form and the complexes it forms with DNA have been elucidated with molecular dynamics simulations. Large TBP domain motions (bend and twist) are detected in the monomer as well as in the DNA complexes; these motions can be important for TBP binding of DNA. TBP interacts with guanine bases flanking the TATA element in the simulations of the complex; these interactions may explain the preference for guanine observed at these DNA positions. Side chains of some TBP residues at the binding interface display significant dynamic flexibility that results in ‘flipflop’ contacts involving multiple base pairs of the DNA. We discuss the possible functional significance of these observations. 相似文献
5.
本文考察了目前采用的估计同源蛋白质序列间进化距离的方法缺陷,并提出了几个新的计算公式,它们考虑了氨基酸位点间显然存在的替代速率的差异。另外,提出了一种考虑氨基酸间不同替代概率的最大似然估计方法。文中对这些公式进行了计算比较,并对它在实际中的运用提出了建议。 相似文献
6.
Xiaoli Zhang Xiaohong Yin Jiana Chen Fangbo Cao Yu Liu Zhengwu Xiao Liqin Hu Guanghui Chen Tianfeng Liang Min Huang 《Phyton》2021,90(4):1285-1292
Protein in rice grains is an important source of nutrition for rice consumers. This study mainly aimed to identify the critical factors that determine grain protein concentration in rice. Accumulation parameters, including mean accumulation rate (Rmean) and active accumulation duration (Dactive), for protein and non-protein components and their correlations with protein concentration in rice grains were investigated in field experiments conducted over two years with six rice cultivars. Results showed that grain protein concentration ranged from 9.6% to 11.9% across cultivars and years. Accumulation processes of protein and non-protein components were well fitted by the logistic equation for all six rice cultivars in both years, and the ratio of protein to non-protein for Rmean and Dactive ranged from 0.08 to 0.12 and 1.01 to 1.33, respectively. Grain protein concentration was significantly correlated with protein to non-protein ratio for Rmean. This study suggests that grain protein concentration is not solely determined by the accumulation of protein or non-protein component, but by the coordination of protein and non-protein accumulation. 相似文献
7.
Regions of differing constraint, mutation rate or recombination along a sequence of DNA or amino acids lead to a nonuniform distribution of polymorphism within species or fixed differences between species. The power of five tests to reject the null hypothesis of a uniform distribution is studied for four classes of alternate hypothesis. The tests explored are the variance of interval lengths; a modified variance test, which includes covariance between neighboring intervals; the length of the longest interval; the length of the shortest third-order interval; and a composite test. Although there is no uniformly most powerful test over the range of alternate hypotheses tested, the variance and modified variance tests usually have the highest power. Therefore, we recommend that one of these two tests be used to test departure from uniformity in all circumstances. Tables of critical values for the variance and modified variance tests are given. The critical values depend both on the number of events and the number of positions in the sequence. A computer program is available on request that calculates both the critical values for a specified number of events and number of positions as well as the significance level of a given data set. 相似文献
8.
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark that can be used as a reference to assess future developments in motif discovery. 相似文献
9.
Decomposing a biological sequence into its functional regions is an important prerequisite to understand the molecule. Using the multiple alignments of the sequences, we evaluate a segmentation based on the type of statistical variation pattern from each of the aligned sites. To describe such a more general pattern, we introduce multipattern consensus regions as segmented regions based on conserved as well as interdependent patterns. Thus the proposed consensus region considers patterns that are statistically significant and extends a local neighborhood. To show its relevance in protein sequence analysis, a cancer suppressor gene called p53 is examined. The results show significant associations between the detected regions and tendency of mutations, location on the 3D structure, and cancer hereditable factors that can be inferred from human twin studies.[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27] 相似文献
10.
The reduction of nitrate in leaves establishes an ‘excess’ of cations in the tissue which is rapidly neutralized by the synthesis of malate. Malate synthesized in nitrate-containing leaves accumulates while malate synthesized in nitrate-depleted leaves disappears gradually. The levels of nitrate reductasc and malate synthesis change similarly during leaf growth. Good correlation was also found between the rate of protein synthesis and the rale of malate synthesis. Stoichiometric relation between the amount of nitrate reduced and malate accumulated may be observed in short-term experiments. 相似文献
11.
12.
利用Fourier乘积谱(Fourier product spectrum,FPS)系统计算了76个蛋白质亚家族的疏水自由能(HΦ)序列、电子离子相互作用势(electron-ion interaction potential,EIIP)序列和结构拓扑指数(Balaban指数)序列的特征周期;讨论了特征周期的规律;并以时间周期谱(time periodicity spectrum,TPS)演示了如何定位特征周期在蛋白质序列上的热区.结果表明EIIP特征周期热区与蛋白质结合位点有较好的对应关系. 相似文献
13.
14.
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set. 相似文献
15.
S-glutathionylation, the reversible formation of mixed disulfides between glutathione(GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time consuming, whereas bioinformatics methods provide an alternative way to this problem by predicting S-glutathionylation sites in silico. The bioinformatics approaches give not only candidate sites for further experimental verification but also bio-chemical insights into the mechanism of S-glutathionylation. In this paper, we firstly collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Promising results are obtained by our method with an AUC (area under ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates the predictive power of our proposed method. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB. 相似文献
16.
Over repeat presentations of the same stimulus, sensory neurons show variable responses. This “noise” is typically correlated between pairs of cells, and a question with rich history in neuroscience is how these noise correlations impact the population''s ability to encode the stimulus. Here, we consider a very general setting for population coding, investigating how information varies as a function of noise correlations, with all other aspects of the problem – neural tuning curves, etc. – held fixed. This work yields unifying insights into the role of noise correlations. These are summarized in the form of theorems, and illustrated with numerical examples involving neurons with diverse tuning curves. Our main contributions are as follows. (1) We generalize previous results to prove a sign rule (SR) — if noise correlations between pairs of neurons have opposite signs vs. their signal correlations, then coding performance will improve compared to the independent case. This holds for three different metrics of coding performance, and for arbitrary tuning curves and levels of heterogeneity. This generality is true for our other results as well. (2) As also pointed out in the literature, the SR does not provide a necessary condition for good coding. We show that a diverse set of correlation structures can improve coding. Many of these violate the SR, as do experimentally observed correlations. There is structure to this diversity: we prove that the optimal correlation structures must lie on boundaries of the possible set of noise correlations. (3) We provide a novel set of necessary and sufficient conditions, under which the coding performance (in the presence of noise) will be as good as it would be if there were no noise present at all. 相似文献
17.
Abstract In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures. 相似文献
18.
Nodamura Virus (NoV) is a nodavirus originally isolated from insects that can replicate in a wide variety of hosts, including mammals. Because of their simplicity and ability to replicate in many diverse hosts, NoV, and the Nodaviridae in general, provide a unique window into the evolution of viruses and host-virus interactions. Here we show that the C-terminus of the viral polymerase exhibits extreme structural and evolutionary flexibility. Indeed, fewer than 10 positively charged residues from the 110 amino acid-long C-terminal region of protein A are required to support RNA1 replication. Strikingly, this region can be replaced by completely unrelated protein sequences, yet still produce a functional replicase. Structure predictions, as well as evolutionary and mutational analyses, indicate that the C-terminal region is structurally disordered and evolves faster than the rest of the viral proteome. Thus, the function of an intrinsically unstructured protein region can be independent of most of its primary sequence, conferring both functional robustness and sequence plasticity on the protein. Our results provide an experimental explanation for rapid evolution of unstructured regions, which enables an effective exploration of the sequence space, and likely function space, available to the virus. 相似文献
19.
Abstract We present here the results obtained by applying several different methods to quantitatively measure regularities in protein sequences based on pair-preferences. We have studied the distribution of amino acid residues, singly as well as in pairs in a large data base and have attempted this task. We confirmed the existence of well-defined pair-preferences in proteins which were shown to be remarkably absent in simulated random sequences of similar amino acid distribution. The analysis of the sequences from the SWISS-PROT data base using simple statistical tests, Fourier analysis, fractal analysis and statistical thermodynamical tests were used to derive parameters to define a natural sequence. As a consequence of the existence of pair-preferences, parameters like fractal dimension (D), spectral exponent (β), scaling parameter (H) and entropy (statistical) were found to be characteristic for natural sequences. For a reference state we chose a randomised state devoid of any pair-preference. The pair-preferences qualified well to be used as quantitative measures of regularities in protein sequences. 相似文献
20.
为了更多地挖掘隐藏在蛋白质序列中的信息,本研究将20种氨基酸均匀地排列在单位圆周上,得到每种氨基酸对应的二维坐标,再与氨基酸的6个理化指标结合起来,最终用一个八维向量来刻画蛋白质序列。为避免数据极差对分析结果造成的影响,本研究对蛋白质序列所对应的八维向量作归一化处理。基于归一化后的蛋白质序列的向量表示,运用神经网络对蛋白质序列进行分类,并根据向量之间的欧式距离来量化序列之间的相似性。最后,以9个不同物种的ND5蛋白质序列以及8个不同物种的ND6蛋白质序列为例,Clustal W序列比对方法为基准,对本研究的方法与5-字母方法进行验证和比较,结果表明本研的方法是有效的。 相似文献