Similar Literature
1.
A method for comparing protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-letter code sequences are then generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and the secondary structures. The similarity value for each paired two-letter code is a linear combination of the similarity values for the paired amino acids and for their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) whose X-ray structures are known. For protein pairs with high primary sequence similarity (greater than 45%), the STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, the alignment of protein pairs with lower primary sequence similarity improves significantly when secondary structure annotation is added. For the pair with the lowest primary sequence similarity (16%), 'correct' alignment improved from 0 to 37% with this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins and three pairs of distantly related picornavirus proteins.
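The combined scoring idea can be illustrated with a minimal global (Needleman-Wunsch) alignment over two-letter tokens, where each pair score is a weighted sum of a residue score and a secondary-structure score. This is only a sketch: the weight `w`, the identity-based toy scores, the `h`/`e`/`c` structure letters and the linear gap penalty are all hypothetical stand-ins, not STRALIGN's actual matrices or parameters.

```python
def pair_score(a, b, w=0.5):
    """Score two 'Xx' tokens (X = amino acid, x = secondary structure letter)."""
    aa = 2.0 if a[0] == b[0] else -1.0   # hypothetical identity-based residue score
    ss = 1.0 if a[1] == b[1] else -1.0   # hypothetical secondary-structure score
    return (1 - w) * aa + w * ss         # linear combination of the two similarities


def align_score(s, t, gap=-2.0, w=0.5):
    """Needleman-Wunsch global alignment score over lists of two-letter tokens."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1], w),
                           dp[i - 1][j] + gap,   # gap in t
                           dp[i][j - 1] + gap)   # gap in s
    return dp[n][m]


def tokens(seq, ss):
    """Combine a residue string and a same-length structure string into 'Xx' tokens."""
    return [a + b for a, b in zip(seq, ss)]
```

With `w=0` the score reduces to a sequence-only alignment, which mirrors the abstract's observation that secondary-structure terms matter most when residue similarity is low.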

2.
Amino acid substitutions in evolutionarily related proteins have been studied from a structural point of view. We consider that an amino acid a1 in a protein p1 has been replaced by the amino acid a2 in the structurally similar protein p2 if, after superposition of the p1 and p2 structures, the a1 and a2 Cα atoms are no more than 1.2 Å apart. Thirty-two proteins, grouped in 11 classes, have been analysed by this method. This produced 2860 amino acid pairs (substitutions), which were analysed by multidimensional statistical methods. The main results are as follows. (1) According to the observed exchangeability of amino acid side chains, only four groups (strong clusters) could be delineated: (i) Ile and Val, (ii) Leu and Met, (iii) Lys, Arg and Gln, and (iv) Tyr and Phe. The other residues could not be classified. (2) The matrix of distances between amino acids, or scoring matrix, determined in this study differs from every other published matrix. (3) Except for the distance matrices based on the chemical properties of amino acid side chains, which can be grouped together, all other published matrices differ from one another. (4) The distance matrix determined in this study appears to be very efficient for aligning distantly related protein sequences.
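The structural definition of a substitution can be sketched as a distance filter on Cα coordinates of two already-superposed structures. This is a hypothetical illustration: the superposition itself is not shown, and pairing each residue with its nearest Cα in the other structure is an assumption about how candidate pairs are chosen.

```python
import math


def substitutions(res1, res2, cutoff=1.2):
    """Collect (a1, a2) pairs whose C-alpha atoms lie within `cutoff` angstroms.

    res1, res2: lists of (amino_acid, (x, y, z)) for two proteins whose
    structures have ALREADY been superposed (superposition not shown).
    Each residue of res1 is paired with its nearest C-alpha in res2 (assumption).
    """
    if not res2:
        return []
    pairs = []
    for a1, c1 in res1:
        best = min(res2, key=lambda r: math.dist(c1, r[1]))  # nearest C-alpha
        if math.dist(c1, best[1]) <= cutoff:
            pairs.append((a1, best[0]))
    return pairs
```

Tallying such pairs over many protein pairs (for example with `collections.Counter`) gives the raw substitution counts from which a scoring or distance matrix can be derived.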

3.
The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins.

4.
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared with those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequencies of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values, the structural class prediction for each protein (all-alpha, all-beta, or alpha-beta) is derived. A set of 64 previously classified proteins was randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best-performing algorithm on the test sets was the learning vector quantization network using 17 inputs, which achieved a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes, whereas the differences between algorithms are in general not statistically significant. These results show that protein primary sequences contain easily obtainable information that is useful for predicting protein structural class, both by neural networks and by standard statistical clustering algorithms.
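The per-class Matthews correlation coefficient mentioned above can be computed one-vs-rest from true and predicted class labels. The sketch below uses the standard MCC formula; the label strings are illustrative and not taken from the study's data.

```python
import math


def matthews_cc(true, pred, cls):
    """One-vs-rest Matthews correlation coefficient for structural class `cls`."""
    pairs = list(zip(true, pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `matthews_cc(["alpha", "beta"], ["alpha", "beta"], "alpha")` gives 1.0 for a perfect prediction.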

5.
Swain MD, Benson DE. Proteins 2005, 59(1):64-71
Protein-derived cofactors composed of covalently crosslinked amino acid side chains are of increasing importance in protein science. These crosslinked protein-derived cofactors (CPDCs) are formed either through direct oxidation by metal/O2-derived intermediates or through outer-sphere oxidation by highly oxidizing cofactors. CPDCs formed by outer-sphere oxidation do not require the side-chain precursors to be coordinated by a metal center, and are therefore more difficult to identify than those formed by direct oxidation. To better understand the propensity for CPDC formation by outer-sphere oxidation, the geometrical preferences of CPDCs were examined. The Dezymer algorithm was used to identify all putative CPDC-forming mutations in 500 proteins. Although chemically unrelated, these CPDCs were found to be geometrically similar to disulfide-bonded cysteine pairs. Additionally, the percentage of near-sequence pairs (i and i + 1 through i and i + 5) increased as the average Cα-Cα distance between the amino acid pairs increased. This survey also searched the Protein Data Bank for proteins with pre-attack conformations for CPDCs, using the non-bonded contacts reported by PROCHECK. A total of 323 unique proteins were identified, 55 of them involving near-sequence amino acid pairs. The high geometric propensity of near-sequence amino acid pairs to form CPDCs is significant because such crosslinks are difficult to detect by structural or mass spectrometric methods.
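As a rough illustration of the "near-sequence pair" bookkeeping, the sketch below lists residue pairs separated by 1 to 5 positions together with their Cα-Cα distance. The `max_dist` filter is a hypothetical geometric cutoff chosen for the example; it is not a criterion from the study, which used Dezymer and PROCHECK contacts.

```python
import math


def near_sequence_pairs(residues, max_sep=5, max_dist=7.0):
    """List near-sequence pairs (i, i+1 ... i+max_sep) with their C-alpha distance.

    residues: list of (residue_number, (x, y, z)) C-alpha coordinates,
    sorted by residue number. max_dist is a hypothetical cutoff in angstroms.
    """
    out = []
    for a in range(len(residues)):
        i, ci = residues[a]
        for b in range(a + 1, len(residues)):
            j, cj = residues[b]
            sep = j - i
            if sep > max_sep:
                break  # list is sorted, so later residues are even farther in sequence
            d = math.dist(ci, cj)
            if d <= max_dist:
                out.append((i, j, sep, round(d, 2)))
    return out
```

Grouping the returned tuples by `sep` and averaging the distances gives the kind of separation-versus-distance summary the abstract describes.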

6.
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models with discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm, and the protein is classified as belonging to the structural class with the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model: for each residue position in the sequence, the smoother computes the probability that the residue lies within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the entire amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein with a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
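Filtering and smoothing on a discrete HMM correspond to the standard forward and forward-backward recursions. The sketch below is a generic, unscaled implementation for short toy sequences; the two-state model, reduced alphabet and probabilities in any example are hypothetical and far simpler than the class-specific fold models described above. Real protein-length sequences would need scaling or log-space arithmetic to avoid underflow.

```python
def forward_backward(obs, states, start, trans, emit):
    """Return (likelihood, posteriors) for a discrete HMM.

    The forward pass ("filtering") yields P(obs | model), which can be compared
    across candidate class models; the backward pass gives the smoothed
    per-position posteriors P(state at t | whole sequence).
    Unscaled: suitable only for short toy sequences.
    """
    T = len(obs)
    fwd = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for t in range(1, T):
        fwd.append({s: emit[s][obs[t]] * sum(fwd[t - 1][r] * trans[r][s] for r in states)
                    for s in states})
    likelihood = sum(fwd[T - 1].values())

    bwd = [dict.fromkeys(states, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = {s: sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in states)
                  for s in states}

    post = [{s: fwd[t][s] * bwd[t][s] / likelihood for s in states} for t in range(T)]
    return likelihood, post
```

Classifying a sequence then amounts to picking the model with the largest likelihood, e.g. `max(models, key=lambda m: forward_backward(obs, *m)[0])`, and reading secondary-structure probabilities from that model's posteriors.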

7.
Regularities in the primary structure of proteins
In this paper, the latest protein database, comprising more than a million amino acids, is analyzed to characterize short-range regularities in the primary structure. The amino acid distributions along the polypeptide chain and among the proteins were studied first, and their influence on the amino acid pair statistics was taken into account. We are primarily interested in the separations along the covalent structure at which amino acid pair frequencies show non-random behaviour. Amino acid pairs separated by at least 20 residues in the covalent structure exhibit an exact Gaussian distribution. We found that there is a range of separations in the covalent structure over which pairing is non-random. We conclude that the pair preferences are different for each of the 20 × 20 amino acid pairs. The range of non-random pairing varies from pair to pair, and in most cases it does not extend beyond the 9th neighbour. The preference of a given pair at one position cannot be derived from the behaviour of that pair at another position. The preference values of all 400 amino acid pairs are listed for positions up to the 9th neighbour. Some potential fields of application of these data are also discussed.
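One common way to express a pair preference at separation k is the ratio of observed to expected counts of residue a at position i and residue b at position i + k. The sketch below estimates the expected count from overall single-residue frequencies; this is a simplification, since the paper also corrects for composition along the chain and between proteins.

```python
from collections import Counter


def pair_preference(seqs, a, b, k):
    """Observed/expected ratio for residue a at i and residue b at i + k (k >= 1).

    Expected counts assume independent residues drawn with the overall
    single-residue frequencies (a simplification of the paper's corrections).
    """
    singles = Counter()
    pairs = hits = 0
    for s in seqs:
        singles.update(s)
        for i in range(len(s) - k):
            pairs += 1
            if s[i] == a and s[i + k] == b:
                hits += 1
    total = sum(singles.values())
    expected = pairs * (singles[a] / total) * (singles[b] / total)
    return hits / expected if expected else float("nan")
```

A ratio near 1 indicates random pairing at that separation; computing it for k = 1 to 20 for each of the 400 ordered pairs gives a table of the kind the abstract describes.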

8.
Kumar NV, Govil G. Biopolymers 1984, 23(10):1995-2008
With a view to understanding the role of hydrogen bonds in the recognition of nucleic acids by proteins, hydrogen bonding between the bases and base pairs of nucleic acids and the amino acids (Asn, Gln, Asp and Glu, and the charged residues Arg+, Glu−, and Asp−) has been studied by second-order perturbation theory. Binding energies have been calculated for all possible configurations involving a pair of hydrogen bonds between the base (or base pair) and the amino acid residue. Our results show that the hydrogen bonding in these cases has a large electrostatic contribution. In general, the charged amino acids form more stable complexes with bases or base pairs than the uncharged ones do. The hydrogen-bond energies are an order of magnitude smaller than the Coulombic interaction energies between basic amino acids (Lys+, Arg+, and His+) and the phosphate groups of nucleic acids. The stabilities of the complexes of the amino acids Asn, Gln, Asp, and Glu with bases are in the order G–X > C–X > A–X U–X or T–X, and G · C–X > A · T(U)–X, where X is one of these amino acid residues. It has been shown that Glu− and Asp− can recognize guanine in single-stranded nucleic acids, and that Arg+ can distinguish G · C base pairs from A · T base pairs in double-stranded structures.

9.
10.
ADP-glucose pyrophosphorylase (AGPase), a key enzyme in higher-plant starch biosynthesis, is composed of pairs of large (LS) and small (SS) subunits. Ample evidence has shown that AGPase catalyzes the rate-limiting step in starch biosynthesis in higher plants. In this study, we compiled detailed comparative information about AGPase in selected plants by analyzing its structural features, such as amino acid content, physico-chemical properties, secondary structural features, and phylogenetic classification. Functional analysis of these proteins included identification of important motifs 10 to 20 amino acids long; such motifs arise because specific residues and regions that are important for the biological function of a group of proteins are conserved in both structure and sequence during evolution. Phylogenetic analysis reveals two main clusters: cluster I encompasses the large subunits (LS), while cluster II contains the small subunits (SS).
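The abstract does not say which physico-chemical properties or tools were used. As one hedged example of this kind of sequence-level property, the sketch below computes amino acid content (percent) and the grand average of hydropathy (GRAVY) using the published Kyte-Doolittle scale; whether the study reported GRAVY is an assumption.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def composition_percent(seq):
    """Amino acid content of a sequence, as percentages."""
    return {a: 100.0 * seq.count(a) / len(seq) for a in sorted(set(seq))}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KD[a] for a in seq) / len(seq)
```

Positive GRAVY values indicate an overall hydrophobic sequence and negative values a hydrophilic one, which makes the measure convenient for comparing LS and SS subunits side by side.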

11.
The aim of this research was to examine the possible significance of genome/protein relationships for the distribution of mass, especially in proteins. Amino acid residues in proteins consist of side chains and main-chain (polypeptide) segments. We use "SCM" (side-chain mass), "MCM" (main-chain mass), and "deltaM" (SCM − MCM) as the deviation from "mass balance." The total MCM of the amino acids encoded by the 61 sense codons of the standard code, 3412, equals their total SCM: they form a mass-balanced set (mean deltaM = 0). Of 14 natural variants of the code, seven have slightly positive mean deltaM values and seven have slightly negative values. Codes in which the standard amino acids are assigned randomly to the 20 codon sets of the standard code have about one chance in 3,300 of producing a mass-balanced set. In natural proteins, as %A + T increases, the proportion of the mass in the side chains also increases, by about half the amount calculated for standard genes with various AT/GC ratios, partly owing to selection of codons with greater variability in composition at synonymous sites. For 203 representative species (including organelles), the total protein mass is distributed approximately equally between SCM and MCM (overall mean deltaM per amino acid residue, −0.06). Attaining some overall macromolecular mass balance may have been a criterion in selecting the codon/amino acid pairs. When both structural and dynamic requirements are considered, a genetic code based on hydrophobicity and mass balance as key properties seems likely.
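The per-residue deviation deltaM = SCM − MCM can be computed directly from a sequence once side-chain and main-chain masses are fixed. The sketch below assumes nominal integer masses, with the main-chain unit -NH-CH-CO- taken as 56 Da and each side-chain mass equal to the nominal residue mass minus 56; the study's exact mass conventions (for example for proline) may differ, so totals computed this way need not match the figures quoted above exactly.

```python
# Nominal side-chain masses (Da), taken as nominal residue mass minus 56.
# These conventions are assumptions and may differ from those of the study.
SIDE_CHAIN = {"G": 1, "A": 15, "S": 31, "P": 41, "V": 43, "T": 45, "C": 47,
              "L": 57, "I": 57, "N": 58, "D": 59, "Q": 72, "K": 72, "E": 73,
              "M": 75, "H": 81, "F": 91, "R": 100, "Y": 107, "W": 130}
MAIN_CHAIN = 56  # nominal mass of the -NH-CH-CO- main-chain unit


def mean_delta_m(seq):
    """Mean deltaM = SCM - MCM per residue; 0 means side and main chains balance."""
    return sum(SIDE_CHAIN[a] - MAIN_CHAIN for a in seq) / len(seq)
```

Glycine (deltaM = −55) and tryptophan (deltaM = +74) sit at the two extremes, so a protein's mean deltaM shifts with composition in the way the abstract relates to %A + T.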

12.
Efforts to predict protein secondary structure have been hampered by the apparent structural plasticity of local amino acid sequences. Kabsch and Sander (1984, Proc. Natl. Acad. Sci. USA 81, 1075–1078) articulated this problem by demonstrating that identical pentapeptide sequences can adopt distinct structures in different proteins. With the increased size of the protein structure database and the availability of new methods to characterize structural environments, we revisit this observation of structural plasticity. Within a set of proteins with less than 50% sequence identity, 59 pairs of identical hexapeptide sequences were identified. These local structures were compared and their surrounding structural environments examined. Within a protein structural class (α/α, β/β, α/β, α + β), the structural similarity of sequence-identical hexapeptides is usually preserved. This study finds eight pairs of identical hexapeptide sequences that adopt β-strand structure in one protein and α-helical structure in the other. In none of the eight cases do the members of these sequence pairs come from proteins within the same folding class. These results have implications for class-dependent secondary structure prediction algorithms.
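Finding identical hexapeptides between two proteins is a k-mer indexing problem, and a helix/strand switch can then be flagged by comparing per-residue secondary-structure strings. This sketch assumes DSSP-like one-letter strings ("H" helix, "E" strand) and, as a simplifying assumption, requires the whole window to be helix in one protein and strand in the other; the study's own criteria may be looser.

```python
def shared_kmers(seq1, seq2, k=6):
    """Return {k-mer: ([positions in seq1], [positions in seq2])} for shared k-mers."""
    def index(seq):
        idx = {}
        for i in range(len(seq) - k + 1):
            idx.setdefault(seq[i:i + k], []).append(i)
        return idx

    a, b = index(seq1), index(seq2)
    return {w: (a[w], b[w]) for w in a.keys() & b.keys()}


def helix_strand_switches(ss1, ss2, matches, k=6):
    """Flag shared k-mers that are all-helix in one protein and all-strand in the other."""
    out = []
    for w, (pos1, pos2) in matches.items():
        for i in pos1:
            for j in pos2:
                if {ss1[i:i + k], ss2[j:j + k]} == {"H" * k, "E" * k}:
                    out.append((w, i, j))
    return out
```

Indexing both sequences in dictionaries keeps the search linear in sequence length, which matters when scanning every pair of proteins in a non-redundant set.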

13.
The Rh D blood-group antigen forms part of a complex, involving several other polypeptides, that is deficient in the red cells of individuals who lack all the antigens of the Rh blood-group system (Rhnull red cells). These polypeptides include components recognized by anti-(Rh D) antibodies and by the murine monoclonal antibodies R6A and BRIC 125. We have carried out protein-sequence studies on the components immunoprecipitated by these antibodies. Anti-(Rh D) antibodies immunoprecipitate an Mr 30,000-32,000 polypeptide (the D30 polypeptide) and an Mr 45,000-100,000 glycoprotein (the D50 polypeptide). Antibody R6A immunoprecipitates two glycoproteins of Mr 31,000-34,000 (R6A32 polypeptide) and Mr 35,000-52,000 (R6A45 polypeptide). The D30 and R6A32 polypeptides were found to have the same N-terminal amino acid sequences, showing that they are closely related proteins. The D50 and R6A45 polypeptides also had indistinguishable N-terminal amino acid sequences, which differed from that of the D30 and R6A32 polypeptides. The putative N-terminal membrane-spanning segments of the two groups of proteins showed homology in their amino acid sequences, which may account for the association of each pair of proteins during co-precipitation by the antibodies. Supplementary data related to the protein sequence have been deposited as Supplementary Publication SUP 50417 (6 pages) at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1988) 249, 5.

14.
Two carbohydrate-binding proteins (subunit molecular masses, 32 and 16 kDa, respectively) were isolated for the first time from a nematode, Caenorhabditis elegans. They were specifically extracted with lactose and adsorbed on asialofetuin-Sepharose in the absence of a metal ion. Although these two proteins were co-eluted from a gel filtration column at a position corresponding to an apparent molecular size of 30 kDa under non-denaturing conditions, they could be separated by reversed-phase chromatography. The 32 kDa protein, the main component, was further characterized. Together with its solubility, saccharide specificity and metal independence, some other structural properties, including its amino acid composition, UV spectrum, and partial amino acid sequence, strongly suggested that the 32 kDa protein is a member of a class of soluble beta-galactoside-binding lectins which had previously been only found in vertebrates.

15.
Zou Lingyun, Wang Zhengzhi, Huang Jiaomin. Acta Genetica Sinica (遗传学报) 2007, 34(12):1080-1087
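Proteins must be located in the correct subcellular compartment to perform their functions. In this article, the PSI-BLAST tool is used to search protein sequences, and the position-specific scoring matrix is extracted from the position-specific profile as one class of protein features; the amino acid composition of each quarter of the sequence and the 1st- to 7th-order dipeptide compositions are computed as two further classes of features. Together, these three classes give 12 feature vectors per protein sequence. A simple weighting function is designed to weight each class of feature vectors, and the weighted vectors are used as input to a neural-network predictor; the Levenberg-Marquardt algorithm replaces the traditional error back-propagation (EBP) algorithm for adjusting the network weights and thresholds, which greatly improves training speed. Leave-one-out tests and 5-fold cross-validation tests were performed on two protein datasets, with 4 and 12 subcellular locations respectively, giving overall prediction accuracies of 88.4% and 83.3%, respectively. On the 4-location dataset, the method outperforms prediction methods such as an ordinary BP neural network, hidden Markov models, and fuzzy k-nearest neighbors; on the 12-location dataset, it outperforms a support vector machine classifier. Finally, the effect of different weighting ratios among the three feature classes on prediction accuracy is discussed. Results for eight selected weighting ratios show that assigning suitable weight coefficients to each of the three feature classes can further improve prediction accuracy.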
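The 12 feature vectors can be assembled as 1 PSSM-derived vector + 4 quarter compositions + 7 gapped-dipeptide compositions. This is a sketch under stated assumptions: the PSSM-derived vector is taken as a precomputed input (PSI-BLAST is not run here), "k-th order dipeptide" is interpreted as the residue pair (i, i + k), and the three weights in `weighted_input` are hypothetical placeholders for the paper's weighting function. The Levenberg-Marquardt-trained network is not shown.

```python
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AA, repeat=2)]


def composition(seq):
    """Normalized frequencies of the 20 standard amino acids."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AA]


def quartered_composition(seq):
    """Amino acid composition of each quarter of the sequence: 4 vectors of 20."""
    n = len(seq)
    bounds = [n * q // 4 for q in range(5)]
    return [composition(seq[bounds[q]:bounds[q + 1]]) for q in range(4)]


def gapped_dipeptide_composition(seq, k):
    """Frequencies of residue pairs (seq[i], seq[i + k]); k = 1 gives ordinary dipeptides."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - k):
        pair = seq[i] + seq[i + k]
        if pair in counts:  # skip pairs containing nonstandard letters
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def feature_vectors(seq, pssm_vector):
    """12 vectors: 1 PSSM-derived + 4 quarter compositions + 7 gapped-dipeptide compositions."""
    return ([pssm_vector] + quartered_composition(seq)
            + [gapped_dipeptide_composition(seq, k) for k in range(1, 8)])


def weighted_input(vectors, weights=(1.0, 1.0, 1.0)):
    """Scale each feature class by a (hypothetical) weight and flatten for the network."""
    w_pssm, w_comp, w_dip = weights
    scaled = [[w_pssm * x for x in vectors[0]]]
    scaled += [[w_comp * x for x in v] for v in vectors[1:5]]
    scaled += [[w_dip * x for x in v] for v in vectors[5:]]
    return [x for v in scaled for x in v]
```

Keeping the weights as separate per-class scalars makes it easy to sweep different weighting ratios, as the paper does with eight ratios.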

16.
MOTIVATION: Multiple alignments of proteins are an effective way of identifying conserved amino acids that provide clues to functional relationships among proteins. Quantitation of the abundances of amino acids found at each position in a sequence motif can provide a basis for understanding the structural and functional constraints at each point. Distribution of information across a motif has been used previously, but the non-intuitive nature of the analysis has limited its impact. RESULTS: Here, we introduce a quantitative measure of amino acid sequence diversity (DIVAA) that has a simple, intuitive meaning. Diversity, as a measure of sequence conservation or variation, is inextricably linked to the probability of selecting identical pairs from a distribution. We demonstrate its utility through the analysis of four populations: ATP-binding P-loops, hypervariable domains of kappa light chains, signal sequences, and the N- and C-termini of proteins. DIVAA provides a simple means of generating hypotheses concerning the contribution of individual residues to the functional and evolutionary relationships among proteins. AVAILABILITY: Access to DIVAA software is available at RELIC (http://relic.bio.anl.gov).
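The abstract links diversity to the probability of drawing an identical pair. A sketch consistent with that description, assuming DIVAA is the inverse of that probability normalized by 20 (so it runs from 0.05 for a fully conserved position to 1 for a uniform distribution), is shown below; ignoring gap characters is also an assumption.

```python
from collections import Counter


def divaa(column):
    """Diversity of one alignment column: (sum_i p_i^2)^-1 / 20, in [0.05, 1].

    sum_i p_i^2 is the probability that two residues drawn (with replacement)
    from the column are identical. Gap characters '-' are ignored (assumption).
    """
    residues = [r for r in column if r != "-"]
    n = len(residues)
    if n == 0:
        return float("nan")
    p_identical = sum((c / n) ** 2 for c in Counter(residues).values())
    return (1.0 / p_identical) / 20.0


def column_diversity(alignment):
    """Per-position diversity for a list of equal-length aligned sequences."""
    return [divaa(col) for col in zip(*alignment)]
```

Because the value is an effective number of residue types divided by 20, a column scoring 0.1 behaves roughly like one with two equally frequent residues.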

17.
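The main problem in superposing protein three-dimensional structures is that some amino acid residues of the target proteins are missing, whereas most multiple-structure superposition methods require complete amino acid sequences; the approach commonly used at present is simply to delete the missing residues, which makes the superposition inaccurate. Because homologous proteins are structurally similar, a region missing from one protein structure may be present in another homologous protein structure. On this basis, this paper proposes a new, simple and effective method for protein structure superposition with missing data (ITEMDM). The method computes the superposition using an iterative approach to missing data, and combines an optimized least-squares algorithm with matrix singular value decomposition (SVD) to obtain the rotation matrix and translation vector. The method was used to successfully superpose proteins of the cytochrome c family and proteins in the standard Fischer's database (67 protein pairs), and was compared with other methods. Numerical experiments show that the algorithm has the following advantages: (1) compared with the THESEUS algorithm, it runs faster and requires fewer iterations; (2) compared with the PSSM algorithm, its results are more accurate and its computation time is shorter. The results show that the method superposes protein three-dimensional structures with missing data more effectively.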
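The least-squares/SVD core of such a superposition is the Kabsch procedure. The sketch below fits rotation and translation using only residues present in both structures (via a boolean mask); the iterative imputation of missing regions that distinguishes ITEMDM is not shown, and the function name and interface are hypothetical.

```python
import numpy as np


def kabsch_superpose(P, Q, mask=None):
    """Least-squares rotation R and translation t such that R @ p + t approximates q.

    P, Q: (N, 3) coordinates of corresponding residues in two structures.
    mask: boolean (N,) array, True where a residue is present in both
    structures; missing residues are simply excluded from the fit.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if mask is None:
        mask = np.ones(len(P), dtype=bool)
    p, q = P[mask], Q[mask]
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    H = (p - pc).T @ (q - qc)                  # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = qc - R @ pc
    return R, t
```

Applying the result to all residues, including masked ones, is `P @ R.T + t`; an iterative scheme would use these transformed coordinates to fill gaps and refit.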

18.
A new method based on neural networks to cluster proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The components (x, y) of these 20×20 matrix patterns are the normalized frequencies of all pairs xy of amino acids in each sequence. We investigate the influence of different learning parameters on the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except those in which the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5% ± 7% of its amino acids and for fragments representing 7.5% ± 3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained following the Kohonen algorithm to obtain topological maps of protein sequences, in which related proteins are finally associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities for finding rapid and efficient algorithms to organize and search for homologies in the whole protein database.
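A minimal Kohonen map over 20×20 pair-frequency patterns can be sketched as follows. Assumptions: "pairs xy" are taken to be adjacent residues, the map is one-dimensional, and the Gaussian neighbourhood and linear decay schedules are illustrative choices rather than the learning parameters studied in the paper.

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"


def pair_matrix(seq):
    """Flattened 20x20 matrix of normalized frequencies of adjacent pairs xy."""
    idx = {a: i for i, a in enumerate(AA)}
    m = [0.0] * 400
    pairs = [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)
             if seq[i] in idx and seq[i + 1] in idx]
    for x, y in pairs:
        m[idx[x] * 20 + idx[y]] += 1.0 / len(pairs)
    return m


def winner(nodes, v):
    """Index of the node (synaptic vector) closest to input v."""
    return min(range(len(nodes)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(nodes[k], v)))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """1-D Kohonen map: move the winner and its neighbours toward each input."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() / dim for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 1e-3)   # shrinking neighbourhood
        for v in rng.sample(data, len(data)):      # shuffled presentation order
            w = winner(nodes, v)
            for k, node in enumerate(nodes):
                h = math.exp(-((k - w) ** 2) / (2 * radius ** 2))  # Gaussian neighbourhood
                for d in range(dim):
                    node[d] += lr * h * (v[d] - node[d])
    return nodes
```

After training, a new sequence is classified by `winner(nodes, pair_matrix(seq))`; related sequences should land on the same or neighbouring nodes.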

19.
A novel calcium-dependent serine proteinase (CASP) secreted from malignant hamster embryo fibroblasts (Ni 12C2) degrades extracellular matrix proteins. A complementary DNA encoding CASP has been isolated using oligonucleotide probes synthesized on the basis of partial amino acid sequences of CASP. The complete amino acid sequence of CASP revealed that it has a serine active site on the C-terminal side. Glu-rich and proEGF-homologous regions are found on the N-terminal side, suggesting that CASP is structurally similar to blood coagulation factors such as IX and X and to the anticoagulation factor protein C.

20.
Neurogranin, formerly designated p17 (Baudier, J., Bronner, C., Kligman, D., and Cole, R. D. (1989) J. Biol. Chem. 264, 1824-1828), a brain-specific in vitro substrate for protein kinase C (PKC), has been purified to homogeneity from bovine forebrain. The purified protein has a molecular mass of 7837.1 +/- 0.5 Da, determined by electrospray mass spectrometry. In the absence of reducing agent, dimers and higher oligomers accumulated. On sodium dodecyl sulfate-polyacrylamide gels the protein monomer migrated abnormally, with an apparent molecular mass of 15,000-19,000 Da depending on the percentage of polyacrylamide. The native protein is blocked at its amino terminus. The majority of the primary amino acid sequence was determined following proteolytic and chemical fragmentation. A comparison of the amino acid sequence of neurogranin with that of the brain-specific PKC substrate neuromodulin revealed a strikingly conserved amino acid sequence, AA(X)KIQA-SFRGH(X)(X)RKK(X)K. The two proteins are not related over the rest of their sequences. Neurogranin was shown to be phosphorylated in hippocampal slices incubated with 32Pi, and phorbol esters stimulated neurogranin phosphorylation, suggesting that neurogranin is likely to be an in vivo substrate for PKC. In vitro phosphorylation of neurogranin by PKC shifted the isoelectric point of the protein from pI 5.6 to a more acidic value (pI 5.4). Tryptic digestion of the phosphorylated protein yielded a single phosphopeptide with the sequence IQASFR, in which the serine residue is the phosphorylated amino acid. This phosphopeptide is part of the conserved sequence shared with neuromodulin and also corresponds to the PKC phosphorylation site on neuromodulin (Apel, E. D., Byford, M. F., Au, D., Walsh, K. A., and Storm, D. R. (1990) Biochemistry 29, 2330-2335). Evidence was obtained suggesting that neurogranin binds to calmodulin in the absence of Ca2+, a feature that also characterizes neuromodulin.
We propose that the amino acid sequence shared by neurogranin and neuromodulin reflects a functional relationship between these two proteins and that the consensus sequence represents a conserved PKC phosphorylation site and a calmodulin binding domain that characterizes a class of brain-specific PKC substrates.
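The tryptic phosphopeptide IQASFR follows from trypsin's usual specificity. The sketch below applies the common simplified rule (cleave C-terminal to K or R, but not before P) to a hypothetical fragment built around the conserved motif; the input string is illustrative, not the actual neurogranin sequence, and real trypsin digests can also show missed cleavages.

```python
def tryptic_digest(seq):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides
```

For the hypothetical fragment `"AAGKIQASFRGHMAR"`, this yields `["AAGK", "IQASFR", "GHMAR"]`, isolating the IQASFR peptide that carries the PKC-phosphorylated serine.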
