20 similar articles found (search time: 813 ms)
1.
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence
alignment is often inaccurate when the identity between the sequences to be aligned is below 30%. This is because, in such
sequences, different residues may play similar structural roles, and they are misaligned when the alignment uses a
substitution matrix over all 20 residue types. Based on the similarity of their physicochemical features, residues
can be clustered into a few groups. With such simplified alphabets, the complexity of protein sequences is reduced while
the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might
be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new
clustering method based on substitution scores is proposed for grouping residues, and substitution matrices for
residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative
entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar regions in
proteins by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved
with an optimal reduced alphabet of about N = 9 residue groups.
Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear
Project (973) of the NSM
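To make the idea of a reduced alphabet concrete, here is a minimal Python sketch. The nine groups in `GROUPS` are a hypothetical grouping by rough physicochemical similarity, chosen only for illustration; they are not the DAPS-derived clustering described in the abstract, and the helper names are likewise assumptions.

```python
# Hypothetical 9-group alphabet for illustration only.
# This is NOT the clustering derived from DAPS in the abstract above.
GROUPS = ["C", "G", "P", "AST", "DENQ", "KR", "H", "ILMV", "FWY"]


def build_mapping(groups):
    """Map every residue to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown symbols become 'X'."""
    return "".join(mapping.get(residue, "X") for residue in seq.upper())


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    mapping = build_mapping(GROUPS)
    s1, s2 = "MKVLS", "LRIIT"
    print(identity(s1, s2))  # identity in the 20-letter alphabet
    print(identity(reduce_sequence(s1, mapping), reduce_sequence(s2, mapping)))
```

Under this toy grouping, two segments that share no identical residue become identical once residues with similar properties are merged, which is the effect the abstract relies on for low-identity sequences.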
2.
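Alignment-free sequence analysis is a recently developed approach to sequence analysis. It is computationally efficient and suitable for sequences with low similarity, and it has been applied successfully to DNA sequence analysis. Because of the complexity of protein sequences, however, its accuracy on protein sequences is not high. By grouping the 20 natural amino acid residues into classes, the complexity of protein sequences was reduced; applying this to alignment-free analysis of proteins effectively improved the accuracy of the sequence analysis.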
3.
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.
4.
Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets, and their performance was assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using the standard alphabet of 20 amino acids as the reference frame. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may make it possible to increase the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs.
5.
A workbench for multiple alignment construction and analysis (total citations: 126; self-citations: 0; citations by others: 126)
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.
6.
Ravishankar Ramanathan Abhinav Verma 《Journal of biomolecular structure & dynamics》2013,31(4):661-662
Summary: A conformational search by simulated annealing has been performed on two peptides derived from the tetradecapeptide used to isolate the Xenopus laevis skin maturation RXVRG-endoprotease. The Ala 12 derivative, obtained by substitution in the hydrophobic C-terminal fragment, and the undecapeptide 4–14, obtained by deletion of an acidic-rich tripeptide, were studied. No unique structure has been found for the tetradecapeptide Ala 12. This structural disorganization could explain the loss of activity of the endoprotease towards the substituted peptide. For the undecapeptide, two different models in accordance with the NMR data were found. The conformational differences between these two models are located in the consensus sequence, and in each case a hairpin-like conformation is observed. These results could be related to the enhanced cleavage activity of the maturation enzyme. The obtained structures are also compared with those of the original tetradecapeptide.
7.
In the era of structural genomics, it is necessary to generate accurate structural alignments in order to build good templates for homology modeling. Although a great number of structural alignment algorithms have been developed, most of them ignore intermolecular interactions during the alignment procedure. Therefore, structures in different oligomeric states are barely distinguishable, and it is very challenging to find the correct alignment in coil regions. Here we present a novel approach to structural alignment using a clique finding algorithm and environmental information (SAUCE). In this approach, we build the alignment based on not only structural coordinate information but also realistic environmental information extracted from biological unit files provided by the Protein Data Bank (PDB). First, we eliminate all environmentally unfavorable pairings of residues. Then we identify alignments in core regions via a maximal clique finding algorithm. Two extreme value distribution (EVD) form statistics have been developed to evaluate core region alignments. With an optional extension step, global alignment can be derived based on environment-based dynamic programming linking. We show that our method is able to differentiate three-dimensional structures in different oligomeric states, and is able to find flexible alignments between multidomain structures without predetermined hinge regions. The overall performance is also evaluated on a large scale by comparisons to current structural classification databases as well as to other alignment methods.
8.
Highly repetitive sequence within proteins is an abundant feature yet is considered by some to be the protein equivalent of "junk DNA." Homopolymer sequences, the most highly repetitive of this group, are typically encoded by trinucleotide repeats at the DNA level. It is thought that many of these sequences are produced by a replicative slippage mechanism. Recent studies suggest that these highly mutable regions within proteins may allow for rapid morphological evolution emerging from the increased variability afforded by such coding structures. However, in a homopolymer, it is difficult to determine if the repeated amino acid is due to slippage at the DNA level or due to selection at the protein level. Here we develop and test a model to detect cases for which the homopolymer tract has clearly been selected for, with no evidence of slippage at the DNA level. The polyserine tract within the phosphatidylserine receptor protein is used as an excellent example of one such case.
9.
Stephen F. Altschul 《Proteins》1998,32(1):88-96
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88–96, 1998. Published 1998 Wiley-Liss, Inc. This article is a US government work and, as such, is in the public domain in the United States of America.
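The standard affine gap cost mentioned at the start of this abstract can be sketched in a few lines of Python. The parameter values (`gap_open=11`, `gap_extend=1`) and the function name are illustrative assumptions, not values taken from the paper, and the paper's generalized form is deliberately not reproduced here.

```python
def affine_gap_cost(length, gap_open=11, gap_extend=1):
    """Cost of one gap of `length` residues under ordinary affine gap costs.

    A fixed penalty is charged for the existence of the gap, plus a
    penalty proportional to its length. The default values are
    illustrative assumptions only.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    # One gap of five residues versus five separate one-residue gaps.
    print(affine_gap_cost(5))
    print(5 * affine_gap_cost(1))
```

Because the existence penalty is paid once per gap, a single long gap is cheaper than several short ones of the same total length, which matches the observation that one mutational event can insert or delete multiple residues.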
10.
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using the expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if the evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing the expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
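The "20 numbers" referred to in this abstract can be illustrated with a short Python sketch. It assumes the homologous sequences are already available as plain strings (in the abstract they come from a PSI-BLAST multiple alignment), and it uses an unweighted mean over homologs, which is an assumption rather than the authors' stated procedure. The function names are hypothetical, and the regression step is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def composition(seq):
    """Return the 20 residue fractions of one sequence, in AMINO_ACIDS order.

    Gaps and non-standard symbols are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def homolog_averaged_composition(sequences):
    """Unweighted mean of the compositions of a protein and its homologs."""
    compositions = [composition(seq) for seq in sequences]
    if not compositions:
        raise ValueError("at least one sequence is required")
    return [sum(column) / len(compositions) for column in zip(*compositions)]


if __name__ == "__main__":
    features = homolog_averaged_composition(["MKVLAAGE", "MRVLSAGD", "MKILAAGE"])
    print(len(features), round(sum(features), 6))
```

The resulting 20-element vector is what a regression model would take as input in a setup like the one described.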
11.
12.
Valdar WS 《Proteins》2002,48(2):227-241
The importance of a residue for maintaining the structure and function of a protein can usually be inferred from how conserved it appears in a multiple sequence alignment of that protein and its homologues. A reliable metric for quantifying residue conservation is desirable. Over the last two decades many such scores have been proposed, but none has emerged as a generally accepted standard. This work surveys the range of scores that biologists, biochemists, and, more recently, bioinformatics workers have developed, and reviews the intrinsic problems associated with developing and evaluating such a score. A general formula is proposed that may be used to compare the properties of different particular conservation scores or as a measure of conservation in its own right.
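As one simple example of the kind of score this survey covers, the Python sketch below computes a Shannon-entropy conservation value per alignment column. It is a generic textbook-style score chosen for illustration, not the general formula proposed in the paper. Normalizing by log2(20), ignoring gaps, and scoring all-gap columns as 0 are assumptions of this sketch.

```python
import math
from collections import Counter

GAP_CHARS = "-."


def column_conservation(column):
    """Entropy-based conservation of one alignment column, in [0, 1].

    1.0 means a single residue type; lower values mean more variability.
    Gap characters are ignored; an all-gap column scores 0.0 (assumption).
    """
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / total
        entropy -= p * math.log2(p)
    return 1.0 - entropy / math.log2(20)


def conservation_scores(alignment):
    """Score every column of an alignment given as equal-length strings."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    for score in conservation_scores(["ACDE", "ACDF", "AC-G"]):
        print(round(score, 3))
```

A score like this ignores residue similarity and sequence weighting, two of the issues that make choosing a conservation measure harder than it first appears.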
13.
Structure‐based classification of FAD binding sites: A comparative study of structural alignment tools
A total of six different structural alignment tools (TM‐Align, TriangleMatch, CLICK, ProBis, SiteEngine and GA‐SI) were assessed for their ability to perform two particular tasks: (i) discriminating FAD (flavin adenine dinucleotide) from non‐FAD binding sites, and (ii) performing an all‐to‐all comparison on a set of 883 FAD binding sites for the purpose of classifying them. For the first task, the consistency of each alignment method was evaluated, showing that every method is able to distinguish FAD and non‐FAD binding sites with a high Matthews correlation coefficient. Additionally, GA‐SI was found to provide alignments different from those of the other approaches. The results obtained for the second task revealed more significant differences among alignment methods, as reflected in the poor correlation of their results and highlighted clearly by the independent evaluation of the structural superimpositions generated by each method. The classification itself was performed using the combined results of all methods, using the best result found for each comparison of binding sites. A number of different clustering methods (Single‐linkage, UPGMA, Complete‐linkage, SPICKER and k‐Means clustering) were also used. The groups of similar binding sites (proteins) or clusters generated by the best performing method were further analyzed in terms of local sequence identity, local structural similarity and conservation of analogous contacts with the FAD ligands. Each of the clusters was characterized by a unique set of structural features or patterns, demonstrating that the groups generated truly reflect the structural diversity of FAD binding sites. Proteins 2016; 84:1728–1747. © 2016 Wiley Periodicals, Inc.
14.
Paired natural cysteine mutation mapping: aid to constraining models of protein tertiary structure.
R. Kreisberg V. Buchner D. Arad 《Protein science : a publication of the Protein Society》1995,4(11):2405-2410
This paper discusses the benefit of mapping paired cysteine mutation patterns as a guide to identifying the positions of protein disulfide bonds. This information can facilitate the computer modeling of protein tertiary structure. First, a simple, paired natural-cysteine-mutation map is presented that identifies the positions of putative disulfide bonds in protein families. The method is based on the observation that if, during the process of evolution, a disulfide-bonded cysteine residue is not conserved, then it is likely that its counterpart will also be mutated. For each target protein, protein databases were searched for the primary amino acid sequences of all known members of distinct protein families. Primary sequence alignment was carried out using PileUp algorithms in the GCG package. To search for correlated mutations, we listed only the positions where cysteine residues were highly conserved and emphasized the mutated residues. In proteins of known three-dimensional structure, a striking pattern of paired cysteine mutations correlated with the positions of known disulfide bridges. For proteins of unknown architecture, the mutation maps showed several positions where disulfide bridging might occur.
15.
BCL::Align is a multiple sequence alignment tool that utilizes the dynamic programming method in combination with a customizable scoring function for sequence alignment and fold recognition. The scoring function is a weighted sum of the traditional PAM and BLOSUM scoring matrices, position-specific scoring matrices output by PSI-BLAST, secondary structure predicted by a variety of methods, chemical properties, and gap penalties. By adjusting the weights, the method can be tailored for fold recognition or sequence alignment tasks at different levels of sequence identity. A Monte Carlo algorithm was used to determine optimized weight sets for sequence alignment and fold recognition that most accurately reproduced the SABmark reference alignment test set. In an evaluation of sequence alignment performance, BCL::Align ranked best in alignment accuracy (Cline score of 22.90 for sequences in the Twilight Zone) when compared with Align-m, ClustalW, T-Coffee, and MUSCLE. ROC curve analysis indicates BCL::Align's ability to correctly recognize protein folds with over 80% accuracy. The flexibility of the program allows it to be optimized for specific classes of proteins (e.g. membrane proteins) or fold families (e.g. TIM-barrel proteins). BCL::Align is free for academic use and available online at http://www.meilerlab.org/.
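The idea of a scoring function built as a weighted sum of components can be sketched generically in Python. This is not BCL::Align's code or API: the component names, the example values, and the weights below are all hypothetical and serve only to show how changing the weights changes the combined score for one pair of aligned positions.

```python
def combined_score(component_scores, weights):
    """Weighted sum of per-component scores for one pair of positions.

    Both arguments are dicts keyed by a component name. The names used
    in the example below are hypothetical placeholders.
    """
    missing = set(weights) - set(component_scores)
    if missing:
        raise KeyError(f"no score supplied for components: {sorted(missing)}")
    return sum(weights[name] * component_scores[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical per-component scores for a single residue pair.
    scores = {"blosum": 4.0, "pssm": 2.0, "secondary_structure": 1.0}

    # Two hypothetical weight sets, e.g. one tuned for close homologs and
    # one that leans on predicted secondary structure for distant ones.
    close_homologs = {"blosum": 1.0, "pssm": 0.5, "secondary_structure": 0.0}
    distant_homologs = {"blosum": 0.25, "pssm": 0.5, "secondary_structure": 2.0}

    print(combined_score(scores, close_homologs))
    print(combined_score(scores, distant_homologs))
```

In a dynamic programming aligner, a function of this shape would supply the match score for each cell, so tuning the weights (by Monte Carlo search in the abstract) changes which alignment is optimal.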
16.
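Amino acids are the basic building blocks of proteins and are widely used in the food, pharmaceutical, feed, and chemical industries, among others. Producing amino acids with microbial cell factories offers renewable feedstocks, mild process conditions, high product purity, and low environmental pollution, and can help achieve carbon neutrality. Using metabolic engineering and synthetic biology, Escherichia coli and Corynebacterium glutamicum have been rationally designed, engineered, and optimized to create microbial cell factories with high amino acid yields, enabling the biorefining of branched-chain amino acids, aspartate-family amino acids, glutamate-family amino acids, and aromatic amino acids. This article analyzes how high-yield E. coli and C. glutamicum cell factories were created, with the aim of providing a reference for building high-performance microbial cell factories.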
17.
18.
The role of pattern databases in sequence analysis (total citations: 2; self-citations: 0; citations by others: 2)
Attwood TK 《Briefings in bioinformatics》2000,1(1):45-59
In the wake of the numerous now-fruitful genome projects, we are entering an era rich in biological data. The field of bioinformatics is poised to exploit this information in increasingly powerful ways, but the abundance and growing complexity both of the data and of the tools and resources required to analyse them are threatening to overwhelm us. Databases and their search tools are now an essential part of the research environment. However, the rate of sequence generation and the haphazard proliferation of databases have made it difficult to keep pace with developments. In an age of information overload, researchers want rapid, easy-to-use, reliable tools for functional characterisation of newly determined sequences. But what are those tools? How do we access them? Which should we use? This review focuses on a particular type of database that is increasingly used in the task of routine sequence analysis: the so-called pattern database. The paper aims to provide an overview of the current status of pattern databases in common use, outlining the methods behind them and giving pointers on their diagnostic strengths and weaknesses.
19.
A comparison of fluorescamine and o-phthaldialdehyde as effective blocking reagents in protein sequence analyses by the Beckman sequencer (total citations: 1; self-citations: 0; citations by others: 1)
The use of o-phthaldialdehyde to chemically reduce the newly generated amino termini responsible for the progressively increasing background during an extended amino acid sequence analysis in a liquid-phase sequencer is described. The results are compared with Fluram blocking, using apomyoglobin and rabbit C-reactive protein as the standard and unknown samples, respectively.
20.
Carugo O 《Protein science : a publication of the Protein Society》2008,17(12):2187-2191
There is indirect evidence that the amino acid composition of proteins depends on their dimension. The amino acid composition of a nonredundant set of about 550,000 proteins was determined and it was observed that, in the range of 50-200 residues, the percentage of occurrence of most of the residue types significantly depends on protein dimension. This result should prove useful in analyzing protein sequences and genomics.