Similar literature
 20 similar documents found (search time: 15 ms)
1.
Current methods for identification of domains within protein sequences require either structural information or the identification of homologous domain sequences in different sequence contexts. Knowledge of structural domain boundaries is important for fold recognition experiments and structural determination by X-ray crystallography or nuclear magnetic resonance spectroscopy using the divide-and-conquer approach. Here, a new and conceptually simple method for the identification of structural domain boundaries in multiple protein sequence alignments is presented. Analysis of covariance at positions within the alignment is first used to predict 3D contacts. By the nature of the domain as an independent folding unit, inter-domain predicted contacts are fewer than intra-domain predicted contacts. By analysing all possible domain boundaries and constructing a smoothed profile of predicted contact density (PCD), true structural domain boundaries are predicted as local profile minima associated with low PCD. A training data set is constructed from 52 non-homologous two-domain protein sequences of known 3D structure and used to determine optimal parameters for the profile analysis. The alignments in the training data set contained 48 +/- 17 (mean +/- SD) sequences and lengths of 257 +/- 121 residues. Of the 47 alignments yielding predictions, 35% of true domain boundaries are predicted to within 15 amino acids by the local profile minimum with the lowest profile value. Including predictions from the second- and third-lowest local minima increases the correct domain boundary coverage to 60%, whereas the lowest five local minima cover 79% of correct domain boundaries. Through further profile analysis, criteria are presented which reliably identify subsets of more accurate predictions. Retrospective analysis of CASP3 targets shows predictions of sufficient accuracy to enable dramatically improved fold recognition results. 
Finally, a prediction is made for geminivirus AL1 protein which is in full agreement with biochemical data, yielding a plausible, novel threading result.
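
The profile idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition of predicted contact density, the smoothing scheme, or the trained parameters, so the normalisation by the number of possible cross-boundary pairs, the moving-average window, and the function names `contact_density_profile` and `ranked_local_minima` are all assumptions made for illustration.

```python
def contact_density_profile(length, contacts, window=5):
    """Smoothed density of predicted contacts crossing each candidate boundary.

    length   -- number of alignment positions (residues 0 .. length-1)
    contacts -- iterable of (i, j) position pairs predicted to be in contact
    window   -- width of the moving-average smoothing window (assumed scheme)

    Entry k of the returned list refers to a boundary placed before residue k+1.
    """
    raw = []
    for b in range(1, length):
        # A contact crosses boundary b if one residue lies before b
        # and the other lies at or after b.
        crossing = sum(1 for i, j in contacts if min(i, j) < b <= max(i, j))
        # Assumed normalisation: divide by the number of possible pairs
        # with one residue on each side of the boundary.
        raw.append(crossing / (b * (length - b)))

    half = window // 2
    smoothed = []
    for k in range(len(raw)):
        lo, hi = max(0, k - half), min(len(raw), k + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


def ranked_local_minima(profile):
    """Boundary positions at strict local minima, lowest profile value first."""
    minima = [
        (profile[k], k + 1)
        for k in range(1, len(profile) - 1)
        if profile[k] < profile[k - 1] and profile[k] < profile[k + 1]
    ]
    return [boundary for _, boundary in sorted(minima)]
```

For a toy 20-residue sequence with contacts only inside residues 0-9 and inside residues 10-19, the lowest-ranked minimum falls at boundary 10. Taking the first one, three, or five entries of `ranked_local_minima` mirrors the way the abstract reports coverage for the lowest, three lowest, and five lowest minima.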

2.
3.
4.
Machine learning techniques have improved predictions of secretory proteins from protein, genomic and expressed sequence tag (EST) sequences. Artificial neural networks, physical sequence analysis using high-performance optimization, and hidden Markov models identify extremely variable signal peptides (the vehicles of protein transport across the endoplasmic reticulum membrane), transmembrane segments, and specific extracellular and intracellular domains as indicators of possible roles in the intercellular and intracellular chemical signaling pathways. The major role of peptide hormones, blood coagulation factors, carcinogenesis agents, and other secretory proteins in orchestrating multicellular life indicates pharmacological potential in the cure of major diseases and numerous biotechnological applications.

5.
Tilted peptides are short hydrophobic protein fragments characterized by an asymmetric distribution of their hydrophobic residues when helical. They are able to interact with a hydrophobic/hydrophilic interface (such as a lipid membrane) and to destabilize the organized system into which they insert. They were detected in viral fusion proteins and in proteins involved in different biological processes involving membrane insertion or translocation of the protein in which they are found. In this paper, we have analysed different protein domains related to membrane insertion with regard to their tilted properties. They are the N-terminal signal peptide of the filamentous haemagglutinin (FHA), a Bordetella pertussis protein secreted in high amounts, and the hydrophobic domain from proteins forming pores (i.e. ColIa, Bax and Bcl-2). From the predictions and the experimental approaches, we suggest that tilted peptides found in those proteins could have a more general role in the mechanism of insertion/translocation of proteins into/across membranes. For the signal sequences, they could help the protein machinery involved in protein secretion to be more active. In the case of toroidal pore formation, they could disturb the lipids, facilitating the insertion of the other more hydrophilic helices.

6.
Multiple alignment of protein sequences with repeats and rearrangements   Cited by: 3 (self-citations: 0, citations by others: 3)
Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture.

7.
Trifonov EN. Biofizika, 2002, 47(4): 581-586
One would expect that present-day protein sequences have changed many times during their evolution, at every point, so that there is no chance to recognize in the sequences any traces of their ancient organization. This turns out not to be true. Massive analysis of complete genomes of bacteria allows one to derive, according to very specific predictions, distinct features of very early sequences and to outline the history of protein evolution. Modern proteins appear to have evolved from short peptides of mixed sequences of two alphabet types. They were then closed to sequences of optimal size from which modern folds/domains and multidomain proteins were formed. The reconstruction of amino acid and codon chronology is described. A specific idea on the nature and evolutionary significance of gene splicing is suggested. Gene splicing, while obeying the rules of basic structural organization of proteins, offers accessibility to regions of sequence space that could not be reached by mutational changes typical for prokaryotes.

8.
MOTIVATION: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption (that the aligned sequences form a single linear consensus) is not true? RESULTS: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveal the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions. AVAILABILITY: The heaviest_bundle source code is available at http://www.bioinformatics.ucla.edu/poa
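
The abstract does not spell out how heaviest_bundle works, so the following is only a sketch of the underlying idea: a dynamic programming pass that finds the heaviest path through a weighted directed acyclic graph. The function name `heaviest_path`, the edge-weight dictionary, and the requirement that nodes arrive in topological order are assumptions for illustration, not the published algorithm, which additionally bundles sequences and extracts several consensus paths.

```python
def heaviest_path(nodes, edges):
    """Return the maximum-weight path in a weighted DAG.

    nodes -- node identifiers listed in topological order (assumed by caller)
    edges -- dict mapping (source, target) to a positive weight, for example
             the number of sequences that traverse that edge
    """
    # Group incoming edges by target node.
    incoming = {}
    for (u, v), weight in edges.items():
        incoming.setdefault(v, []).append((u, weight))

    best = {n: 0 for n in nodes}     # best path weight ending at each node
    prev = {n: None for n in nodes}  # predecessor on that best path

    # Because nodes are topologically ordered, best[u] is final
    # before any edge leaving u is examined.
    for v in nodes:
        for u, weight in incoming.get(v, []):
            if best[u] + weight > best[v]:
                best[v] = best[u] + weight
                prev[v] = u

    # Trace back from the node where the heaviest path ends.
    end = max(nodes, key=lambda n: best[n])
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1]
```

In a graph where a branch through node `a` is supported by three sequences and a branch through node `b` by one, the traceback follows the branch through `a`. Removing the sequences on that path and repeating would be one hypothetical way to obtain further consensus paths.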

9.
10.
Signal peptides: exquisitely designed transport promoters   Cited by: 37 (self-citations: 2, citations by others: 35)
Prokaryotic proteins destined for transport out of the cytoplasm typically contain an N-terminal extension sequence, called the signal peptide, which is required for export. It is evident that many secretory proteins utilize a common export system, yet the signal sequences themselves display very little primary sequence homology. In attempting to understand how different signal peptides are able to promote protein secretion through the same pathway, the physical features of natural signal sequences have been extensively examined for similarities that might play a part in function. Experimental data have confirmed statistical analyses which highlighted dominant features of natural signal sequences in Escherichia coli: a net positive charge in the N-terminus increases efficiency of transport; the core region must maintain a threshold level of hydrophobicity within a range of length limitations; the central portion adopts an α-helical conformation in hydrophobic environments; and the signal cleavage region is ideally six residues long, with small side-chain amino acids in the −1 and −3 positions. This review focuses on the parallels between signal peptide physical features and their functions, which emerge when the results of a variety of experimental approaches are combined. The requirement for each property may be ascribed to a potential interaction that is critical for efficient protein export. The summation of the key physical features produces signal peptides with the flexibility to function in multiple roles in order to expedite secretion. In this way, nature has indeed evolved exquisitely tuned signal sequences.
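
Three of the features listed in this abstract can be checked on a candidate sequence with a few lines of code. This is a minimal sketch under stated assumptions: the Kyte-Doolittle hydropathy scale is used as one common hydrophobicity measure (the abstract names no scale), the set of "small" residues is taken to be A, G and S, and the caller is assumed to have already split the signal peptide into its regions. None of the function names come from the review.

```python
# Kyte-Doolittle hydropathy values (assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: residues counted as having small side chains.
SMALL_RESIDUES = set("AGS")


def net_charge(segment):
    """Net charge counting K and R as +1 and D and E as -1."""
    return sum((aa in "KR") - (aa in "DE") for aa in segment)


def mean_hydropathy(segment):
    """Average Kyte-Doolittle hydropathy of a non-empty segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def has_small_minus1_minus3(signal_peptide):
    """True if positions -1 and -3 relative to the cleavage site are small.

    The signal peptide is assumed to end exactly at the cleavage site,
    so position -1 is its last residue and -3 its third-to-last.
    """
    return (
        signal_peptide[-1] in SMALL_RESIDUES
        and signal_peptide[-3] in SMALL_RESIDUES
    )
```

A caller would apply `net_charge` to the N-terminal region, `mean_hydropathy` to the core, and `has_small_minus1_minus3` to the full signal peptide. Any thresholds for "positive enough" or "hydrophobic enough" would have to come from the review itself, since the abstract gives none.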

11.
Spider silk has been extensively studied for its outstanding mechanical properties. Partial intermediate and C-terminal sequences of different spider silk proteins have been determined, and during the past decade also N-terminal domains have been characterized. However, only some of these N-terminal domains have been reported to contain signal peptides, leaving the mechanism whereby they enter the secretory pathway open to speculation. Here we present the sequence of a 394-residue N-terminal region of the Euprosthenops australis major ampullate spidroin 1 (MaSp1). A close comparison with published sequences from other species revealed the presence of N-terminal signal peptides followed by an approximately 130-residue nonrepetitive domain. From secondary structure predictions, helical wheel analysis, and circular dichroism spectroscopy this domain is concluded to contain five alpha-helices and is a conserved constituent of hitherto analyzed dragline, flagelliform, and cylindriform spider silk proteins.

12.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score which tells how accurate the predictions are without knowing the experimental data. To facilitate the special requests of end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any proteins as templates, or to exclude any template proteins during the structure assembly simulations. The structural information could be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was evaluated as the best program for protein structure and function predictions in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.

13.
The structure of an endocytosis signal   Cited by: 6 (self-citations: 0, citations by others: 6)
The efficient endocytosis of transmembrane receptor proteins requires a signal sequence in the cytoplasmic domain of the protein to promote clustering into coated pits. Analysis of the clustering of receptors with natural or engineered mutations in their cytoplasmic domains implicates an aromatic residue in a particular context as the necessary clustering signal. Recent detailed studies of mutants have led to computer predictions of a plausible structural motif. These predictions have now been elegantly supported by using NMR to determine the structure of synthetic peptides. New evidence that this sorting signal performs multiple functions suggests that this may not be the whole story.

14.
15.
When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.
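
The quantity that inverse parametric alignment tries to drive toward zero can be made concrete with a small sketch. It assumes a deliberately simple scoring function (one match score, one mismatch score, a linear gap penalty) and computes the gap between the optimal global alignment score and the score of a given example alignment. The abstract's actual scoring model, error definition, and optimisation procedure are not specified there, so the names `optimal_score`, `alignment_score` and `score_error` are hypothetical.

```python
def optimal_score(a, b, match, mismatch, gap):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def alignment_score(row_a, row_b, match, mismatch, gap):
    """Score of a given alignment; '-' marks a gap, rows have equal length."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def score_error(row_a, row_b, match, mismatch, gap):
    """How far the example alignment falls short of the optimal score."""
    a = row_a.replace("-", "")
    b = row_b.replace("-", "")
    return (optimal_score(a, b, match, mismatch, gap)
            - alignment_score(row_a, row_b, match, mismatch, gap))
```

An error of zero means the example alignment is itself optimal under the chosen parameters. A parameter search would evaluate `score_error` over a set of trusted example alignments and prefer the parameter values with the lowest average error.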

16.
17.
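Bioinformatics methods and tools were used to analyse the physicochemical properties, transmembrane regions, hydrophobicity/hydrophilicity, secondary structure, structural and functional domains, functional classification and homology of the tomato LeNHX1 protein. The results show that it is a stable hydrophobic protein containing a conserved amiloride-binding site (the LFFIYVLPPI region), with a relative molecular mass of 59.0 kD, an isoelectric point of 6.60 and 10 transmembrane regions. The main elements of its secondary structure are α-helices and random coils. Functional classification and protein homology analysis indicate that tomato LeNHX1 is a tonoplast Na+/H+ antiporter.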

18.
MOTIVATION: The discovery of solid-binding peptide sequences is accelerating along with their practical applications in biotechnology and materials sciences. A better understanding of the relationships between the peptide sequences and their binding affinities or specificities will enable further design of novel peptides with selected properties of interest both in engineering and medicine. RESULTS: A bioinformatics approach was developed to classify peptides selected by in vivo techniques according to their inorganic solid-binding properties. Our approach performs all-against-all comparisons of experimentally selected peptides with short amino acid sequences that were categorized for their binding affinity and scores the alignments using sequence similarity scoring matrices. We generated novel scoring matrices that optimize the similarities within the strong-binding peptide sequences and the differences between the strong- and weak-binding peptide sequences. Using the scoring matrices thus generated, a given peptide is classified based on the sequence similarity to a set of experimentally selected peptides. We demonstrate the new approach by classifying experimentally characterized quartz-binding peptides and computationally designing new sequences with specific affinities. Experimental verifications of binding of these computationally designed peptides confirm our predictions with high accuracy. We further show that our approach is a general one and can be used to design new sequences that bind to a given inorganic solid with predictable and enhanced affinity.

19.
Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects a different perspective on sequence/structure relationship embedded in various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a "jury" method, which has accuracy higher than any of the single methods. Finally, building full three-dimensional models for all top predictions helps to eliminate possible false positives where alignments, which are optimal in the one-dimensional sequences, lead to unsolvable sterical conflicts for the full three-dimensional models.

20.
Chlorarachniophytes are marine amoeboflagellate protists that have acquired their plastid (chloroplast) through secondary endosymbiosis with a green alga. Like other algae, most of the proteins necessary for plastid function are encoded in the nuclear genome of the secondary host. These proteins are targeted to the organelle using a bipartite leader sequence consisting of a signal peptide (allowing entry into the endomembrane system) and a chloroplast transit peptide (for transport across the chloroplast envelope membranes). We have examined the leader sequences from 45 full-length predicted plastid-targeted proteins from the chlorarachniophyte Bigelowiella natans with the goal of understanding important features of these sequences and possible conserved motifs. The chemical characteristics of these sequences were compared with a set of 10 B. natans endomembrane-targeted proteins and 38 cytosolic or nuclear proteins, which show that the signal peptides are similar to those of most other eukaryotes, while the transit peptides differ from those of other algae in some characteristics. Consistent with this, the leader sequence from one B. natans protein was tested for function in the apicomplexan parasite, Toxoplasma gondii, and shown to direct the secretion of the protein.
