Similar articles
Found 20 similar articles (search time: 0 ms)
1.
Carugo O 《Bioinformation》2010,4(8):347-351
Several non-redundant ensembles of protein three-dimensional structures were analyzed in order to estimate their natural clustering tendency by means of the Cox-Lewis coefficient. It was observed that, although proteins tend to aggregate into different and well-separated groups, some overlap between different clusters occurs. This suggests that classifications based only on structural data cannot provide a systematic classification of proteins. In particular, additional information is needed in order to capture fully the complex evolutionary relationships between proteins.

2.
Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.

3.
Following the original idea of Maynard Smith on the evolution of the protein sequence space, a novel tool is developed that allows a "space walk" from one sequence to its likely evolutionary relative and further on. At a given threshold of identity between consecutive steps, walks of many steps are possible. The sequences at the ends of the walks may differ substantially from one another. In a sequence space of randomized (shuffled) sequences the walks are very short. The approach opens new perspectives for protein evolutionary studies and sequence annotation.
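The walk described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `identity` and `space_walk`, the use of a breadth-first search, the toy 8-residue fragments, and the 0.75 threshold are hypothetical and are not taken from the published tool.

```python
from collections import deque


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def space_walk(start, goal, fragments, threshold):
    """Breadth-first walk from start to goal through the fragment collection.

    Every consecutive pair of fragments on the walk must have an identity
    of at least `threshold`. Returns the shortest such chain, or None.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for candidate in fragments:
            if candidate not in previous and identity(current, candidate) >= threshold:
                previous[candidate] = current
                queue.append(candidate)
    return None


# Toy fragments (hypothetical): each differs from the next at 2 of 8 positions.
fragments = ["AAAAAAAA", "AAAAAACC", "AAAACCCC", "AACCCCCC", "CCCCCCCC"]
walk = space_walk("AAAAAAAA", "CCCCCCCC", fragments, threshold=0.75)
```

In this toy set every step keeps 75% identity, yet the two ends of the walk share no residues, which mirrors the remark that sequences at the ends of a walk may differ substantially.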

4.
Pellegrini M  Yeates TO 《Proteins》1999,37(2):278-283
The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution by changes in frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed in the true database and in shuffled versions of the database that preserve any potential codon bias. The comparison of results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences.

5.
From protein sequence space to elementary protein modules   Total citations: 2 (self-citations: 0; citations by others: 2)
Frenkel ZM  Trifonov EN 《Gene》2008,408(1-2):64-71
The formatted protein sequence space is built from identical-size fragments of prokaryotic proteins (112 complete proteomes). Connecting sequence-wise similar fragments (points in the space) results in the formation of numerous networks, which sometimes combine different types of proteins that nevertheless share fragments with similar or distantly related sequences. The networks are mapped onto individual protein sequences, revealing distinct regions (modules) associated with prominent networks with well-defined functional identities. The presence of multiple sites of sequence conservation (modules) in a given protein sequence suggests that the annotated protein function may be decomposed into "elementary" subfunctions of the respective modules. The modules correspond to previously discovered conserved closed loop structures and their sequence prototypes.

6.
The PRINTS database: a resource for identification of protein families   Total citations: 4 (self-citations: 0; citations by others: 4)
The PRINTS database houses a collection of protein fingerprints, which may be used to assign family and functional attributes to uncharacterised sequences, such as those currently emanating from the various genome-sequencing projects. The April 2002 release includes 1,700 family fingerprints, encoding approximately 10,500 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Fingerprints are groups of conserved motifs that, taken together, provide diagnostic protein family signatures. They derive much of their potency from the biological context afforded by matching motif neighbours; this makes them at once more flexible and powerful than single-motif approaches. The technique further departs from other pattern-matching methods by readily allowing the creation of fingerprints at superfamily-, family- and subfamily-specific levels, thereby allowing more fine-grained diagnoses. Here, we provide an overview of the method of protein fingerprinting and how the results of fingerprint analyses are used to build PRINTS and its relational cousin, PRINTS-S.

7.

Background

It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity.

Results

In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust.

Conclusions

We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.

8.
Sequence-specific DNA-binding proteins must quickly and reliably localize specific target sites on DNA. This search process has been well characterized for monomeric proteins, but it remains poorly understood for systems that require assembly into dimers or oligomers at the target site. We present a single-molecule study of the target-search mechanism of protelomerase TelK, a recombinase-like protein that is only active as a dimer. We show that TelK undergoes 1D diffusion on non-target DNA as a monomer, and it immobilizes upon dimerization even in the absence of a DNA target site. We further show that dimeric TelK condenses non-target DNA, forming a tightly bound nucleoprotein complex. Together with theoretical calculations and molecular dynamics simulations, we present a novel target-search model for TelK, which may be generalizable to other dimer and oligomer-active proteins.

9.
Knowledge-based potentials can be used to decide whether an amino acid sequence is likely to fold into a prescribed native protein structure. We use this idea to survey the sequence-structure relations in protein space. In particular, we test the following two propositions, which were found to be important for efficient evolution: (i) the sequences folding into a particular native fold form extensive neutral networks that percolate through sequence space, and (ii) the neutral networks of any two native folds approach each other to within a few point mutations. Computer simulations using two very different potential functions, M. Sippl's PROSA pair potential and a neural-network-based potential, are used to verify these claims.

10.
Evolutionary networks in the formatted protein sequence space.   Total citations: 4 (self-citations: 0; citations by others: 4)
In our recent work, a new approach to establishing sequence relatedness, by walking through the protein sequence space, was introduced. The sequence space is built from 20-amino-acid-long fragments of proteins from a very large collection of fully sequenced prokaryotic genomes. The fragments, points in the space, are connected if they are closely related (high sequence identity). The connected fragments form a variety of networks of sequence kinship. In this research the networks in the formatted sequence space and their topology are analyzed. For lower identity thresholds a huge network of complex structure is formed, involving up to 10% of the points in the space. When the threshold is increased, the major network splits into a set of smaller clusters with a wide diversity of sizes and topologies. Such "evolutionary networks" may serve as a powerful sequence annotation tool that allows one to reveal fine details in the evolutionary history of proteins.
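The effect of the identity threshold on network structure can be sketched as a connected-components computation. This is only a toy illustration under stated assumptions: the function `networks`, the union-find grouping, the all-against-all comparison, and the 10-residue toy fragments (the abstract uses 20-residue fragments) are hypothetical choices and not the authors' implementation.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length fragments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(fragments, threshold):
    """Group fragments into networks (connected components).

    Two fragments are joined by an edge when their identity is at least
    `threshold`. Returns the groups sorted from largest to smallest.
    """
    parent = {f: f for f in fragments}

    def find(f):
        # Follow parent links to the representative, compressing the path.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(fragments, 2):
        if identity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in fragments:
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values(), key=len, reverse=True)


# Toy fragments (hypothetical).
fragments = ["AAAAAAAAAA", "AAAAAAAACC", "AAAAAACCCC", "GGGGGGGGGG", "GGGGGGGGGT"]
sizes_low = [len(g) for g in networks(fragments, 0.8)]
sizes_high = [len(g) for g in networks(fragments, 0.9)]
```

Raising the threshold from 0.8 to 0.9 breaks the three-member network in this toy set into isolated points, a small-scale analogue of the splitting the abstract describes. An all-against-all comparison is quadratic in the number of fragments, so a real fragment collection would need an indexed search instead.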

11.
Directed protein evolution is the most versatile method for studying protein structure-function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level with 5.2-7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.
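The kind of enumeration described in (a) can be sketched with the standard genetic code. The sketch below is an illustration only: the function names, the choice to exclude stop codons and the original amino acid from the count, and the example codon are assumptions, and the code makes no attempt to reproduce the percentages reported in the review.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Purine<->purine and pyrimidine<->pyrimidine exchanges are transitions;
# every other exchange is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def neighbours(codon, kind="any"):
    """Yield the codons reachable by a single nt exchange.

    kind is "any", "transition" or "transversion".
    """
    for pos, base in product(range(3), BASES):
        old = codon[pos]
        if base == old:
            continue
        is_transition = (old, base) in TRANSITIONS
        if kind == "transition" and not is_transition:
            continue
        if kind == "transversion" and is_transition:
            continue
        yield codon[:pos] + base + codon[pos + 1:]


def substitutions(codon, kind="any"):
    """Distinct amino acids reachable by one nt exchange.

    Assumption of this sketch: stop codons and the original amino acid
    are not counted as substitutions.
    """
    return {CODE[n] for n in neighbours(codon, kind)} - {CODE[codon], "*"}


# Example: the methionine codon ATG.
all_subs = substitutions("ATG")
ts_subs = substitutions("ATG", "transition")
tv_subs = substitutions("ATG", "transversion")
```

For ATG, the three possible transitions give Val, Thr and Ile, while the six transversions give Leu, Lys, Arg and Ile, so the transversions reach a larger set of amino acids for this codon.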

12.
Directed protein evolution is the most versatile method for studying protein structure–function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level with 5.2–7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.

13.

Background  

The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily.

14.
The phi-screen, a method of phylogenetic screening, can be employed to detect repetitive sequence families that differentially hybridize between closely related species. Such differences may involve sequence divergence or variations in copy number, including total presence versus absence of a family of repeated DNA. We present the results of a phi-screen comparing the human genome to that of the prosimian, Galago crassicaudatus. Three human repetitive families that are divergent or not present in galago have been detected. One of these families is described in detail; it is similar among the anthropoids but is present in a lower copy number and/or divergent form in prosimians. The family is clearly related to the transposon-like human element (THE) described by Paulson et al. (1985). THEs have long terminal repeats reminiscent of retroviruses but are unique in that they have no sequence similarity to known mammalian retroviruses. The sequence of a solo long terminal repeat, found unassociated with THE internal sequence, is presented. This family member, THE p2, is bordered by a 5-bp target-site repeat and is interrupted by the insertion of an Alu element. A solo THE element sequenced by Wiginton et al. (1986) contains an insertion of Alu at precisely the same position as does THE p2.

15.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

16.
Some widely used standard protocols for the separation of phenylthiohydantoin amino acid derivatives by reverse-phase gradient HPLC do not provide separation of the phenylthiohydantoin derivative of tryptophan (PTH-Trp) from diphenylurea (DPU), a by-product generated during Edman degradation of proteins in variable amounts. Furthermore, PTH-Trp is usually recovered in low yield under typical experimental conditions used with automated sequencing equipment. These factors may compromise the unambiguous assignment of tryptophan residues in automated protein sequence analysis, especially when sequencing is performed at high sensitivity. We devised a reverse-phase HPLC method which allows the separation of DPU and PTH-Trp and therefore the correct assignment of PTH-Trp. The method is based on a modification of the HPLC gradient used to elute and separate all PTH amino acids of interest. With Applied Biosystems Model 477A protein sequencers with on-line PTH amino acid identification, the correct assignment of tryptophan was consistent and reproducible even when sequencing at very high sensitivity (5 pmol).

17.
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
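The randomized-database idea in this abstract can be sketched in a few lines. The helper names, the toy scores, and in particular the estimator (twice the number of decoy hits divided by all accepted hits in a combined search, a commonly used convention) are assumptions of this sketch; the abstract does not state the exact formula the authors used.

```python
import random


def reversed_decoy(sequence: str) -> str:
    """Decoy entry made by reversing a forward protein sequence."""
    return sequence[::-1]


def shuffled_decoy(sequence: str, rng: random.Random) -> str:
    """Decoy entry made by reshuffling the residues of a forward sequence."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def estimate_false_positive_rate(hits, score_threshold):
    """Estimate the false positive rate among accepted identifications.

    `hits` is a list of (score, is_decoy) pairs from a combined search of
    a forward database with a randomized database appended. Assumption:
    wrong matches land on forward and decoy entries equally often, so the
    estimate is 2 * decoy hits / all accepted hits, capped at 1.
    """
    accepted = [is_decoy for score, is_decoy in hits if score >= score_threshold]
    if not accepted:
        return 0.0
    return min(1.0, 2 * sum(accepted) / len(accepted))


# Toy search results (hypothetical scores).
hits = [(9.1, False), (8.4, False), (7.9, False), (7.2, True),
        (6.5, False), (5.0, True), (4.8, False), (4.1, True)]
strict = estimate_false_positive_rate(hits, 7.5)
medium = estimate_false_positive_rate(hits, 6.0)
loose = estimate_false_positive_rate(hits, 4.0)
```

Scanning the threshold in this way is how a study-specific cutoff could be chosen for a desired error rate: the estimate rises from 0 to 0.4 to 0.75 as the toy threshold is relaxed.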

18.
A novel approach for evaluation of sequence relatedness via a network over the sequence space is presented. This relatedness is quantified by graph theoretical techniques. The graph is perceived as a flow network, and flow algorithms are applied. The number of independent pathways between nodes in the network is shown to reflect structural similarity of corresponding protein fragments. These results provide an appropriate parameter for quantitative estimation of such relatedness, as well as reliability of the prediction. They also demonstrate a new potential for sequence analysis and comparison by means of the flow network in the sequence space.
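Counting independent pathways with a flow algorithm can be sketched as a unit-capacity maximum-flow computation. The abstract does not say which flow algorithm was applied or whether the pathways are edge- or vertex-independent, so the choices below (edge-disjoint paths, breadth-first augmenting paths, the function name `independent_paths`, and the toy graph) are assumptions made for illustration.

```python
from collections import defaultdict, deque


def independent_paths(edges, source, sink):
    """Count edge-disjoint paths between two nodes of an undirected graph.

    Each edge gets capacity 1 in both directions; the maximum flow found
    by repeated breadth-first augmentation equals the number of
    edge-disjoint paths.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    capacity = defaultdict(int)
    adjacent = defaultdict(set)
    for a, b in edges:
        capacity[(a, b)] += 1
        capacity[(b, a)] += 1
        adjacent[a].add(b)
        adjacent[b].add(a)

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        previous = {source: None}
        queue = deque([source])
        while queue and sink not in previous:
            node = queue.popleft()
            for nxt in adjacent[node]:
                if nxt not in previous and capacity[(node, nxt)] > 0:
                    previous[nxt] = node
                    queue.append(nxt)
        if sink not in previous:
            return flow
        # Push one unit of flow back along the path that was found.
        node = sink
        while previous[node] is not None:
            capacity[(previous[node], node)] -= 1
            capacity[(node, previous[node])] += 1
            node = previous[node]
        flow += 1


# Toy network (hypothetical): nodes stand for fragments, edges for similarity.
edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C"), ("D", "E")]
paths_a_d = independent_paths(edges, "A", "D")
paths_a_e = independent_paths(edges, "A", "E")
```

In the toy network, A and D are linked by two independent routes, whereas A and E are linked only through the single edge D-E, so a path count of this kind separates well-supported relationships from those resting on one connection.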

19.
Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.

20.
