Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
From protein sequence space to elementary protein modules   Cited by: 2 (self-citations: 0, citations by others: 2)
Frenkel ZM  Trifonov EN 《Gene》2008,408(1-2):64-71
The formatted protein sequence space is built from identical-size fragments of prokaryotic proteins (112 complete proteomes). Connecting sequence-wise similar fragments (points in the space) produces numerous networks, which sometimes combine different types of proteins that nevertheless share fragments with similar or distantly related sequences. The networks are mapped onto individual protein sequences, revealing distinct regions (modules) associated with prominent networks with well-defined functional identities. The presence of multiple sites of sequence conservation (modules) in a given protein sequence suggests that the annotated protein function may be decomposed into "elementary" subfunctions of the respective modules. The modules correspond to previously discovered conserved closed loop structures and their sequence prototypes.

2.
Following Maynard Smith's original idea on the evolution of the protein sequence space, a novel tool is developed that allows a "space walk" from one sequence to its likely evolutionary relative and further on. At a given threshold of identity between consecutive steps, walks of many steps are possible. The sequences at the ends of the walks may differ substantially from one another. In a sequence space of randomized (shuffled) sequences the walks are very short. The approach opens new perspectives for protein evolutionary studies and sequence annotation.

3.
LAGLIDADG homing endonucleases (LHEs) are a family of highly specific DNA endonucleases capable of recognizing target sequences ≈ 20 bp in length, thus drawing intense interest for their potential academic, biotechnological and clinical applications. Methods for rational design of LHEs to cleave desired target sites are presently limited by the small number of high-quality native LHEs available as scaffolds for protein engineering; many are unsatisfactory for gene targeting applications. One strategy to address such limitations is to identify close homologs of existing LHEs possessing superior biophysical or catalytic properties. To test this concept, we searched public sequence databases to identify putative LHE open reading frames homologous to the LHE I-AniI and used a DNA binding and cleavage assay based on yeast surface display to rapidly survey a subset of the predicted proteins. These proteins exhibited a range of capacities for surface expression and also displayed locally altered binding and cleavage specificities with a range of in vivo cleavage activities. Of these enzymes, I-HjeMI demonstrated the greatest activity in vivo and was readily crystallizable, allowing a comparative structural analysis. Taken together, our results suggest that even highly homologous LHEs offer a readily accessible resource of related scaffolds that display diverse biochemical properties for biotechnological applications.

4.
Spreadsheet languages have gained enormous popularity for simple numerical computations, but their power in non-numerical contexts has not been widely recognized. This paper presents spreadsheet implementations of several familiar sequence analysis tools. In many cases, spreadsheet notation clarifies the underlying algorithm, facilitating understanding of existing algorithms and promoting spontaneous experimentation. Spreadsheets also reveal, through their visible parallelism, opportunities for applying parallel processors to computationally challenging problems. Received on November 16, 1986; accepted on June 10, 1987.

5.
A novel method was developed to covalently link a nascent protein (phenotype) to its mRNA (genotype) through the N-terminus. The mRNA, harboring an amber stop codon just downstream of the initiation site, was hybridized with hydrazide-modified ssDNA upstream of the coding region and ligated to the DNA. This construct was then modified with 4-acetyl-phenylalanyl amber suppressor tRNA. The modified construct was fused with the nascent protein via the phenylalanine derivative when the mRNA used the amber suppressor tRNA to decode the amber stop codon. The resulting fusion molecule was used successfully in selective enrichment experiments. It will be applicable to high-throughput screening in evolutionary protein engineering. In contrast to fusion molecules generated by other methods, in which the protein is linked to the genotype molecule through the C-terminus, our fusion molecule will serve to select proteins whose C-terminus is essential for activity.

6.
Knowledge-based potentials can be used to decide whether an amino acid sequence is likely to fold into a prescribed native protein structure. We use this idea to survey the sequence-structure relations in protein space. In particular, we test the following two propositions, which were found to be important for efficient evolution: (i) the sequences folding into a particular native fold form extensive neutral networks that percolate through sequence space; (ii) the neutral networks of any two native folds approach each other to within a few point mutations. Computer simulations using two very different potential functions, M. Sippl's PROSA pair potential and a neural network based potential, are used to verify these claims.

7.
8.
Pellegrini M  Yeates TO 《Proteins》1999,37(2):278-283
The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution by changes in frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed in the true database and in shuffled versions of the database that preserve any potential codon bias. The comparison of results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences.
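To make "changes in frame of translation" concrete, the following minimal sketch translates one nucleotide sequence in each of its three forward reading frames using the standard genetic code. It is not the method of the abstract, which compares protein sequences with frame-specific substitution matrices; the function name `translate` and the toy sequence are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a complete codon are ignored;
    stop codons are rendered as '*'.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical toy sequence: the same nucleotides give unrelated peptides per frame.
toy = "ATGGCCAAATAA"
for frame in range(3):
    print(frame, translate(toy, frame))
```

Because each frame regroups the same bases into different codons, a shift of one or two positions changes every downstream residue, which is why detecting such relationships requires dedicated scoring rather than ordinary sequence comparison.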

9.
Genes and genomes do not evolve similarly in all branches of the tree of life. Detecting and characterizing the heterogeneity in time, and between lineages, of the nucleotide (or amino acid) substitution process is an important goal of current molecular evolutionary research. This task is typically achieved through the use of non-homogeneous models of sequence evolution, which, being highly parametrized and computationally demanding, are not appropriate for large-scale analyses. Here we investigate an alternative methodological option based on probabilistic substitution mapping. The idea is first to reconstruct the substitutional history of each site of an alignment under a homogeneous model of sequence evolution, and then to characterize variations in the substitution process across lineages based on substitution counts. Using simulated and published datasets, we demonstrate that probabilistic substitution mapping is robust in that it typically provides accurate reconstruction of sequence ancestry even when the true process is heterogeneous but a homogeneous model is adopted. Consequently, we show that the new approach is essentially as efficient as existing methods and far faster (up to 25 000 times), thus paving the way for a systematic survey of substitution process heterogeneity across genes and lineages.

10.
Genome-wide studies in Saccharomyces cerevisiae concluded that the dominant determinant of protein evolutionary rates is expression level: highly expressed proteins generally evolve most slowly. To determine how this constraint affects the evolution of protein interactions, we directly measure evolutionary rates of protein interface, surface, and core residues by structurally mapping domain interactions to yeast genomes. We find that mRNA level and protein abundance, though correlated, report on pressures affecting regions of proteins differently. Pressures proportional to mRNA level slow evolutionary rates of all structural regions and reduce the variability in rate differences between interfaces and other surfaces. In contrast, the evolutionary rate variation within a domain is much less correlated to protein abundance. Distinct pressures may be associated primarily with the cost (mRNA level) and functional (protein abundance) benefit of protein production. Interfaces of proteins with low mRNA levels may have higher evolutionary flexibility and could constitute the raw material for new functions.

11.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

12.
Evolutionary networks in the formatted protein sequence space.   Cited by: 4 (self-citations: 0, citations by others: 4)
In our recent work, a new approach to establishing sequence relatedness, by walking through the protein sequence space, was introduced. The sequence space is built from 20-amino-acid-long fragments of proteins from a very large collection of fully sequenced prokaryotic genomes. The fragments, points in the space, are connected if they are closely related (high sequence identity). The connected fragments form a variety of networks of sequence kinship. In this research the networks in the formatted sequence space and their topology are analyzed. For lower identity thresholds a huge network of complex structure is formed, involving up to 10% of the points of the space. When the threshold is increased, the major network splits into a set of smaller clusters with a wide diversity of sizes and topologies. Such "evolutionary networks" may serve as a powerful sequence annotation tool that allows one to reveal fine details in the evolutionary history of proteins.
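The network construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "sequence identity" is the fraction of matching positions between equal-length fragments, uses an all-pairs comparison that would not scale to real proteomes, and the names `identity` and `networks` as well as the 5-residue toy fragments are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length fragments agree."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(fragments: list[str], threshold: float) -> list[set[str]]:
    """Connected components of the graph that links every pair of
    fragments whose identity is at least `threshold` (largest first)."""
    parent = {f: f for f in fragments}

    def find(f: str) -> str:
        # Union-find root lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in combinations(parent, 2):
        if identity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for f in parent:
        groups.setdefault(find(f), set()).add(f)
    return sorted(groups.values(), key=len, reverse=True)


# Toy fragments (the abstract uses 20-residue fragments; 5 keeps the example readable).
toy = ["ACDEF", "ACDEG", "ACDHG", "WYWYW"]
print(networks(toy, 0.8))  # one network of three fragments plus a singleton
print(networks(toy, 0.9))  # stricter threshold: every fragment stands alone
```

Note that "ACDEF" and "ACDHG" share only 60% identity yet land in the same network at the 0.8 threshold, because "ACDEG" bridges them. This is the sense in which a walk through the space can connect sequences that differ substantially, and raising the threshold removes such bridges and splits the network.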

13.
A novel approach for evaluation of sequence relatedness via a network over the sequence space is presented. This relatedness is quantified by graph theoretical techniques. The graph is perceived as a flow network, and flow algorithms are applied. The number of independent pathways between nodes in the network is shown to reflect structural similarity of corresponding protein fragments. These results provide an appropriate parameter for quantitative estimation of such relatedness, as well as reliability of the prediction. They also demonstrate a new potential for sequence analysis and comparison by means of the flow network in the sequence space.
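One way to count "independent pathways" with a flow algorithm is sketched below. The abstract does not say which notion of independence or which algorithm is used, so this example assumes edge-disjoint paths and computes them as a unit-capacity maximum flow with breadth-first augmenting paths (Edmonds-Karp). The function name and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def count_edge_disjoint_paths(edges, source, sink):
    """Maximum number of edge-disjoint paths between two nodes of an
    undirected graph, computed as a unit-capacity maximum flow."""
    if source == sink:
        raise ValueError("source and sink must differ")
    capacity = defaultdict(int)
    neighbours = defaultdict(set)
    for u, v in edges:
        # Each undirected edge becomes two directed arcs of capacity 1.
        capacity[(u, v)] += 1
        capacity[(v, u)] += 1
        neighbours[u].add(v)
        neighbours[v].add(u)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        previous = {source: None}
        queue = deque([source])
        while queue and sink not in previous:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in previous and capacity[(u, v)] > 0:
                    previous[v] = u
                    queue.append(v)
        if sink not in previous:
            return flow
        # Push one unit of flow back along the path that was found.
        v = sink
        while previous[v] is not None:
            u = previous[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        flow += 1


# Toy network: A and D are joined by two independent routes; E hangs off D by one edge.
toy_edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("B", "C"), ("D", "E")]
print(count_edge_disjoint_paths(toy_edges, "A", "D"))
print(count_edge_disjoint_paths(toy_edges, "A", "E"))
```

Under this reading, a pair of fragments connected by many independent routes is more robustly related than a pair connected through a single bridge, since removing any one link leaves the former still connected.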

14.

Background

It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity.

Results

In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust.

Conclusions

We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.

15.
16.
17.
By its purest definition the ultimate goal of structural genomics (SG) is the determination of the structures of all proteins encoded by genomes. Most of these will be obtained by homology modeling using the structures of a set of target proteins for experimental determination. Thanks to the open exchange of SG target information, we are able to analyze the sequences of the current target list to evaluate the extent of its coverage of protein sequence space. The presence of homologous sequences currently either in the Protein Data Bank (PDB) or among SG targets has been determined for each of the protein sequences in several organisms. In this way we are able to evaluate the coverage by existing or targeted structural data for the non-membranous parts of entire proteomes. For small bacterial proteomes such as that of H. influenzae almost all proteins have homologous sequences among SG targets or in the PDB. There is significantly lower coverage for more complex organisms, such as C. elegans. We have mapped the SG target list onto the ProtoMap clustering of protein sequences. Clusters occupied by SG targets represent over 150,000 protein sequences, which is approximately 44% of the total protein sequences classified by ProtoMap. The mapping of SG targets also enables an evaluation of the degree of overlap within the target list. An SG target typically occupies a ProtoMap cluster with more than six other homologous targets.

18.
cDNA encoding the adhesive protein of the mussel Mytilus coruscus (Mcfp1) was isolated. The coding region encoded 848 amino acids (a.a.) comprising the 20-a.a. signal peptide, the 21-a.a. nonrepetitive linker, and the 805-a.a. repetitive domain. Although the first 204 nucleotides and the 3′-untranslated region of Mcfp1 cDNA were homologous to corresponding parts of M. galloprovincialis adhesive protein (Mgfp1) cDNA, the other parts diverged. The representative repeat motif of the repetitive domain, YKPK(I/P)(S/T)YPP(T/S), was similar to but slightly different from the repeat motif of Mgfp1. The codon usage patterns for the same amino acids were different in different positions of the decapeptide motif. Almost identical nucleotide sequences encoding two to 13 repeats appeared several times in the repetitive region, which suggests that the adhesive protein genes of mussels have evolved through the duplication of these repeat units. The nucleotide sequence data reported in this paper will appear in the GSDB, DDBJ, EMBL, and NCBI nucleotide sequence databases with the accession number D63777. Correspondence to: K. Inoue

19.
20.
To refine the location of a disease gene within the bounds provided by linkage analysis, many scientists use the pattern of linkage disequilibrium between the disease allele and alleles at nearby markers. We describe a method that seeks to refine location by analysis of "disease" and "normal" haplotypes, thereby using multivariate information about linkage disequilibrium. Under the assumption that the disease mutation occurs in a specific gap between adjacent markers, the method first combines parsimony and likelihood to build an evolutionary tree of disease haplotypes, with each node (haplotype) separated, by a single mutational or recombinational step, from its parent. If required, latent nodes (unobserved haplotypes) are incorporated to complete the tree. Once the tree is built, its likelihood is computed from probabilities of mutation and recombination. When each gap between adjacent markers is evaluated in this fashion and these results are combined with prior information, they yield a posterior probability distribution to guide the search for the disease mutation. We show, by evolutionary simulations, that an implementation of these methods, called "FineMap," yields substantial refinement and excellent coverage for the true location of the disease mutation. Moreover, by analysis of hereditary hemochromatosis haplotypes, we show that FineMap can be robust to genetic heterogeneity.
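The final step of this abstract, combining per-gap likelihoods with prior information to obtain a posterior distribution, is an application of Bayes' rule and can be sketched as below. This is only an illustration of that step, not the FineMap implementation: the tree-building and likelihood computations are not shown, and the function name, the log-likelihood values, and the priors are made-up placeholders.

```python
import math


def gap_posterior(log_likelihoods, priors):
    """Posterior probability that the disease mutation lies in each gap.

    posterior_i is proportional to prior_i * exp(log_likelihood_i).
    The computation is done in log space so that very negative
    log-likelihoods do not underflow to zero.
    """
    if len(log_likelihoods) != len(priors):
        raise ValueError("need one prior per gap")
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be strictly positive")
    log_weights = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    largest = max(log_weights)
    weights = [math.exp(w - largest) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values for three gaps between four adjacent markers.
toy_log_likelihoods = [-1050.0, -1052.0, -1051.0]
toy_priors = [1 / 3, 1 / 3, 1 / 3]
print(gap_posterior(toy_log_likelihoods, toy_priors))
```

With a uniform prior the ranking of gaps follows the likelihoods alone; a non-uniform prior (for example, one proportional to the physical length of each gap) would shift weight toward the gaps considered more plausible beforehand.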
