Similar articles
20 similar articles found
1.
Frenkel ZM, Trifonov EN. Proteins, 2007, 67(2): 271-284
A new method is proposed to reveal apparent evolutionary relationships between protein fragments with similar 3D structures by finding "intermediate" sequences in the proteomic database. Instead of looking for homologies and intermediates for a whole protein domain, we build a chain of intermediate short sequences, which allows one to link similar structural modules of proteins belonging to the same or different families. Several such chains of intermediates can be combined into an evolutionary tree of structural protein modules. All calculations were made for protein fragments of 20 amino acid residues. Three evolutionary trees for different module structures are described. The aim of the paper is to introduce the new method and to demonstrate its potential for protein structural predictions. The approach also opens new perspectives for protein evolution studies.
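
The chaining idea in entry 1 can be illustrated with a minimal sketch. The abstract does not say how two fragments are judged similar, so this example assumes an ungapped comparison that counts identical positions and a user-chosen identity threshold; the function names (`fragments`, `identity`, `intermediate_chain`) are hypothetical, and this is not the authors' implementation. A breadth-first search returns the shortest chain of intermediates linking two fragments, if one exists.

```python
from collections import deque

FRAGMENT_LEN = 20  # fragment length used in the paper; the threshold below is an assumption


def fragments(seq, length=FRAGMENT_LEN):
    """All overlapping windows of `length` residues in `seq`."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def identity(a, b):
    """Identical positions in an ungapped comparison of equal-length fragments."""
    return sum(x == y for x, y in zip(a, b))


def intermediate_chain(start, goal, pool, min_identity):
    """Breadth-first search for a chain start -> ... -> goal in which every
    consecutive pair shares at least `min_identity` identical positions.
    Returns the chain as a list of fragments, or None if no chain exists."""
    nodes = set(pool) | {start, goal}
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            chain = []
            while current is not None:  # walk back to the start
                chain.append(current)
                current = previous[current]
            return chain[::-1]
        for candidate in nodes:
            if candidate not in previous and identity(current, candidate) >= min_identity:
                previous[candidate] = current
                queue.append(candidate)
    return None


# Toy example with 5-residue strings (real use would pass 20-residue fragments):
print(intermediate_chain("AAAAA", "CCCCC", ["AAACC", "ACCCC"], min_identity=3))
# -> ['AAAAA', 'AAACC', 'ACCCC', 'CCCCC']
```

Neither end fragment is directly similar to the other here, but each step in the returned chain clears the threshold, which is the sense in which an intermediate "links" two modules.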

2.
Shachar O, Linial M. Proteins, 2004, 57(3): 531-538
With currently available sequence data, it is feasible to conduct extensive comparisons among large sets of protein sequences. It is still a much more challenging task to partition the protein space into structurally and functionally related families solely based on sequence comparisons. The ProtoNet system automatically generates a treelike classification of the whole protein space. It stands to reason that this classification reflects evolutionary relationships, both close and remote. In this article, we examine this hypothesis. We present a semiautomatic procedure that singles out certain inner nodes in the ProtoNet tree that should ideally correspond to structurally and functionally defined protein families. We compare the performance of this method against several expert systems. Some of the competing methods incorporate additional extraneous information on protein structure or on enzymatic activities. The ProtoNet-based method performs at least as well as any of the methods with which it was compared. This article illustrates the ProtoNet-based method on several evolutionarily diverse families. Using this new method, an evolutionary divergence scheme can be proposed for a large number of structurally and functionally related superfamilies.

3.
4.
Exobiology, the study of the origin, evolution and distribution of life (including life on earth) within the context of cosmic evolution, is being given a remarkable boost by genome sequencing projects, which are now making the evolutionary histories of protein families routinely available. These histories comprise a multiple alignment for their protein sequences and the corresponding DNA sequences, an evolutionary tree showing the pedigree of these sequences, and reconstructed ancestral sequences for each node in the tree. In a post-genomic world having genomic sequences from an unlimited number of organisms, these histories will be used to connect structure, chemical reactivity, and physiological function to these families. This paper describes several “post-genomic” tools that exploit these evolutionary histories. They can be used to confirm or deny long distance homology between two protein families, identify proteins within a family that have new functions, and identify specific in vitro properties of the protein that are important for its physiological role. Evolution-based data structures for organizing large sequence databases are also described.

5.
The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large data sets using new computational methods.

6.
Phylogeny as a guide to structure and function of membrane transport proteins
Protein phylogeny, based on primary amino acid sequence relatedness, reflects the evolutionary process and therefore provides a guide to structure, mechanism and function. Any two proteins that are related by common descent are expected to exhibit similar structures and functions to a degree proportional to the degree of their sequence similarity; but two independently evolving proteins should not. This principle provides the impetus to define protein phylogenetic relationships and interrelate families when possible. In this mini-review, we summarize the computational approaches and criteria we use to establish common evolutionary origin. We apply these tools to define distant superfamily relationships between several previously recognized transport protein families. In some cases, available structural and functional data are evaluated in order to substantiate our claim that molecular phylogeny provides a reliable guide to protein structure and function.

7.
Single-cell Hi-C (scHi-C) sequencing technologies allow us to investigate three-dimensional chromatin organization at the single-cell level. However, we still need computational tools to deal with the sparsity of the contact maps from single cells and embed single cells in a lower-dimensional Euclidean space. This embedding helps us understand relationships between the cells in different dimensions, such as cell-cycle dynamics and cell differentiation. We present an open-source computational toolbox, scHiCTools, for analyzing single-cell Hi-C data comprehensively and efficiently. The toolbox provides two methods for screening single cells, three common methods for smoothing scHi-C data, three efficient methods for calculating the pairwise similarity of cells, three methods for embedding single cells, three methods for clustering cells, and a built-in function to visualize the cell embeddings in a two-dimensional or three-dimensional plot. scHiCTools, written in Python 3, is compatible with different platforms, including Linux, macOS, and Windows.
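
The abstract for entry 7 does not name the three similarity measures that scHiCTools implements, so the sketch below does not use the scHiCTools API. It assumes one simple choice, cosine similarity between sparse contact maps stored as dictionaries of non-zero counts, to show how a pairwise cell-by-cell similarity matrix might be built before embedding or clustering. The function names are hypothetical.

```python
import math


def cosine_similarity(map_a, map_b):
    """Cosine similarity between two sparse contact maps.

    Each map is a dict {(bin_i, bin_j): contact_count} holding only the
    non-zero entries, which keeps very sparse single-cell maps cheap to store."""
    dot = sum(count * map_b.get(key, 0) for key, count in map_a.items())
    norm_a = math.sqrt(sum(c * c for c in map_a.values()))
    norm_b = math.sqrt(sum(c * c for c in map_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty map shares no signal with anything
    return dot / (norm_a * norm_b)


def similarity_matrix(maps):
    """Symmetric matrix of pairwise cosine similarities between cells."""
    n = len(maps)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim[i][j] = sim[j][i] = cosine_similarity(maps[i], maps[j])
    return sim


cells = [
    {(0, 1): 2, (1, 2): 1},
    {(0, 1): 1, (1, 2): 1, (2, 3): 4},
    {(3, 4): 5},
]
for row in similarity_matrix(cells):
    print([round(x, 3) for x in row])
```

A matrix like this could then feed a distance-based embedding or clustering step; the smoothing and screening steps the toolbox also provides are not reproduced here.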

8.
9.
The Cerithioidea is a very diverse group of gastropods with ca. 14 extant families and more than 200 genera occupying, and often dominating, marine, estuarine, and freshwater habitats. While the composition of Cerithioidea is now better understood due to recent anatomical and ultrastructural studies, the phylogenetic relationships among families remain chaotic. Morphology-based studies have provided conflicting views of relationships among families. We generated a phylogeny of cerithioideans based on mitochondrial large subunit rRNA and flanking tRNA gene sequences (total aligned data set 1873 bp). Nucleotide evidence and the presence of a unique pair of tRNA genes (i.e., threonine + glycine) between valine-mtLSU and the mtSSU rRNA gene support the conclusion, based on ultrastructural data, that Vermetidae and Campanilidae are not Cerithioidea, certain anatomical similarities being due to convergent evolution. The molecular phylogeny shows support for the monophyly of the marine families Cerithiidae [corrected], Turritellidae, Batillariidae, Potamididae, and Scaliolidae as currently recognized. The phylogenetic data reveal that freshwater taxa evolved on three separate occasions; however, all three recognized freshwater families (Pleuroceridae, Melanopsidae, and Thiaridae) are polyphyletic. Mitochondrial rDNA sequences provide valuable data for testing the monophyly of cerithioidean [corrected] families and relationships within families, but fail to provide strong evidence for resolving relationships among families. It appears that the deepest phylogenetic limit for resolving caenogastropod relationships lies at less than about 245-241 mya, based on estimates of divergence derived from the fossil record.

10.
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods, i.e., measures of similarity between query and target sequences, provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.

11.
Small heat shock proteins (sHSPs), as one subclass of molecular chaperones, are important for cells to protect proteins under stress conditions. Unlike the large HSPs (represented by Hsp60 and Hsp70), sHSPs are highly divergent in both primary sequences and oligomeric status, and their evolutionary relationships remain unresolved. Here the phylogenetic analysis of 51 representative sHSPs (covering the six subfamilies: bacterial class A, bacterial class B, archaea, fungi, plant, and animal) reveals a close relationship between bacterial class A and animal sHSPs, which form an outgroup. Accumulating data indicate that the oligomers from bacterial class A and animal sHSPs appear to exhibit polydispersity, while those from the rest exhibit monodispersity. Together, the close evolutionary relationship and the similarity in oligomeric polydispersity between bacterial class A and animal sHSPs not only suggest a potential evolutionary origin of the latter from the former, but also imply that their oligomeric polydispersity is somehow a property determined by their primary sequences. [Reviewing Editor: Dr. Martin Kreitman]

12.
Statistical and learning techniques are becoming increasingly popular for different tasks in bioinformatics. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences such as protein sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space and then apply these techniques to the embedded points. In this work we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. This embedding allows us to directly apply learning techniques to protein sequences. We apply the homology kernel in several ways. We demonstrate how the homology kernel can be used for protein family classification and outperforms state-of-the-art methods for remote homology detection. We show that the homology kernel can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
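
Entry 12 describes embedding sequences into a Euclidean space so that standard learning methods apply. The homology kernel itself is more involved (it also draws on local alignment, homologous sequences, and predicted secondary structure), so as a much simpler stand-in the sketch below shows the well-known k-mer spectrum embedding: each sequence becomes a sparse vector of k-mer counts, and the kernel value is the inner product of two such vectors. This is not the homology kernel.

```python
from collections import Counter


def spectrum_embedding(seq, k=3):
    """Map a sequence to a sparse vector of k-mer counts (its 'spectrum')."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of two k-mer spectra; a valid kernel on sequences."""
    a, b = spectrum_embedding(seq_a, k), spectrum_embedding(seq_b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(count * b[kmer] for kmer, count in a.items())


print(spectrum_kernel("MKVMKV", "MKV"))  # -> 2 (MKV occurs twice in the first sequence)
```

Because the embedding is an explicit count vector, any method that needs points in a Euclidean space (or only their inner products, as in support vector machines) can be applied to it directly.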

13.
In a recent report, mouse B1 genomic repeats were divided into six families representing different waves of fixation of B1 variants, consistent with the retroposition model of human Alu elements. These data are used to examine the distribution of nucleotide substitutions in individual genomic repeats with respect to family consensus sequences and to compare the minimal energy structures of the corresponding B1 RNAs. By an enzymatic approach the predicted structure of B1 RNAs is experimentally confirmed using as a model sequence an RNA of a young B1 family member transcribed in vitro by T7 RNA polymerase. B1 RNA preserves folding domains of the Alu fragment of 7SL RNA, its progenitor molecule. Our results reveal similarities among 7SL-like retroposons, human Alu, and rodent B1 repeats, and relate the evolutionary conservation of B1 family consensus sequences to selection at the RNA level.

14.
The Molecular Evolution of the Small Heat-Shock Proteins in Plants
E. R. Waters. Genetics, 1995, 141(2): 785-795
The small heat-shock proteins have undergone a tremendous diversification in plants; whereas only a single small heat-shock protein is found in fungi and many animals, over 20 different small heat-shock proteins are found in higher plants. The small heat-shock proteins in plants have diversified in both sequence and cellular localization and are encoded by at least five gene families. In this study, 44 small heat-shock protein DNA and amino acid sequences were examined, using both phylogenetic analysis and analysis of nucleotide substitution patterns to elucidate the evolutionary history of the small heat-shock proteins. The phylogenetic relationships of the small heat-shock proteins, estimated using parsimony and distance methods, reveal that gene duplication, sequence divergence and gene conversion have all played a role in the evolution of the small heat-shock proteins. Analysis of nonsynonymous substitutions and conservative and radical replacement substitutions (in relation to hydrophobicity) indicates that the small heat-shock protein gene families are evolving at different rates. This suggests that the small heat-shock proteins may have diversified in function as well as in sequence and cellular localization.

15.
Phylogenetic relationships among salamander families illustrate analytical challenges inherent to inferring phylogenies in which terminal branches are temporally very long relative to internal branches. We present new mitochondrial DNA sequences, approximately 2,100 base pairs from the genes encoding ND1, ND2, COI, and the intervening tRNA genes for 34 species representing all 10 salamander families, to examine these relationships. Parsimony analysis of these mtDNA sequences supports monophyly of all families except Proteidae, but yields a tree largely unresolved with respect to interfamilial relationships and the phylogenetic positions of the proteid genera Necturus and Proteus. In contrast, Bayesian and maximum-likelihood analyses of the mtDNA data produce a topology concordant with phylogenetic results from nuclear-encoded rRNA sequences, and they statistically reject monophyly of the internally fertilizing salamanders, suborder Salamandroidea. Phylogenetic simulations based on our mitochondrial DNA sequences reveal that Bayesian analyses outperform parsimony in reconstructing short branches located deep in the phylogenetic history of a taxon. However, phylogenetic conflicts between our results and a recent analysis of nuclear RAG-1 gene sequences suggest that statistical rejection of a monophyletic Salamandroidea by Bayesian analyses of our mitochondrial genomic data is probably erroneous. Bayesian and likelihood-based analyses may overestimate phylogenetic precision when estimating short branches located deep in a phylogeny from data showing substitutional saturation; an analysis of nucleotide substitutions indicates that these methods may be overly sensitive to a relatively small number of sites that show substitutions judged uncommon by the favored evolutionary model.

16.
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families, with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for fewer than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined, as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.

17.
Now that large-scale genome-sequencing projects are sampling many organismal lineages, it is becoming possible to compare large data sets of not only DNA and protein sequences, but also genome-level features, such as gene arrangements and the positions of mobile genetic elements. Although it is unlikely that comparisons of such features will address a large number of evolutionary branch points across the broad tree of life owing to the infeasibility of such sampling, they have great potential for resolving many crucial, contested relationships for which no other data seem promising. Here, I discuss the advancements, advantages, methods, and problems of the use of genome-level characters for reconstructing evolutionary relationships.

18.
BACKGROUND: Proteins from thermophilic organisms usually show high intrinsic thermal stability but have structures that are very similar to their mesophilic homologues. From previous studies it is difficult to draw general conclusions about the structural features underlying the increased thermal stability of thermophilic proteins. RESULTS: In order to reveal the general evolutionary strategy for changing the heat stability of proteins, a non-redundant data set was compiled comprising all high-quality structures of thermophilic proteins and their mesophilic homologues from the Protein Data Bank. The selection (quality) criteria were met by 64 mesophilic and 29 thermophilic protein subunits, representing 25 protein families. From the atomic coordinates, 13 structural parameters were calculated, compared and evaluated using statistical methods. This study is distinguished from earlier ones by the strict quality control of the structures used and the size of the data set. CONCLUSIONS: Different protein families adapt to higher temperatures by different sets of structural devices. Regarding the structural parameters, the only generally observed rule is an increase in the number of ion pairs with increasing growth temperature. Other parameters show just a trend, whereas the number of hydrogen bonds and the polarity of buried surfaces exhibit no clear-cut tendency to change with growth temperature. Proteins from extreme thermophiles are stabilized in different ways from moderately thermophilic ones. The preferences of these two groups differ with regard to the number of ion pairs, the number of cavities, the polarity of exposed surface and the secondary structural composition.
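
Entry 18 reports that the number of ion pairs is the one structural parameter that consistently increases with growth temperature. A minimal sketch of counting ion pairs from atomic coordinates follows. The set of charged side-chain atoms and the 4.0 Å distance cutoff are assumptions (a common convention, not necessarily the criterion used in the study), and parsing PDB files into the `atoms` list is left out; `count_ion_pairs` is a hypothetical name.

```python
import math

# Assumed charged side-chain atoms (residue name, atom name).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def count_ion_pairs(atoms, cutoff=4.0):
    """Count (acidic residue, basic residue) pairs having any charged atoms
    within `cutoff` angstroms of each other.

    `atoms` is a list of (residue_id, residue_name, atom_name, (x, y, z))."""
    acidic = [(rid, xyz) for rid, res, name, xyz in atoms if (res, name) in ACIDIC]
    basic = [(rid, xyz) for rid, res, name, xyz in atoms if (res, name) in BASIC]
    pairs = set()  # a residue pair is counted once even if several atom pairs qualify
    for rid_a, xyz_a in acidic:
        for rid_b, xyz_b in basic:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.add((rid_a, rid_b))
    return len(pairs)


atoms = [
    (1, "ASP", "OD1", (0.0, 0.0, 0.0)),
    (1, "ASP", "OD2", (1.0, 0.0, 0.0)),
    (2, "LYS", "NZ", (3.5, 0.0, 0.0)),
    (3, "ARG", "NH1", (10.0, 0.0, 0.0)),
    (4, "GLY", "CA", (0.5, 0.0, 0.0)),
]
print(count_ion_pairs(atoms))  # -> 1 (Asp1-Lys2; Arg3 is too far away)
```

Comparing such counts between a thermophilic protein and its mesophilic homologue is the kind of per-family comparison the abstract describes; the study's other 12 parameters would each need their own definition.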

19.
Paul Mach, Patrice Koehl. Proteins, 2013, 81(9): 1556-1570
It is well known that protein fold recognition can be greatly improved if models for the underlying evolutionary history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that only have a small number of representatives in current sequence databases, we follow an alternative approach in which the benefits of including evolutionary information can be recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self-consistent mean field approach, and score the fitness of the corresponding models using a semi-empirical physical potential. Sequences designed for one template are translated into a hidden Markov model-based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized for an E-value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences were recognized; the structural classifications of the protein families corresponding to these sequences, however, were correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.

20.
Protein interactions are fundamental to the functioning of cells, and high throughput experimental and computational strategies are sought to map interactions. Predicting interaction specificity, such as matching members of a ligand family to specific members of a receptor family, is largely an unsolved problem. Here we show that by using evolutionary relationships within such families, it is possible to predict their physical interaction specificities. We introduce the computational method of matrix alignment for finding the optimal alignment between protein family similarity matrices. A second method, 3D embedding, allows visualization of interacting partners via spatial representation of the protein families. These methods essentially align phylogenetic trees of interacting protein families to define specific interaction partners. Prediction accuracy depends strongly on phylogenetic tree complexity, as measured with information theoretic methods. These results, along with simulations of protein evolution, suggest a model for the evolution of interacting protein families in which interaction partners are duplicated in coupled processes. Using these methods, it is possible to successfully find protein interaction specificities, as demonstrated for >18 protein families.
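
The matrix-alignment idea in entry 20 can be sketched for very small families by brute force: try every one-to-one pairing of ligand-family members with receptor-family members and keep the pairing under which the two similarity matrices agree best. The squared-difference score and the exhaustive search are assumptions made for illustration; the abstract does not describe the authors' scoring function or optimizer, and exhaustive search grows factorially with family size. The function names are hypothetical.

```python
from itertools import permutations


def alignment_score(sim_a, sim_b, perm):
    """Sum of squared differences between sim_a and sim_b when member i of
    family A is paired with member perm[i] of family B (lower is better)."""
    n = len(sim_a)
    return sum((sim_a[i][j] - sim_b[perm[i]][perm[j]]) ** 2
               for i in range(n) for j in range(n))


def best_matching(sim_a, sim_b):
    """Exhaustively search all pairings of A's members with B's members."""
    n = len(sim_a)
    return min(permutations(range(n)), key=lambda p: alignment_score(sim_a, sim_b, p))


# Toy similarity matrices: the receptors are the ligands' pattern, reordered.
ligands = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
receptors = [[1.0, 0.2, 0.9],
             [0.2, 1.0, 0.1],
             [0.9, 0.1, 1.0]]
print(best_matching(ligands, receptors))  # -> (2, 0, 1): ligand 0 pairs with receptor 2, etc.
```

The example recovers the pairing because co-evolving partners inherit the same pattern of within-family similarity; a realistic implementation would need a heuristic or combinatorial optimizer instead of enumerating all permutations.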
