Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
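Microbial phylogenetic diversity and its significance for conservation biology   Total citations: 13 (self-citations: 2, citations by others: 11)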
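In recent years, molecular phylogenetic analysis methods, especially rRNA gene homology analysis, have played an increasingly important role in the study of microbial diversity. They overcome the limitations of traditional microbial isolation and culture methods and have greatly advanced our understanding of microbial diversity. Microbial phylogenetic diversity, derived from homology analysis of genetic information, offers a new perspective on diversity conservation. It can serve not only as a means of evaluating microbial diversity but also as a basis for setting priorities in diversity conservation. It also defines a goal for biodiversity conservation, namely to preserve as much as possible of the information contained in phylogenetic relationships. This paper briefly reviews the characteristics of microbial phylogenetic diversity and its significance for conservation biology.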

2.
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, allowing exciting insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences are not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, as well as providing insight into protein evolution.
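
The abstract above compares folds by the sequence entropy of their designed sequences but does not say how that entropy is computed. As a minimal sketch, assuming per-position Shannon entropy over equal-length sequences (the function names and the toy sequences are hypothetical, not taken from the study), the quantity could be computed as follows:

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one position."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sequence_entropy_profile(sequences):
    """Per-position entropy for a set of equal-length sequences."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    # zip(*sequences) walks the sequences column by column.
    return [column_entropy(column) for column in zip(*sequences)]


def mean_sequence_entropy(sequences):
    """Average the per-position entropies into one diversity score."""
    profile = sequence_entropy_profile(sequences)
    return sum(profile) / len(profile)


# Toy input (an assumption for illustration only).
designed = ["AAAA", "AAAC", "AAGC", "ATGC"]
profile = sequence_entropy_profile(designed)
```

A fully conserved position scores 0 bits, and a position split evenly between two residues scores 1 bit, so a higher mean indicates a more diverse set of sequences.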

3.
La D, Kihara D. Proteins 2012, 80(1): 126-141
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution patterns compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.

4.
The rugged protein sequence-function landscape complicates efforts, both in nature and in the laboratory, to evolve protein function. Protein library diversification must strike a balance between sufficient variegation to thoroughly sample alternative functionality and the probability of mutant destabilization below an expressible threshold. In this work, we explore the sequence-function landscape in the context of screening for molecular recognition from an Ig scaffold library. The fibronectin type III domain is used to explore the impact of two sequence diversification strategies: (a) partial wild-type conservation at structurally important positions within the paratope region and (b) tailored amino acid composition mimicking antibody binding-site composition at putative paratope positions. Structurally important positions within the paratope region were identified through stability, structural, and phylogenetic analyses and partially or fully conserved in sequence. To achieve tailored antibody-like diversity, we designed a set of skewed nucleotide mixtures yielding codons approximately matching the distribution observed in antibody complementarity-determining regions without incurring the expense of triphosphoramidite-based construction. These design elements were explored via comparison of three library designs: a random library, a library with wild-type bias in the DE loop only and tyrosine-serine diversity elsewhere, and a library with wild-type bias at 11 positions and the antibody-inspired amino acid distribution. Using pooled libraries for direct competition in a single tube, selection and maturation of binders to seven targets yielded 19 of 21 clones that originated from the structurally biased, tailored-diversity library design. Sequence analysis of the selected clones supports the importance of both tailored compositional diversity and structural bias. In addition, selection of both well and poorly expressed clones from two libraries further elucidated the impact of structural bias.

5.
6.
Bacterial porin proteins allow for the selective movement of hydrophilic solutes through the outer membrane of Gram-negative bacteria. The purpose of this study was to clarify the evolutionary relationships among the Type 1 general bacterial porins (GBPs), a porin protein subfamily that includes outer membrane proteins ompC and ompF among others. Specifically, we investigated the potential utility of phylogenetic analysis for refining poorly annotated or mis-annotated protein sequences in databases, and for characterizing new functionally distinct groups of porin proteins. Preliminary phylogenetic analysis of sequences obtained from GenBank indicated that many of these sequences were incompletely or even incorrectly annotated. Using a well-curated set of porins classified via comparative genomics, we applied recently developed Bayesian phylogenetic methods for protein sequence analysis to determine the relationships among the Type 1 GBPs. Our analysis found that the major GBP classes (ompC, phoE, nmpC and ompN) formed strongly supported monophyletic groups, with the exception of ompF, which split into two distinct clades. The relationships of the GBP groups to one another had less statistical support, except for the relationships of ompC and ompN sequences, which were strongly supported as sister groups. A phylogenetic analysis comparing the relationships of the GenBank GBP sequences to the correctly annotated set of GBPs identified a large number of previously unclassified and mis-annotated GBPs. Given these promising results, we developed a tree-parsing algorithm for automated phylogenetic annotation and tested it with GenBank sequences. Our algorithm was able to automatically classify 30 unidentified and 15 mis-annotated GBPs out of 78 sequences. Altogether, our results support the potential for phylogenomics to increase the accuracy of sequence annotations.

7.
A comprehensive, structural and functional, in silico analysis of the medium-chain dehydrogenase/reductase (MDR) superfamily, including 583 proteins, was carried out by use of extensive database mining and the blastp program in an iterative manner to identify all known members of the superfamily. Based on phylogenetic, sequence, and functional similarities, the protein members of the MDR superfamily were classified into three different taxonomic categories: (a) subfamilies, consisting of a closed group containing a set of ideally orthologous proteins that perform the same function; (b) families, each comprising a cluster of monophyletic subfamilies that possess significant sequence identity among them and may or may not share common substrates or mechanisms of reaction; and (c) macrofamilies, each comprising a cluster of monophyletic protein families with protein members from the three domains of life, which includes at least one subfamily member that displays activity related to a very ancient metabolic pathway. In this context, a superfamily is a group of homologous protein families (and/or macrofamilies) with monophyletic origin that share at least a barely detectable sequence similarity but show the same 3D fold. The MDR superfamily encloses three macrofamilies, with eight families and 49 subfamilies. These subfamilies exhibit great functional diversity including noncatalytic members with different subcellular, phylogenetic, and species distributions. This results from constant enzymogenesis and proteinogenesis within each kingdom, and highlights the huge plasticity that MDR superfamily members possess. Thus, through evolution a great number of taxa-specific new functions were acquired by MDRs. The generation of new functions fulfilled by proteins can be considered as the essence of protein evolution. The mechanisms of protein evolution inside MDR are not constrained to conserve substrate specificity and/or chemistry of catalysis. In consequence, MDR functional diversity is more complex than sequence diversity. MDR is a very ancient protein superfamily that existed in the last universal common ancestor. It had at least two (and probably three) different ancestral activities related to formaldehyde metabolism and alcoholic fermentation. Eukaryotic members of this superfamily are more related to bacterial than to archaeal members; horizontal gene transfer among the domains of life appears to be a rare event in modern organisms.

8.
Shih CH, Chang CM, Lin YS, Lo WC, Hwang JK. Proteins 2012, 80(6): 1647-1657
The knowledge of conserved sequences in proteins is valuable in identifying functionally or structurally important residues. Generating the conservation profile of a sequence requires aligning families of homologous sequences and having knowledge of their evolutionary relationships. Here, we report that the conservation profile at the residue level can be quantitatively derived from a single protein structure with only backbone information. We found that the reciprocal packing density profiles of protein structures closely resemble their sequence conservation profiles. For a set of 554 nonhomologous enzymes, 74% (408/554) of the proteins have a correlation coefficient > 0.5 between these two profiles. Our results indicate that the three-dimensional structure, instead of being a mere scaffold for positioning amino acid residues, exerts such strong evolutionary constraints on the residues of the protein that its profile of sequence conservation essentially reflects that of its structural characteristics.

9.
We have developed a phylogeny-based design method that has been used to produce mutated proteins with enhanced thermal stabilities. We previously validated the predictive worth of the method by producing and characterizing mutants in which one original residue or a small number of the original residues had been replaced with the one or the ones found in the phylogenetically predicted “ancestral” sequence. For the current study, this method was used to design a sequence for the deepest nodal position of a phylogenetic tree composed of 16 gyrase B-subunit sequences, which was then synthesized and characterized. The sequence was inferred from the sequences of 16 extant DNA gyrases and 3 extant type VI DNA topoisomerases. Genes encoding the inferred sequence and its N-terminal ATPase domain were PCR constructed and expressed in Escherichia coli. The full-length designed protein is slightly less thermally stable than is subunit B from the extant thermophilic Thermus thermophilus DNA gyrase, whereas the thermal stability of the designed ATPase domain is more similar to that of the T. thermophilus ATPase domain. Moreover, the designed ATPase domain has significant catalytic activity. Therefore, even a small set of homologous amino acid sequences contains sufficient information to design a thermally stable and functional protein. Because the isolated designed ATPase domain is more thermally stable and catalytically active than is the sequence containing the most frequently occurring amino acids among the 16 gyrases, the phylogenetic approach was superior (in this case, at least) to the consensus approach when the same data set was used to predict the two sequences.

10.
Comparison of ARM and HEAT protein repeats   Total citations: 18 (self-citations: 0, citations by others: 18)
ARM and HEAT motifs are tandemly repeated sequences of approximately 50 amino acid residues that occur in a wide variety of eukaryotic proteins. An exhaustive search of sequence databases detected new family members and revealed that at least 1 in 500 eukaryotic protein sequences contain such repeats. It also rendered the similarity between ARM and HEAT repeats, believed to be evolutionarily related, readily apparent. All the proteins identified in the database searches could be clustered by sequence similarity into four groups: canonical ARM-repeat proteins and three groups of the more divergent HEAT-repeat proteins. This allowed us to build improved sequence profiles for the automatic detection of repeat motifs. Inspection of these profiles indicated that the individual repeat motifs of all four classes share a common set of seven highly conserved hydrophobic residues, which in proteins of known three-dimensional structure are buried within or between repeats. However, the motifs differ at several specific residue positions, suggesting important structural or functional differences among the classes. Our results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly. We discuss evolutionary scenarios that could account for the great diversity of repeats observed.

11.
The need to protect and preserve biodiversity is a pressing issue and requires that conservation projects be based on solid foundations. Knowledge of species evolutionary history can serve as a tool to help guide conservation projects on the basis of evolutionary heritage. We used communities of Cladocera (Crustacea, Branchiopoda) in urban waterbodies to identify which sites should be prioritized for phylogenetic diversity conservation. Phylogenetic trees were inferred using DNA sequences from two mitochondrial genes. Furthermore, we also evaluated the consequences of phylogenetic uncertainty for identifying sites for conservation priority. Using results from Bayesian analyses, we considered the effect of uncertainty in the phylogenetic tree on phylogenetic diversity (PD) estimation. When phylogenetic uncertainty was taken into account, the conservation value of individual sites became uncertain and several potential comparisons between sites could not be supported. Consequently, prioritization of one site over the other could not be defended in biodiversity conservation projects. Our study highlights the fact that accounting for phylogenetic uncertainty can alter the relative conservation priority of sites, as assessed by their phylogenetic diversity. Therefore, variability in the phylogenetic estimates should be consistently considered and integrated into estimates of phylogenetic diversity and conservation decisions to avoid making suboptimal choices.
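
To make the effect of tree uncertainty on PD concrete, here is a minimal sketch. It assumes the rooted form of Faith's PD (the sum of branch lengths on the paths from a site's taxa to the root, each branch counted once) and represents each tree as a pair of hypothetical dictionaries; the abstract does not state which PD variant or data structures the authors used, and the two toy trees stand in for a Bayesian posterior sample.

```python
def faith_pd(parent, branch_length, taxa):
    """Rooted PD: total length of the branches linking the taxa to the root.

    parent maps each non-root node to its parent node; branch_length maps
    each non-root node to the length of the branch above it.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Climb toward the root, stopping once we reach a counted branch.
        while node in parent and node not in visited:
            visited.add(node)
            total += branch_length[node]
            node = parent[node]
    return total


def pd_range(trees, taxa):
    """Smallest and largest PD of one site across a sample of trees."""
    values = [faith_pd(parent, lengths, taxa) for parent, lengths in trees]
    return min(values), max(values)


# Two toy trees with the same topology but different branch lengths
# (illustrative assumptions, not data from the study).
topology = {"A": "X", "B": "X", "X": "R", "C": "R"}
trees = [
    (topology, {"A": 2.0, "B": 3.0, "X": 1.0, "C": 4.0}),
    (topology, {"A": 1.0, "B": 3.0, "X": 5.0, "C": 1.0}),
]
site_one = pd_range(trees, {"A", "B"})
site_two = pd_range(trees, {"A", "C"})
```

In this toy case the first site has the lower PD on one tree and the higher PD on the other, so the ranking of the two sites depends on which tree is used, which is the kind of unsupported comparison the abstract describes.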

12.
Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However, the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. The structure-based phylogenetic approach has also provided pointers for assigning putative functions to the domains of unknown function in the lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.

13.
Summary. Available sequences that correspond to the E. coli ribosomal proteins L11, L1, L10, and L12 from eubacteria, archaebacteria, and eukaryotes have been aligned. The alignments were analyzed qualitatively for shared structural features and for conservation of deletions or insertions. The alignments were further subjected to quantitative phylogenetic analysis, and the amino acid identity between selected pairs of sequences was calculated. In general, eubacteria, archaebacteria, and eukaryotes each form coherent and well-resolved nonoverlapping phylogenetic domains. The degree of diversity of the four proteins between the three groups is not uniform. For L11, the eubacterial and archaebacterial proteins are very similar whereas the eukaryotic L11 is clearly less similar. In contrast, in the case of the L12 proteins and to a lesser extent the L10 proteins, the archaebacterial and eukaryotic proteins are similar whereas the eubacterial proteins are different. The eukaryotic L1 equivalent protein has yet to be identified. If the root of the universal tree is near or within the eubacterial domain, our ribosomal protein-based phylogenies indicate that archaebacteria are monophyletic. The eukaryotic lineage appears to originate either near or within the archaebacterial domain. Correspondence to: P. Dennis

14.
Many prokaryotes have multiple ribosomal RNA operons. Generally, sequence differences between small subunit (SSU) rRNA genes are minor (<1%) and cause little concern for phylogenetic inference or environmental diversity studies. For Halobacteriales, an order of extremely halophilic, aerobic Archaea, within-genome SSU rRNA sequence divergence can exceed 5%, rendering phylogenetic assignment problematic. The RNA polymerase B' subunit gene (rpoB') is a single-copy conserved gene that may be an appropriate alternative phylogenetic marker for Halobacteriales. We sequenced a fragment of the rpoB' gene from 21 species, encompassing 15 genera of Halobacteriales. To examine the utility of rpoB' as a phylogenetic marker in Halobacteriales, we investigated three properties of rpoB' trees: the variation in resolution between trees inferred from the rpoB' DNA and RpoB' protein alignment, the degree of mutational saturation between taxa, and congruence with the SSU rRNA tree. The rpoB' DNA and protein trees were for the most part congruent and consistently recovered two well-supported monophyletic groups, the clade I and clade II haloarchaea, within a collection of less well resolved Halobacteriales lineages. A comparison of observed versus inferred numbers of substitutions revealed mutational saturation in the rpoB' DNA data set, particularly between more distant species. Thus, the RpoB' protein sequence may be more reliable than the rpoB' DNA sequence for inferring Halobacteriales phylogeny. AU tests of tree selection indicated the trees inferred from rpoB' DNA and protein alignments were significantly incongruent with the SSU rRNA tree. We discuss possible explanations for this incongruence, including tree reconstruction artifact, differential paralog sampling, and lateral gene transfer. This is the first study of Halobacteriales evolution based on a marker other than the SSU rRNA gene. In addition, we present a valuable phylogenetic framework encompassing a broad diversity of Halobacteriales, in which novel sequences can be inserted for evolutionary, ecological, or taxonomic investigations.

15.
Although phylogenetic diversity has been suggested to be relevant from a conservation point of view, its role is still limited in applied nature conservation. Recently, the practice of investing conservation resources based on threatened species was identified as a reason for the slow integration of phylogenetic diversity in nature conservation planning. One of the main arguments is based on the observation that threatened species are not evenly distributed over the phylogenetic tree. However, this argument seems to dismiss the fact that conservation action is a spatially explicit process, and even if threatened species are not evenly distributed over the phylogenetic tree, the occurrence of threatened species could still indicate areas with above-average phylogenetic diversity and consequently could protect phylogenetic diversity. Here we aim to study the selection of important bird areas in Central Asia, which were nominated largely based on the presence of threatened bird species. We show that although threatened species occurring in Central Asia do not capture phylogenetically more distinct species than expected by chance, the current spatially explicit conservation approach of selecting important bird areas covers above-average taxonomic and phylogenetic diversity of breeding and wintering birds. We conclude that the spatially explicit processes of conservation actions need to be considered in the current discussion of whether new prioritization methods are needed to complement conservation action based on threatened species.

16.
Computational protein design can be used to select sequences that are compatible with a fixed-backbone template. This strategy has been used in numerous instances to engineer novel proteins. However, the fixed-backbone assumption severely restricts the sequence space that is accessible via design. For challenging problems, such as the design of functional proteins, this may not be acceptable. Here, we present a method for introducing backbone flexibility into protein design calculations and apply it to the design of diverse helical BH3 ligands that bind to the anti-apoptotic protein Bcl-xL, a member of the Bcl-2 protein family. We demonstrate how normal mode analysis can be used to sample different BH3 backbones, and show that this leads to a larger and more diverse set of low-energy solutions than can be achieved using a native high-resolution Bcl-xL complex crystal structure as a template. We tested several of the designed solutions experimentally and found that this approach worked well when normal mode calculations were used to deform a native BH3 helix structure, but less well when they were used to deform an idealized helix. A subsequent round of design and testing identified a likely source of the problem as inadequate sampling of the helix pitch. In all, we tested 17 designed BH3 peptide sequences, including several point mutants. Of these, eight bound well to Bcl-xL and four others showed weak but detectable binding. The successful designs showed a diversity of sequences that would have been difficult or impossible to achieve using only a fixed backbone. Thus, introducing backbone flexibility via normal mode analysis effectively broadened the set of sequences identified by computational design, and provided insight into positions important for binding Bcl-xL.

17.
The human family of ELMO domain-containing proteins (ELMODs) consists of six members and is defined by the presence of the ELMO domain. Within this family are two subclassifications of proteins, based on primary sequence conservation, protein size, and domain architecture, deemed ELMOD and ELMO. In this study, we used homology searching and phylogenetics to identify ELMOD family homologs in genomes from across eukaryotic diversity. This demonstrated not only that the protein family is ancient but also that ELMOs are potentially restricted to the supergroup Opisthokonta (Metazoa and Fungi), whereas proteins with the ELMOD organization are found in diverse eukaryotes and thus were likely the form present in the last eukaryotic common ancestor. The segregation of the ELMO clade from the larger ELMOD group is consistent with their contrasting functions as unconventional Rac1 guanine nucleotide exchange factors and the Arf family GTPase-activating proteins, respectively. We used unbiased, phylogenetic sorting and sequence alignments to identify the most highly conserved residues within the ELMO domain to identify a putative GAP domain within the ELMODs. Three independent but complementary assays were used to provide an initial characterization of this domain. We identified a highly conserved arginine residue critical for both the biochemical and cellular GAP activity of ELMODs. We also provide initial evidence of the function of human ELMOD1 as an Arf family GAP at the Golgi. These findings provide the basis for the future study of the ELMOD family of proteins and a new avenue for the study of Arf family GTPases.

18.
Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence, by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequences the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. The typical size of the modules (peak-to-peak distance) is 25-30 amino acid residues.
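
The profiling idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the oligopeptide length `k`, the raw-count scoring, the function names, and the three-sequence "database" are all hypothetical, since the abstract gives neither the oligopeptide length nor the normalization used.

```python
from collections import Counter


def kmer_counts(proteome, k):
    """Count every overlapping oligopeptide of length k in the database."""
    counts = Counter()
    for sequence in proteome:
        for start in range(len(sequence) - k + 1):
            counts[sequence[start:start + k]] += 1
    return counts


def conservation_profile(query, counts, k):
    """Database frequency of the oligopeptide starting at each query position."""
    return [counts[query[start:start + k]]
            for start in range(len(query) - k + 1)]


# Toy database and query (assumptions for illustration only).
proteome = ["MKTAYIAK", "GGKTAYQQ", "KTAYLL"]
counts = kmer_counts(proteome, k=4)
profile = conservation_profile("AKTAYW", counts, k=4)
```

Peaks in such a profile mark query segments whose oligopeptides recur across the database; measuring the spacing between peaks would be the analogue of the module size reported in the abstract.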

19.
Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.

20.
Dokholyan NV. Proteins 2004, 54(4): 622-628
Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.
