Similar Literature
 20 similar documents found (search time: 15 ms)
1.
Allosteric interactions between residues that are spatially distant and well separated in sequence are important for the function of multimeric as well as single-domain proteins. This observation suggests that, among residues involved in long-range communication, a mutation at one site should affect interactions at a distant site. We present an automated, sequence-based approach that combines a generalization of the familiar sequence entropy with a coupled two-way clustering algorithm to predict the network of interactions that triggers allosteric interactions in proteins. We use the method to identify the subset of dynamically important residues in three families: the small PDZ family, G protein-coupled receptors (GPCRs), and the Lectins, which are cell-adhesion receptors that mediate the tethering and rolling of leukocytes on inflamed endothelium. For the PDZ and GPCR families, our procedure predicts, in agreement with previous studies, a network containing a small number of residues involved in their function. Application to the Lectin family reveals a network of residues interspersed throughout the C-terminal end of the structure that are responsible for binding to ligands. Based on our results and previous studies, we propose that functional robustness requires that only a small subset of distantly connected residues be involved in transmitting allosteric signals in proteins.
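The approach above builds on the familiar per-position sequence entropy. As a point of reference only, the sketch below computes that baseline Shannon entropy for each column of a toy alignment. It does not implement the authors' generalized entropy or the coupled two-way clustering step, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (natural log) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    probs = [count / n for count in Counter(residues).values()]
    return sum(-p * math.log(p) for p in probs)


def alignment_entropies(alignment: list[str]) -> list[float]:
    """Per-position entropies for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDF", "ACEG", "ACEH"]  # hypothetical toy alignment
    print([round(h, 3) for h in alignment_entropies(toy_msa)])
```

Fully conserved columns (positions 0 and 1 of the toy alignment) score 0, and the completely variable last column scores ln 4.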

2.
Liu X  Fan K  Wang W 《Proteins》2004,54(3):491-499
Currently, of the 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homology and similarity, proteins are grouped into families, each of which has a structural prototype, the fold, and some families share the same fold. However, the total number of folds and families in nature, and the distribution of folds over families, remain unknown. Here, we report a study of the distribution of folds over families and of the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found for family counts between 6000 and 30,000. For example, about 2700 folds are obtained for 23,100 families, among them about 33 superfolds that each include more than 100 families; the largest superfold comprises about 800 families. Our results suggest that although most folds contain only a single family, a considerably larger number of folds each include many more families than in the database, and the distribution of folds over families in nature differs markedly from the sampled distribution. This article provides the first estimate of the long tail of the fold distribution. The results fit the data for different versions of the Structural Classification of Proteins (SCOP) excellently, and goodness-of-fit tests strongly support them. In addition, the method of directly "enlarging" the sample to the population may be useful for inferring distributions of species in other fields.

3.
We investigate the performance of combinatorial pattern discovery to detect remote sequence similarities in terms of both biological accuracy and computational efficiency for a pair of distantly related families, as a case study. The two families represent the cupredoxins and multicopper oxidases, both containing blue copper-binding domains. These families present a challenging case due to low sequence similarity, different local structure, and variable sequence conservation at their copper-binding active sites. In this study, we investigate a new approach for automatically identifying weak sequence similarities that is based on combinatorial pattern discovery. We compare its performance with a traditional, HMM-based scheme and obtain estimates for sensitivity and specificity of the two approaches. Our analysis suggests that pattern discovery methods can be substantially more sensitive in detecting remote protein relationships while at the same time guaranteeing high specificity.

4.
In this work we examine how protein structural changes are coupled with sequence variation during the evolution of a family of homologs. A sequence-structure correlation analysis of 81 homologous protein families shows that most of them exhibit a statistically significant linear correlation between measures of sequence and structural similarity. We observed, however, cases where structural variability cannot be explained mainly by sequence variation, such as protein families with several disulfide bonds. To determine whether structures from different families and/or folds evolve in the same manner, we compared the degree of structural change per unit of sequence change ("the evolutionary plasticity of structure") among the families with a significant linear correlation. Using rigorous statistical procedures, we find that, with a few exceptions, evolutionary plasticity does not differ significantly between protein families. A similar sequence-structure analysis of protein loop regions shows that the evolutionary plasticity of loop regions is greater than that of the protein core.
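The "evolutionary plasticity of structure" is described above as structural change per unit of sequence change. One plausible reading, assumed here rather than taken from the paper, is the least-squares slope of a structural distance (for example RMSD) against a sequence distance (for example 1 − fractional identity) over all homolog pairs in a family, with Pearson's r measuring how linear that relationship is. The function name and the distance measures are hypothetical.

```python
import math


def plasticity_and_correlation(seq_dist: list[float], struct_dist: list[float]) -> tuple[float, float]:
    """Least-squares slope (structural change per unit sequence change) and Pearson r."""
    n = len(seq_dist)
    mean_x = sum(seq_dist) / n
    mean_y = sum(struct_dist) / n
    sxx = sum((x - mean_x) ** 2 for x in seq_dist)
    syy = sum((y - mean_y) ** 2 for y in struct_dist)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(seq_dist, struct_dist))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r


if __name__ == "__main__":
    # Hypothetical pairwise distances: 1 - fractional identity, and RMSD in angstroms.
    seq = [0.1, 0.2, 0.3, 0.4]
    rmsd = [0.5, 1.0, 1.5, 2.0]
    print(plasticity_and_correlation(seq, rmsd))
```

Comparing plasticity between families would then amount to testing whether these per-family slopes differ, which the sketch leaves out.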

5.
Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G‐protein‐coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on the χ² test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then analyzed the network structure of the top-scoring pairs. Two of the seven methods, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a network structure in which a central residue is involved in several high‐scoring pairs. This network structure was observed for all three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and for subfamily specificity. These central residues can be viewed as evolutionary hubs, consistent with an epistasis‐based mechanism of functional divergence within a protein family. Proteins 2014; 82:2141–2156. © 2014 Wiley Periodicals, Inc.
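Two of the covariation scores named above can be sketched for a pair of alignment columns. The code assumes one common formulation of each: mutual information over jointly observed residue pairs, and OMES as the sum of squared differences between observed and expected pair counts (expected counts from the column marginals), divided by the number of gap-free sequences. Published implementations differ in gap handling and normalization, so this is illustrative rather than the scoring used in the study.

```python
import math
from collections import Counter


def _gap_free_pairs(col_i: str, col_j: str, gap: str = "-") -> list[tuple[str, str]]:
    """Residue pairs from sequences that have no gap at either position."""
    return [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (natural log) between two alignment columns."""
    pairs = _gap_free_pairs(col_i, col_j)
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


def omes(col_i: str, col_j: str) -> float:
    """Observed-minus-expected-squared score, normalized by the gap-free count."""
    pairs = _gap_free_pairs(col_i, col_j)
    n = len(pairs)
    if n == 0:
        return 0.0
    observed = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    total = 0.0
    for a in left:
        for b in right:
            expected = left[a] * right[b] / n
            total += (observed[(a, b)] - expected) ** 2  # Counter returns 0 for unseen pairs
    return total / n
```

Perfectly covarying columns such as `AAKK`/`DDEE` score ln 2 for mutual information and 1.0 for OMES, while independent columns such as `AAKK`/`DEDE` score 0 for both.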

6.
Cellulosomes are multi-enzyme complexes that orchestrate the efficient degradation of cellulose and related plant cell wall polysaccharides. The complex is maintained by the high-affinity protein-protein interaction between two complementary modules: the cohesin and the dockerin. To characterize the interaction between different cohesins and dockerins, we developed matching fusion-protein systems, which harbor either the cohesin or the dockerin component. For this purpose, corresponding plasmid cassettes were designed that encode the following carrier proteins: (i) a thermostable xylanase with an appended His-tag; and (ii) a highly stable cellulose-binding module (CBM). The resulting xylanase-dockerin and CBM-cohesin fusion products were expressed at high levels as soluble protein. The expressed, affinity-purified proteins were extremely stable, and the functionality of the cohesin or dockerin component was retained. The fusion protein system was used to establish a sensitive, reliable, semi-quantitative enzyme-linked affinity assay for screening multiple cohesin-dockerin interactions in microtiter plates. A variety of cohesin-dockerin systems that had previously been examined by other methods were revisited with the affinity-based enzyme assay, and the results verified the validity of the approach.

7.
The number of solved protein domain structures has increased rapidly in recent years owing to advances in structural biology and structural genomics projects. New structures are often similar to those solved previously, and such similarities can give insights into function by linking poorly understood families to better-characterized ones. They also make it possible to combine information to find still more proteins adopting a similar structure, and sometimes a similar function, and to reprioritize families in structural genomics pipelines. We explore this possibility here by preparing merged profiles for pairs of structurally similar, but not necessarily sequence-similar, domains within the SMART and Pfam databases, using the Structural Classification of Proteins (SCOP). We show that such profiles are often able to identify further members of the same superfamily and can thus be used to increase the sensitivity of database search methods such as HMMer and PSI-BLAST. We perform detailed benchmarks using the SMART and Pfam databases with four complete genomes frequently used as annotation benchmarks. We quantify the associated increase in structural information in Swissprot and discuss examples illustrating how this approach can help in understanding functional and evolutionary relationships between protein families.

8.
ZFY-like genes have been observed in a variety of vertebrate species. Although originally implicated as the primary testis-determining gene in humans and other placental mammals, more recent evidence indicates a role (or roles) outside testis determination. In this study, DNA from five species of fish, Carassius auratus, Rivulus marmoratus, Xiphophorus maculatus, X. milleri, and X. nigrensis, was subjected to Southern blot analysis using a PCR-amplified fragment of a mouse ZFY-like sequence as a probe. Restriction fragment patterns were not polymorphic between sexes within any species but differed between species. With one exception (Rivulus), a 3.1-kb band from the EcoRI digestion was common to all. Sequence and open reading frame analysis of this fragment showed strong homology to other known vertebrate ZFY-like genes. Of particular interest in this gene is a novel third finger domain similar to that of one human and one alligator ZFY-like gene. Our studies and others provide evidence for a family of vertebrate ZFY genes, with those having this novel third finger representing the ancestral condition.

9.
Protein families typically embody a range of related functions and may thus be decomposed into subfamilies with, for example, distinct substrate specificities. Functionally divergent subfamilies can be detected by methods that recognize branches of adaptive evolution in a gene tree. As the number of genome sequences is growing rapidly, it is highly desirable to detect subfamily function divergence automatically. To this end, we introduce a method for large-scale prediction of function divergence within protein families. It is called the alpha shift measure (ASM) because it is based on detecting a shift in the shape parameter (α) of the gamma distribution of substitution rates. Four different methods for estimating α were investigated. We benchmarked the accuracy of ASM using function annotation from Enzyme Commission numbers within Pfam protein families divided into subfamilies by the automatic tree-based method BETE. In a test using 563 subfamily pairs in 162 families, ASM outperformed functional site-based methods using rate or conservation shifting (rate shift measure [RSM] and conservation shift measure [CSM]). The best results were obtained using the "GZ-Gamma" method for estimating α. Combining ASM with RSM and CSM by linear discriminant analysis further improved prediction accuracy.
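The alpha shift measure compares the gamma shape parameter α of the site-rate distribution between two subfamilies. The abstract does not describe its four estimators, so the sketch below substitutes a simple method-of-moments estimate (for a gamma distribution, mean² / variance equals the shape), computed from per-site rates assumed to be already inferred, and scores the shift as an absolute difference. Both the estimator and the `alpha_shift` score are hypothetical stand-ins, not the published ASM definition.

```python
import math
from statistics import mean, pvariance


def gamma_shape_moments(site_rates: list[float]) -> float:
    """Method-of-moments gamma shape: for Gamma(k, theta), mean**2 / variance == k."""
    m = float(mean(site_rates))
    v = float(pvariance(site_rates))
    if v == 0.0:
        return math.inf  # identical rates at every site: no rate heterogeneity
    return m * m / v


def alpha_shift(rates_a: list[float], rates_b: list[float]) -> float:
    """Hypothetical shift score: absolute difference of the two shape estimates."""
    return abs(gamma_shape_moments(rates_a) - gamma_shape_moments(rates_b))
```

A small α means strongly heterogeneous rates (a few fast sites, many slow ones), so a large difference in α suggests that the rate pattern across sites has changed between subfamilies.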

10.
Taylor's law (TL), which states that variance in population density is related to mean density via a power law, and density‐mass allometry, which states that mean density is related to body mass via a power law, are two of the most widely observed patterns in ecology. Combining these two laws predicts that the variance in density is related to body mass via a power law (variance‐mass allometry). Marine size spectra are known to exhibit density‐mass allometry, but variance‐mass allometry has not been investigated. We show that variance and body mass in unexploited size spectrum models are related by a power law, and that this leads to TL with an exponent slightly less than 2. These simulated relationships are disrupted less by balanced harvesting, in which fishing effort is spread across a wide range of body sizes, than by size‐at‐entry fishing, in which only fish above a certain size may legally be caught.
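Taylor's law, variance = a · mean^b, is usually estimated as a straight line on log-log axes, where b is the slope. The sketch below assumes each population is supplied as a list of density observations (for example, densities over time) and fits b by ordinary least squares on log10 scales; it does not reproduce the size-spectrum models or fishing scenarios of the study, and the function name is hypothetical. The same log-log regression applies to the density-mass and variance-mass allometries.

```python
import math
from statistics import mean, variance


def fit_taylor_law(populations: list[list[float]]) -> tuple[float, float]:
    """Fit variance = a * mean**b by ordinary least squares on log10 scales.

    Each inner list holds density observations for one population (at least two
    values). Populations with zero mean or zero variance are skipped because
    their logarithms are undefined.
    """
    log_m, log_v = [], []
    for densities in populations:
        m, v = mean(densities), variance(densities)
        if m > 0 and v > 0:
            log_m.append(math.log10(m))
            log_v.append(math.log10(v))
    mx, my = mean(log_m), mean(log_v)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_m, log_v))
    sxx = sum((x - mx) ** 2 for x in log_m)
    b = sxy / sxx
    a = 10 ** (my - b * mx)
    return a, b
```

For the toy input `[[1, 3], [10, 30], [100, 300]]` the sample variance is exactly half the squared mean, so the fit returns a = 0.5 and b = 2; an exponent slightly below 2, as reported above, would appear as a slope just under 2.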

11.
Shepherd AJ  Gorse D  Thornton JM 《Proteins》2003,50(2):290-302
A novel method is presented for predicting protein architecture from sequence using neural networks. The method preprocesses protein sequence data by encoding it numerically and then applying a Fourier transform. The encoded and transformed data are then used to train a neural network to recognize a number of different protein architectures. The method performed significantly better than comparable alternative strategies such as percentage dipeptide frequency, but is still limited by the size of the data set and the input demands of a neural network. Its main potential lies in complementing existing fold recognition techniques; its greatest strength is its ability to identify global symmetries within protein structures.
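The preprocessing step described above (encode a sequence numerically, then Fourier transform it) can be sketched as follows. The Kyte-Doolittle hydropathy scale used as the encoding is an assumption for illustration; the abstract does not say which encoding the authors used, and the neural network stage is omitted. A plain discrete Fourier transform is written out so that the example needs only the standard library.

```python
import cmath

# Kyte-Doolittle hydropathy values: one possible numeric encoding (assumed here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence: str) -> list[float]:
    """Map each residue to its hydropathy value."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def dft_magnitudes(values: list[float]) -> list[float]:
    """|X_k| for k = 0..n//2 of the mean-centred signal (naive O(n^2) DFT)."""
    n = len(values)
    mu = sum(values) / n
    centred = [v - mu for v in values]
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centred)))
        for k in range(n // 2 + 1)
    ]


def dominant_period(sequence: str) -> float:
    """Period (in residues) of the strongest non-zero frequency component."""
    mags = dft_magnitudes(encode(sequence))
    k_best = max(range(1, len(mags)), key=lambda k: mags[k])
    return len(sequence) / k_best
```

For the toy sequence `"LLKK" * 4` the hydropathy signal repeats every four residues, so `dominant_period` returns 4.0. A network input would need a fixed-length summary of the spectrum, since its length depends on the sequence length.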

12.
Abhiman S  Sonnhammer EL 《Proteins》2005,60(4):758-768
Protein function shift can be predicted from sequence comparisons, using either positive selection signals or evolutionary rate estimation. However, none of these methods has been validated on large datasets. Here we investigate existing and novel methods for protein function shift prediction and benchmark their accuracy against a large dataset of proteins with known enzymatic functions. Function change was predicted between subfamilies by identifying two kinds of sites in a multiple sequence alignment: Conservation-Shifting Sites (CSS), which are conserved in both subfamilies but with a different amino acid in each, and Rate-Shifting Sites (RSS), which have different evolutionary rates in the two subfamilies. CSS were predicted by a new entropy-based method, and RSS by the Rate-Shift program. In principle, the more CSS and RSS between two subfamilies, the more likely a function shift between them. A test dataset was built by extracting subfamilies with different EC numbers from Pfam that belong to the same domain family. Subfamilies were generated automatically using a phylogenetic tree-based program, BETE. The dataset comprised 997 subfamily pairs with four or more members per subfamily. We observed a significant increase in CSS and RSS for subfamily comparisons with different EC numbers compared with those with the same EC numbers. Discrimination was better using RSS than CSS and was more pronounced for larger families. Combining RSS and CSS by discriminant analysis improved classification accuracy to 71%. The method was applied to the Pfam database, and the results are available at http://FunShift.cgb.ki.se. A closer examination of some superfamily comparisons showed that single EC numbers sometimes embody distinct functional classes; hence, the measured accuracy of function shift prediction is underestimated.
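The definition of Conservation-Shifting Sites above (conserved in both subfamilies, but with different amino acids) translates into a short filter over alignment columns. This sketch uses per-subfamily Shannon entropy as the conservation measure with an arbitrary cut-off of 0.5 nats and ignores gaps. The paper's actual entropy-based scoring and thresholds are not given in the abstract, so `max_entropy` and the helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column: list[str]) -> float:
    """Shannon entropy (natural log) of the residues in one column."""
    n = len(column)
    return sum(-(c / n) * math.log(c / n) for c in Counter(column).values())


def conservation_shifting_sites(
    subfamily_a: list[str], subfamily_b: list[str], max_entropy: float = 0.5
) -> list[int]:
    """Columns conserved within each subfamily (entropy <= max_entropy)
    whose most common residue differs between the two subfamilies."""
    sites = []
    for i in range(len(subfamily_a[0])):
        col_a = [seq[i] for seq in subfamily_a]
        col_b = [seq[i] for seq in subfamily_b]
        if shannon_entropy(col_a) <= max_entropy and shannon_entropy(col_b) <= max_entropy:
            consensus_a = Counter(col_a).most_common(1)[0][0]
            consensus_b = Counter(col_b).most_common(1)[0][0]
            if consensus_a != consensus_b:
                sites.append(i)
    return sites
```

In a toy comparison of `["ACDK", "ACDR", "ACEK", "ACDK"]` with `["GCDL", "GCDI", "GCEV", "GCDM"]`, only column 0 (A versus G) qualifies: column 1 is conserved but identical, and columns 2 and 3 are not conserved enough under the 0.5 cut-off.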

13.
The spatial distribution of organisms often differs across scales. For instance, colonial bird populations could be described, from large to small scale, as scattered clumps of otherwise regularly distributed breeding pairs. We analysed the distribution of nests of a large colonial population of white storks (Ciconia ciconia) and found a fractal pattern in each of the 4 study years. Moreover, we found that the often-observed, long-tailed frequency distribution of colony sizes was well described by a power law, regardless of the cut-off used to define colonies (from 16 to 1024 m). Thus, although storks were locally highly clumped even with tens of nests in a single tree, the population was not structured in colonies (a simple clustered distribution) as previously thought. Rather, they were distributed in a continuous hierarchical set of clusters within clusters across scales, clusters lacking the commonly assumed characteristic mean size. These quantitative solutions to previously perceived scaling problems will potentially improve our understanding of the ecology and evolution of bird coloniality and animal spacing patterns and group living in general.
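To check whether colony sizes follow a power law p(x) ∝ x^(−α) above some lower bound x_min, a common choice is the continuous maximum-likelihood estimator α̂ = 1 + n / Σ ln(x_i / x_min), with approximate standard error (α̂ − 1) / √n. The abstract does not state how the authors fitted their distribution, so this estimator is an assumed stand-in; colony sizes are counts, and for small x_min a discrete estimator would be more accurate. The function name is hypothetical.

```python
import math


def power_law_alpha(sizes: list[float], x_min: float) -> tuple[float, float]:
    """Continuous power-law MLE for the tail x >= x_min; returns (alpha, standard error)."""
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one observation strictly above x_min")
    n = len(tail)
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)
```

Values below x_min are simply excluded from the fit. In practice x_min is itself usually chosen from the data, for example by minimizing the Kolmogorov-Smirnov distance between the fitted and empirical tails.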

14.
Hendriks AJ  Mulder C 《Oecologia》2008,155(4):705-716
The scaling of reproductive parameters to body size is important for understanding ecological and evolutionary patterns. Here, we derived allometric relationships for the number and mass of seeds, eggs and neonates from an existing model of population production. In a separate meta-analysis, we collected 79 empirical regressions on offspring mass and number covering different taxa and various habitats. The literature review served as a validation of the model; conversely, the model was used to check the consistency of isolated regressions with each other and with related ecological quantities. The total offspring mass delivered in a reproductive event scaled to adult size with slopes in the range of about 3/4 to 1. Exponents for individual seed, egg and neonate mass varied around 1/2 for most heterotherms and between 3/4 and 1 for most homeotherms. The scaling of the number of progeny released in a sowing, clutch or litter was opposite to that of their size. The linear regressions fitted into a triangular envelope in which maximum offspring mass is limited by the size of the adult. Minimum seed and egg size scaled with weight exponents of approximately 0 up to 1/4. These patterns can be explained by the influence of parents on the fate of their offspring, across the continuum from r-strategists (pelagic–aquatic, aerial, most invertebrates, heterotherms) to K-strategists (littoral–terrestrial, some invertebrates, homeotherms).

15.
The origin and evolution of the thousands of species-specific genes with unknown functions, the so-called orphan genes, has been a mystery. Here, we have studied the rates and patterns of orphan sequence evolution, using the Rickettsia as our reference system. Of the Rickettsia conorii orphans examined in this study, 80% were found to be short gene fragments or fusions of short segments from neighboring genes. We reconstructed the putative sequences of the full-length genes from which the short orphan fragments are thought to have originated. One of the genes thus reconstructed displays weak similarity to the ankyrin-repeat protein family, an identification that is strongly supported by comparative molecular modeling. Studies of the patterns of gene fragmentation underscore the importance of short repeated sequences as targets for recombination events that result in sequence loss and the formation of short, transient open reading frames. Our analysis demonstrates that gene sequences present in the common ancestor can be inferred even in cases when no full-length open reading frame is present in any of the contemporary species. Such reconstructions support the identification of lost protein functions and hint at important lifestyle changes.

16.
《Current biology : CB》2022,32(13):2897-2907.e5

17.
Previously, it has been suggested that insect gas exchange cycle frequency (fC) is mass independent, making insects different from most other animals, in which periods typically scale as mass^-0.25. However, the claim for insects is based on studies of only a few closely related taxa encompassing a relatively small size range. Moreover, it is not known whether the type of gas exchange pattern (discontinuous versus cyclic) influences the fC-mass scaling relationship. Here, we analyse a large database to examine interspecific fC-mass scaling. In addition, we investigate the effect of the mode of gas exchange on the fC-mass scaling relationship using both conventional and phylogenetically independent approaches. Cycle frequency scaled as mass^-0.280 (when accounting for phylogenetic non-independence and gas exchange pattern), which did not differ significantly from mass^-0.25. The slope of the fC-mass relationship was shallower, with a significantly lower intercept, for species showing discontinuous gas exchange than for those showing the cyclic pattern, probably owing to lower metabolic rates in the former. Insects therefore appear no different from other animals insofar as the scaling of gas exchange fC is concerned, although fC may scale in distinct ways for different gas exchange patterns.

18.
19.
Issues in predicting protein function from sequence (total citations: 1; self-citations: 0; citations by others: 1)
Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances in estimating the statistical significance of sequence similarities found in database searches. The extent to which homologues share similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care. This review focuses on the various approaches that have been developed to predict function at scales ranging from the atom to the organism. Similarities in homologues' functions differ considerably at each of these scales and also vary between domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.

20.
A systematic characterization of lens crystallins from five major classes of vertebrates was carried out by exclusion gel filtration, cation-exchange chromatography, and N-terminal sequence determination. All crystallin fractions except the γ-crystallin fraction were found to be N-terminally blocked. γ-Crystallin is present in all major vertebrate classes except birds: chicken lenses contain none of this protein, and duck lenses contain decreased amounts. N-terminal sequence analysis of the purified γ-crystallin polypeptides showed extensive homology between different classes of vertebrates, supporting the close relatedness of this crystallin family even among evolutionarily distant species. Comparison of the nucleotide sequences, and of the predicted amino acid sequences, of carp and rat lens γ-crystallins and heat-shock proteins demonstrated partial sequence homology of the encoded polypeptides and striking homology at the gene level. The unexpectedly strong homology of the complementary DNA (cDNA) lies in the regions coding for the 40 N-terminal residues of carp γ-II and rat γ2-1 and in the middle segments of the 23,000- and 70,000-Mr heat-shock proteins. The optimal alignment of DNA sequences along these two segments shows about 50% homology, whereas protein sequence identity for the corresponding aligned segments is only 20%. Weak sequence homology at the protein level is also found between the invertebrate squid crystallin and rat γ-crystallin polypeptides. These results point to the possibility of unifying three major classes of vertebrate crystallins into one α/β/γ superfamily and corroborate the earlier supposition that the existing crystallins in the animal kingdom are probably interrelated, sharing a common ancestry.


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP registration No. 09084417