Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The PALI database (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise alignment), and all the members have also been superposed simultaneously (multiple alignment). The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well; variations are seen only in the low sequence similarity cases, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with the Pairwise alignment extracted from the Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA rarely differ. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and the number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and the number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference between the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

2.
We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the alpha-carbon backbone structures of the proteins in order to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allows for similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of alpha-carbons. We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has several advantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoids the quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
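
The unit-vector representation described in this abstract can be illustrated with a minimal sketch. The function names `unit_vectors` and `orientation_rms` are hypothetical, the coordinates are made up, and the sketch compares the two chains in a fixed orientation; the search for the rigid motion that brings substructures into correspondence, which the abstract describes, is not attempted here.

```python
import math

def unit_vectors(ca_coords):
    """Convert a list of (x, y, z) alpha-carbon positions into unit vectors,
    one for each adjacent pair of atoms along the chain."""
    vectors = []
    for (x1, y1, z1), (x2, y2, z2) in zip(ca_coords, ca_coords[1:]):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        vectors.append((dx / norm, dy / norm, dz / norm))
    return vectors

def orientation_rms(vecs_a, vecs_b):
    """RMS distance between corresponding unit vectors of two chains.
    Assumption: both chains have the same length and are already in the
    orientation in which they should be compared (no rotation is fitted)."""
    if len(vecs_a) != len(vecs_b) or not vecs_a:
        raise ValueError("need two non-empty vector lists of equal length")
    total = sum((a - b) ** 2
                for u, v in zip(vecs_a, vecs_b)
                for a, b in zip(u, v))
    return math.sqrt(total / len(vecs_a))

# Made-up coordinates: a straight chain along x, and the same chain translated.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in chain]
print(orientation_rms(unit_vectors(chain), unit_vectors(shifted)))
```

Because only directions between neighbouring atoms are stored, a pure translation of one chain leaves the measure unchanged, which is one reason such a representation is compact compared with a full distance matrix.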

3.
Finding the common substructures shared by two proteins is considered one of the central issues in computational biology because of its usefulness in understanding the structure-function relationship and its application in drug and vaccine design. In this paper, we propose a novel algorithm called FAMCS (Finding All Maximal Common Substructures) for the common substructure identification problem. Our method works initially at the protein secondary structural element (SSE) level and starts with the identification of all structurally similar SSE pairs. These SSE pairs are then merged into sets using a modified Apriori algorithm, which tests the similarity of various sets of SSE pairs incrementally until all the maximal sets of SSE pairs that are deemed to be similar are found. The maximal common substructures of the two proteins are formed from these maximal sets. A refinement algorithm is also proposed to fine-tune the alignment from the SSE level to the residue level. Comparison of FAMCS with other methods on various proteins shows that FAMCS can address all four requirements and infer interesting biological discoveries.

4.
Irving JA, Whisstock JC, Lesk AM. Proteins 2001;42(3):378-382
Structural genomics (the systematic solution of the structures of the proteins of an organism) will increasingly often produce molecules of unknown function with no close relative of known function. Prediction of protein function from structure has thereby become a challenging problem of computational molecular biology. The strong conservation of active site conformations in homologous proteins suggests a method for identifying them. This depends on the relationship between size and goodness-of-fit of aligned substructures in homologous proteins. For all pairs of proteins studied, the root-mean-square deviation (RMSD) as a function of the number of residues aligned varies exponentially for large common substructures and linearly for small common substructures. The exponent of the dependence at large common substructures is well correlated with the RMSD of the core as originally calculated by Chothia and Lesk (EMBO J 1986;5:823-826), affording the possibility of reconciling different structural alignment procedures. In the region of small common substructures, reduced aligned subsets define active sites and can be used to suggest the locations of active sites in homologous proteins.

5.
Mistletoes are hemiparasitic plants growing on aerial parts of other host trees. Many mistletoes are reported to be medicinally important. The hemiparasitic nature of these plants makes their chemical composition dependent on the host on which they grow. They have also been shown to exhibit morphological dissimilarities when growing on different hosts. Helicanthus elastica (Desr.) Danser (mango mistletoe) is one such less explored medicinal mistletoe, found on almost every mango tree in India. Traditionally, the leaves of this plant are used for checking abortion and for removing stones in the kidney and urinary bladder, while significant antioxidant and antimicrobial properties are also attributed to this species of mistletoe. The current study was undertaken to evaluate molecular differences in the genomic DNA of the plant growing on five different host trees, using four random markers in a random amplified polymorphic DNA (RAPD) analysis, followed by a similarity matrix based on Jaccard's coefficient and a distance matrix based on hierarchical clustering analysis. Similarity and distance matrix data from just 4 random markers, both separately and pooled, revealed a significant difference in the genomic DNA of H. elastica growing on five different hosts. Pooled similarity data from all 4 primers showed similarity between 0.256 and 0.311. The distance matrix ranged from 0.256 to 0.281 on pooling the data from all four primers. The result, obtained with a minimum number of primers, suggests that the genomic DNA of H. elastica differs depending upon the host on which it grows; hence the host must be considered while studying or utilizing this mistletoe for medicinal purposes.
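
As an illustration of the Jaccard's coefficient mentioned in this abstract, the following is a minimal sketch that scores RAPD band-presence profiles. The band data and host names are invented, the helper names are hypothetical, and returning 0.0 when neither sample shows any band is a convention chosen here, not something stated in the abstract.

```python
def jaccard_similarity(bands_a, bands_b):
    """Jaccard coefficient of two band-presence profiles (1 = band present).
    Shared bands divided by bands present in at least one sample."""
    if len(bands_a) != len(bands_b):
        raise ValueError("profiles must score the same set of band positions")
    shared = sum(1 for a, b in zip(bands_a, bands_b) if a and b)
    either = sum(1 for a, b in zip(bands_a, bands_b) if a or b)
    # Convention assumed for this sketch: no bands at all gives similarity 0.
    return shared / either if either else 0.0

def similarity_matrix(profiles):
    """Pairwise Jaccard similarities for a dict of {sample name: profile}."""
    names = list(profiles)
    return {(p, q): jaccard_similarity(profiles[p], profiles[q])
            for p in names for q in names}

# Invented band scores for plants collected from three hypothetical hosts.
profiles = {
    "host_A": [1, 1, 0, 0, 1],
    "host_B": [1, 0, 1, 0, 1],
    "host_C": [0, 1, 1, 1, 0],
}
matrix = similarity_matrix(profiles)
print(matrix[("host_A", "host_B")])
```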

6.
A method for identifying significant conservative and variable regions in homologous protein sequences is presented. A set of aligned homologous sequences is divided into two groups consisting of the m and n most related sequences. Each pair of sequences from different groups is compared using a unitary similarity matrix. The superposition of pairwise comparisons, scanned by a window of 10 amino acid residues, gives the intergroup local variability profile (VP). The area S of the figure between the VP and its mean value line is compared with the averaged area S(r) of 1000 VPs of artificial homologous protein families. The difference (S - S(r)), given in units of the standard deviation σ_r, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences: OI = (S - S(r))/σ_r. If OI > 2, the real VP extrema containing the surplus of area S - (S(r) + 2σ_r) are cut off. The cut-off stretches are likely to be significant conservative and variable regions. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na, K-ATPase, L- and M-subunits of the photosynthetic bacteria photoreaction centre, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural proteins, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different length L, the value of the standard overall substitution irregularity for L = 250 is proposed.
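
The OI statistic defined in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the area between a profile and its mean line is approximated as a sum of absolute deviations at unit spacing, the sample standard deviation is used for σ_r, and the profiles are toy numbers rather than the 1000 artificial families of the original study. The function names are hypothetical.

```python
import statistics

def profile_area(profile):
    """Area between a variability profile and its mean value line,
    approximated as the sum of absolute deviations (unit spacing assumed)."""
    mean = statistics.fmean(profile)
    return sum(abs(value - mean) for value in profile)

def overall_irregularity(real_profile, random_profiles):
    """OI = (S - S(r)) / sigma_r, where S(r) and sigma_r are the mean and
    the (sample) standard deviation of the areas of the random profiles."""
    s = profile_area(real_profile)
    random_areas = [profile_area(p) for p in random_profiles]
    s_r = statistics.fmean(random_areas)
    sigma_r = statistics.stdev(random_areas)
    return (s - s_r) / sigma_r

# Toy data: one "real" profile and three stand-ins for randomized profiles.
real = [0, 4, 0, 4]
randoms = [[1, 3, 1, 3], [1, 2, 1, 2], [0, 3, 0, 3]]
oi = overall_irregularity(real, randoms)
print(oi, "significant" if oi > 2 else "not significant")
```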

7.
Shape similarity is one of the most elusive and intriguing questions of nature and mathematics. Proteins provide a rich domain in which to test theories of shape similarity. Proteins can match at different scales and in different arrangements. Sometimes the detection of common local structure is sufficient to infer global alignment of two proteins; at other times it provides false information. Proteins with very low sequence identity may share large substructures, or perhaps just a central core. There are even examples of proteins with nearly identical primary sequences in which alpha-helices have become beta-sheets. Shape similarity can be formulated (i) in terms of global metrics, such as RMSD or Hausdorff distance, (ii) in terms of subgraph isomorphisms, such as the detection of shared substructures with similar relative locations, or (iii) purely topologically, in terms of structure preserving transformations. Existing protein structure detection programs are built on the first two types of similarity. The third forms the foundations of knot theory. The thesis of this paper is this: Protein similarity detection leads naturally to algorithms operating at the metric, relational, and isotopic scales. The paper introduces a definition of similarity based on atomic motions that preserve local backbone topology without incurring significant distance errors. Such motions are motivated by the physical requirements for rearranging subsequences of a protein. Similarity detection then seeks rigid body motions able to overlay pairs of substructures, each related by a substructure-preserving motion, without necessarily requiring global structure preservation. This definition is general enough to span a wide range of questions: One can ask for full rearrangement of one protein into another while preserving global topology, as in drug design; or one can ask for rearrangements of sets of smaller substructures, preserving local but not global topology, as in protein evolution. 
In the appendix, we exhibit an algorithm for answering the general rearrangement question. That algorithm has the complexity of robot motion planning. In the text, we consider a more common case in which one seeks protein similarity by rearrangements of relatively short peptide segments. We exhibit two algorithms, one based on writhing numbers and one based on line weavings. The algorithms have time complexities O(n^4) and O(s^11), respectively, where n is the maximum number of residues in the proteins being compared and s is the number of secondary structure elements. In practice, the running times were nearly interactive. We report results obtained with a dozen pairs of proteins, exhibiting a range of typical features.

8.
Environment, dispersal and patterns of species similarity (cited 2 times: 0 self-citations, 2 by others)
Aim The aim of this paper is to evaluate the combined effects of geographical distance and environmental distance on patterns of species similarity (similarity in species composition between sites), and to identify factors affecting the rate of decay in species similarity with each type of distance. Location Israel. Methods Data on species composition of land snails and land birds were recorded in 27 sites of 1 × 1 km scattered across a rainfall gradient in Israel. Matrices of similarity in species composition between all pairs of sites were computed and analysed with respect to corresponding matrices of geographical distance and rainfall distance (defined as the difference in mean annual rainfall between sites, and used as a measure of environmental distance). Mantel tests were applied to determine the correlation between species similarity and each type of distance. Factors affecting the decay in species similarity were investigated by comparing different subsets of the data using randomization tests. Results Both rainfall distance and geographical distance had negative effects on species similarity. The effect of rainfall distance was statistically significant even after controlling for differences in geographical distance, and vice versa. The per‐unit effect of rainfall distance on species similarity decreased with increasing geographical distance, indicating that the two types of distances interacted in determining the similarity in species composition. Snails showed a higher rate of decay in species similarity with geographical distance than birds, and large snails showed a higher rate of decay than small snails, which are better passive dispersers. The per‐unit effects of both rainfall distance and geographical distance on species similarity were higher in the desert region than in the Mediterranean region. 
Analyses focusing on a grain size of 10 × 10 m showed a lower similarity in species composition and a lower rate of decay in species similarity with rainfall distance than analyses carried out at a grain size of 1 × 1 km. Main conclusions Patterns of similarity in species composition are influenced by the combined effects of environmental variation, the position of the area along environmental gradients, the dispersal properties of the component species, and the scale (both spatial extent and grain size) at which the patterns are examined.

9.
Recent algorithmic advances and continual increase in computational power have made it possible to simulate protein folding and dynamics on the level of ensembles. Furthermore, analyzing protein structure by using ensemble representation is intrinsic to certain experimental techniques, such as nuclear magnetic resonance. This creates a problem of how to compare an ensemble of molecules with a given reference structure. Recently, we used distance-based root-mean-square deviation (dRMS) to compare the native structure of a protein with its unfolded-state ensemble. We showed that for small, mostly alpha-helical proteins, the mean unfolded-state Cα-Cα distance matrix is significantly more nativelike than the Cα-Cα matrices corresponding to the individual members of the unfolded ensemble. Here, we give a mathematical derivation that shows that, for any ensemble of structures, the dRMS deviation between the ensemble-averaged distance matrix and any given reference distance matrix is always less than or equal to the average dRMS deviation of the individual members of the ensemble from the same reference matrix. This holds regardless of the nature of the reference structure or the structural ensemble in question. In other words, averaging of distance matrices can only increase their level of similarity to a given reference matrix, relative to the individual matrices comprising the ensemble. Furthermore, we show that the above inequality holds in the case of Cartesian coordinate-based root-mean-square deviation as well. We discuss this in the context of our proposal that the average structure of the unfolded ensemble of small helical proteins is close to the native structure, and demonstrate that this finding goes beyond the above mathematical fact.
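
The inequality stated in this abstract can be reproduced in a few lines. The sketch below is not the authors' derivation; it assumes the usual definition of dRMS as the root mean square difference over the M atom pairs, with N ensemble members D^(k), reference matrix R, and the Euclidean norm taken over the pair entries. Under that assumption the result is a direct consequence of the triangle inequality.

```latex
\[
\mathrm{dRMS}(D, R)
  = \sqrt{\frac{1}{M}\sum_{i < j}\bigl(D_{ij} - R_{ij}\bigr)^{2}}
  = \frac{\lVert D - R \rVert}{\sqrt{M}},
\qquad
\bar{D} = \frac{1}{N}\sum_{k=1}^{N} D^{(k)}
\]
\[
\mathrm{dRMS}(\bar{D}, R)
  = \frac{1}{\sqrt{M}}
    \Bigl\lVert \frac{1}{N}\sum_{k=1}^{N}\bigl(D^{(k)} - R\bigr) \Bigr\rVert
  \le \frac{1}{N}\sum_{k=1}^{N}\frac{\lVert D^{(k)} - R \rVert}{\sqrt{M}}
  = \frac{1}{N}\sum_{k=1}^{N}\mathrm{dRMS}\bigl(D^{(k)}, R\bigr)
\]
```

Equality requires all the differences D^(k) - R to point in the same direction, so for a heterogeneous ensemble the averaged matrix is strictly closer to the reference than the members are on average.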

10.
We have determined, via 1H-n.m.r., the solution conformation of the collagen-binding b-domain of the bovine seminal fluid protein PDC-109 (PDC-109/b). The structure determination is based on 341 interproton distance estimates and 42 dihedral angle estimates: a set of 24 initial structures was computed, 12 using the variable target function program DIANA and 12 using the metric matrix program DISGEO. These structures were optimized by restrained energy minimization and dynamic simulated annealing using the CHARMM and X-PLOR programs. The average pairwise root-mean-square difference (r.m.s.d.) between the optimized DIANA (DISGEO) structures is 0.71 Å (0.82 Å) for the backbone atoms, and 1.73 Å (2.03 Å) for all atoms. Both sets of structures exhibit the same global fold, secondary structure and placement of most non-polar side-chains. Two central antiparallel beta-sheets, which lie roughly perpendicular to each other, and two irregular loops support a large, partially exposed, hydrophobic surface that defines a putative binding site. A test of a hybrid relaxation matrix-based distance refinement protocol (MIDGE program) was performed using a normalized 250 millisecond NOESY spectrum. The resulting distances were input to the molecular mechanics/dynamics procedures mentioned above in order to optimize the DIANA structures. Our results indicate that relaxation matrix refinement of distances is most useful when used conservatively for identifying underestimated distance constraints. 1H-n.m.r. monitored ligand titration experiments revealed definite, albeit weak, binding interactions for phenethylamine and leucine analogs (Ka ≤ 25 M⁻¹). Residues perturbed by ligand binding include Tyr7, Trp26, Tyr33, Asp34 and Trp39. These results suggest that PDC-109/b may recognize specific leucine and/or isoleucine-containing sequences within collagen.

11.
The study of human population structure allows the assessment of cultural and historical influences on mating probabilities and, hence, genetic variation. A commonly used model is isolation by distance, which predicts a negative exponential relationship between genetic similarity and geographic distance. Anthropometric data collected during the 1930s for 261 adult women in 12 towns of rural western Ireland were used to test the isolation by distance model and to assess the influence of cultural factors upon the fit of the model. The effects of recent migration were tested by using two additional data subsets, one excluding known intercounty migrants and the other consisting of unmarried women only, in an attempt to control partially for local migration upon marriage. Deviations from the expected isolation by distance model were analyzed using rotational fitting and regression analysis. Estimates of the isolation by distance parameters agree closely with independent estimates from isonymy and with estimates obtained in other studies of rural European population structure. Analysis of the residuals indicates three major factors that contribute to deviations from the expected model: recent migration upon marriage, age variation among groups, and variation in population size and/or transportation opportunities. Variation in population size was tested using the gravity model of economic geography by regressing the residuals from the isolation by distance model for each pair of towns on the product of their population sizes. The best fit occurred for the unmarried sample, as expected from ethnographic evidence, since rural-urban migration was most common among unmarried women.

12.
The purpose of this investigation was to test whether the concept of critical power used in previous studies could be applied to the field of competitive swimming as critical swimming velocity (vcrit). The vcrit, defined as the swimming velocity that can be maintained over a very long period of time without exhaustion, was expressed as the slope of the straight line relating swimming distance (dlim) to its duration (tlim) at each of six predetermined speeds. Nine trained college swimmers underwent tests in a swimming flume to measure vcrit at those velocities until the onset of fatigue. A regression analysis of dlim on tlim calculated for each swimmer showed linear relationships (r² > 0.998, P < 0.01), and the slope coefficient signifying vcrit ranged from 1.062 to 1.262 m·s⁻¹ with a mean of 1.166 (SD 0.052) m·s⁻¹. Maximal oxygen consumption (VO2max), oxygen consumption (VO2) at anaerobic threshold, and the swimming velocity at the onset of blood lactate accumulation (vOBLA) were also determined during the incremental swimming test. The vcrit showed significant positive correlations with VO2 at anaerobic threshold (r = 0.818, P < 0.01), vOBLA (r = 0.949, P < 0.01) and mean velocity of 400 m freestyle (r = 0.864, P < 0.01). These data suggested that vcrit could be adopted as an index of endurance performance in competitive swimmers.
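
Since vcrit is defined here as the slope of the regression of dlim on tlim, it can be computed with an ordinary least-squares fit. The following is a minimal sketch; the function name `critical_velocity` is hypothetical and the six trial values are invented for illustration, not taken from the study.

```python
def critical_velocity(t_lim, d_lim):
    """Least-squares fit of swimming distance (m) on duration (s).
    Returns (slope, intercept); the slope is the critical velocity in m/s."""
    n = len(t_lim)
    if n != len(d_lim) or n < 2:
        raise ValueError("need at least two paired (time, distance) values")
    mean_t = sum(t_lim) / n
    mean_d = sum(d_lim) / n
    sxx = sum((t - mean_t) ** 2 for t in t_lim)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(t_lim, d_lim))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept

# Invented durations (s) and distances (m) for six swimming speeds.
times = [60.0, 120.0, 200.0, 320.0, 480.0, 700.0]
distances = [92.0, 161.0, 254.0, 394.0, 580.0, 836.0]
v_crit, offset = critical_velocity(times, distances)
print(round(v_crit, 3), round(offset, 1))
```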

13.
Longitudinal alterations in anaerobic threshold (AT) and distance running performance were assessed three times within a 4-month period of intensive training, using 20 male, trained middle-distance runners (19-23 yr). A major effect of the regular intensive training together with 60- to 90-min AT level running training (2 d·wk⁻¹) was a significant increase in the amount of O2 uptake corresponding to AT (VO2 AT; ml O2·min⁻¹·kg⁻¹) and in maximal oxygen uptake (VO2max; ml O2·min⁻¹·kg⁻¹). Both VO2 AT and VO2max showed significant correlations (r = -0.69 to -0.92 and r = -0.60 to -0.85, respectively) with the 10,000 m run time in every test. However, further analyses of the data indicate that increasing VO2 AT (r = -0.63, P < 0.05) rather than VO2max (r = -0.15) could result in improving the 10,000 m race performance to a larger extent, and that the absolute amount of change (delta) in the 10,000 m run time is best accounted for by a combination of delta VO2 AT and delta 5,000 m run time. Our data suggest that, among runners not previously trained over long distances, training-induced alterations in AT in response to regular intensive training together with AT level running training may contribute significantly to the enhancement of AT and endurance running performance, probably together with an increase in muscle respiratory capacity.

14.
Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. The novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allow approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling random distributions of not only RMSD, but also any similarity score that depends on difference vector projections, such as GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to the evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.

15.
Protein structures can be encoded into binary sequences (Gabarro-Arpa et al., Comput Chem 2000;24:693-698). These sequences are used to define a Hamming distance in conformational space: the distance between two different molecular conformations is the number of different bits in their sequences. Each bit in the sequence arises from a partition of conformational space into two halves. Thus, the information encoded in the binary sequences is also used to characterize the regions of conformational space visited by the system. We apply this distance and its associated geometric structures to the clustering and analysis of conformations sampled during a 4-ns molecular dynamics simulation of the HIV-1 integrase catalytic core. The cluster analysis of the simulation shows a division of the trajectory into two segments of 2.6 and 1.4 ns length, which are qualitatively different: the data indicate that equilibration is only reached at the end of the first segment. The Hamming distance is also compared with the r.m.s. deviation measure. The analysis of the cases studied so far shows that under the same conditions the two measures behave quite differently, and that the Hamming distance appears to be more robust than the r.m.s. deviation.
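
The Hamming distance defined in this abstract is simply a count of differing bits. A minimal sketch follows; it assumes the conformations have already been encoded as equal-length bit strings (the encoding itself is described in the cited paper and is not reproduced here), and the example strings are invented.

```python
def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("conformations must be encoded with the same number of bits")
    return sum(1 for a, b in zip(bits_a, bits_b) if a != b)

def distance_matrix(conformations):
    """All pairwise Hamming distances, e.g. as input to a clustering step."""
    return [[hamming_distance(p, q) for q in conformations] for p in conformations]

# Invented binary encodings of three trajectory snapshots.
snapshots = ["10110", "10011", "01011"]
for row in distance_matrix(snapshots):
    print(row)
```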

16.
We report an interesting case of structural similarity between 2 small, nonhomologous proteins, the third domain of ovomucoid (ovomucoid) and the C-terminal fragment of ribosomal L7/L12 protein (CTF). The region of similarity consists of a 3-stranded beta-sheet and an alpha-helix. This region is highly similar; the corresponding elements of secondary structure share a common topology, and the RMS difference for "equivalent" Cα atoms is 1.6 Å. Surprisingly, this common structure arises from completely different sequences. For the common core, the sequence identity is less than 3%, and there is neither significant sequence similarity nor similarity in the position or orientation of conserved hydrophobic residues. This superposition raises the question of how 2 entirely different sequences can produce an identical structure. Analyzing this common region in ovomucoid revealed that it is stabilized by disulfide bonds. In contrast, the corresponding structure in CTF is stabilized in the alpha-helix by a composition of residues with high helix-forming propensities. This result suggests that different sequences and different stabilizing interactions can produce an identical structure.

17.
The purpose of this study is to present measurements of ventilatory threshold (VeT) and maximal oxygen uptake (VO2max) in a large group of predominantly older subjects using a bicycle ergometer and an automated measuring system. One hundred and twenty-seven healthy elderly subjects (mean age: 68) and 44 young and middle-aged subjects (mean age: 39) underwent a maximal exercise test with breath-by-breath measurement of ventilation and gas exchange variables. Ventilatory threshold was determined by visual inspection of the breakpoints in the VE/VO2 and PETO2 data curves. Additional measures were made in a subset of subjects to determine the reproducibility and interobserver variability of VeT and the relationship between VeT and the venous lactate threshold (LaT). Day-to-day reproducibility of VeT was good, with a mean difference in VO2 at VeT on two occasions of 40.23 ± 125 ml/min. Interobserver variability was low (intraclass correlation coefficient r = 0.941), and VeT was found to correlate with LaT (r = 0.79, P < 0.05), with LaT occurring a mean of 2.3 min after VeT. VeT declined significantly with age in both males and females, but less rapidly than VO2max. Both VO2max and VeT were found to vary with age, sex, height, and weight in a stepwise multiple-linear regression analysis. Age-associated changes in skeletal muscle composition may be in part responsible for the less precipitous decline in VeT with age compared with VO2max.

18.
Fibre conduction velocity and fibre composition in human vastus lateralis (cited 6 times: 0 self-citations, 6 by others)
The relationship between muscle fibre composition and fibre conduction velocity was investigated in 19 male track athletes, 12 sprinters and 7 distance runners, aged 20-24 years, using needle biopsy samples from vastus lateralis. Cross-sectional areas of the fast twitch (FT) and slow twitch (ST) fibres were determined by histochemical analysis. The percentage of FT fibre areas ranged from 22.6 to 93.6%. Sprinters had a higher percentage of FT fibres than distance runners. Muscle fibre conduction velocity was measured with a surface electrode array placed along the muscle fibres, and calculated from the time delay between 2 myoelectric signals recorded during a maximal voluntary contraction. The conduction velocity ranged from 4.13 to 5.20 m·s⁻¹. A linear correlation between conduction velocity and the relative area of FT fibres was statistically significant (r = 0.84, p < 0.01). This correlation indicates that muscle fibre composition can be estimated from muscle fibre conduction velocity measured noninvasively with surface electrodes.

19.
20.
We investigated which attribute or what combination of attributes would best account for distance running performance of female runners. The subjects were 30 well-trained female distance runners, aged 19 to 23 years. Anthropometric and body composition characteristics, pulmonary function characteristics, blood properties, and cardiorespiratory function characteristics were measured at rest or during submaximal and maximal exercise. Analyses of the data showed that the relationship of oxygen uptake corresponding to lactate threshold (VO2T, ml·kg⁻¹·min⁻¹) with each distance running performance was substantially higher as compared with the relationship of other independent variables including maximal oxygen uptake (VO2max). Furthermore, multiple regression analysis indicated that running performances in 3,000 m, 5,000 m, and 10,000 m are best accounted for by a combination of VO2/LT (X1), fat-free weight (X2), and/or mean corpuscular volume (X3). A multiple regression equation for predicting the 5,000 m (Y, s) running performance was formulated as Y = -14.75X1 - 3.03X2 - 5.79X3 + 2282.1. We suggest that VO2max would not stand alone as a decisive factor of distance running success in female runners, and that the distance running performance could be better accounted for by a combination of several attributes relating to lactate threshold, body composition, and/or hematological status. The linear regression of the predicted running performance on the actually measured running performance can be accepted in the range of 986-1197 s.
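
The regression equation quoted in this abstract can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, the input values are invented, and the units of fat-free weight (kg) and mean corpuscular volume (fl) are assumptions, since the abstract does not state them.

```python
def predict_5000m_time(vo2_at_lt, fat_free_weight, mean_corpuscular_volume):
    """Predicted 5,000 m time in seconds from the abstract's equation
    Y = -14.75*X1 - 3.03*X2 - 5.79*X3 + 2282.1.
    X1: oxygen uptake at lactate threshold (ml/kg/min)
    X2: fat-free weight (kg assumed)
    X3: mean corpuscular volume (fl assumed)"""
    return (-14.75 * vo2_at_lt
            - 3.03 * fat_free_weight
            - 5.79 * mean_corpuscular_volume
            + 2282.1)

def within_accepted_range(predicted_seconds):
    """The abstract accepts predictions only in the range 986-1197 s."""
    return 986.0 <= predicted_seconds <= 1197.0

# Invented values for one hypothetical runner.
time_s = predict_5000m_time(40.0, 42.0, 90.0)
print(round(time_s, 2), within_accepted_range(time_s))
```

All three coefficients are negative, so within the accepted range a higher oxygen uptake at lactate threshold, a greater fat-free weight, or a larger mean corpuscular volume each lowers the predicted time.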

