Similar articles
 Found 20 similar articles; search time 390 ms
1.
An algorithm is presented for localizing variable and constant regions in homologous protein sequences. A set of aligned protein sequences is divided into two groups of m and n sequences, each containing sequences from the most closely related species. The position dissimilarity between the two groups is defined as the number of mismatches among all m × n cross-group pairs of amino acid residues at that position, divided by m × n. The position dissimilarity within a group of m sequences is defined as the number of mismatches among all m(m-1)/2 pairs of amino acid residues, divided by m(m-1)/2. A ten-position average of the dissimilarity values is plotted against the number of the first position in the window. The area enclosed between the dissimilarity profile and its mean-value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If this area exceeds the average area for 1000 random profiles by more than two standard deviations, the profile extrema containing the "surplus" area are cut off. The cut-off stretches are likely to be variable and constant regions. For "between groups" comparisons, the overall irregularity of amino acid substitutions is found to be very high for all protein families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K+-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins.
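The two dissimilarity definitions in this abstract can be illustrated with a minimal Python sketch. The function names, the toy alignment and the treatment of every character (including any gap symbol) as an ordinary residue are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def between_group_dissimilarity(col_a, col_b):
    """Fraction of the m*n cross-group residue pairs that differ at one position."""
    mismatches = sum(1 for a in col_a for b in col_b if a != b)
    return mismatches / (len(col_a) * len(col_b))


def within_group_dissimilarity(col):
    """Fraction of the m(m-1)/2 within-group residue pairs that differ (needs m >= 2)."""
    pairs = list(combinations(col, 2))
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def column(seqs, i):
    """Residues found at alignment position i in every sequence of a group."""
    return [s[i] for s in seqs]


# Hypothetical toy alignment: two groups of already aligned sequences.
group_a = ["ACDE", "ACDF", "GCDE"]  # m = 3
group_b = ["ACHE", "TCHE"]          # n = 2
length = len(group_a[0])

between = [
    between_group_dissimilarity(column(group_a, i), column(group_b, i))
    for i in range(length)
]
within_a = [within_group_dissimilarity(column(group_a, i)) for i in range(length)]
```

In this toy case position 1 is fully conserved (dissimilarity 0 both between and within groups), while position 2 is conserved within each group but differs completely between them (between-group dissimilarity 1).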

2.
A set of aligned homologous protein sequences is divided into two groups consisting of m and n most related sequences. The value of position variability for homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible m × n pairs of amino acid residues in that position, divided by m × n. The position variability value plotted versus the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area Sr for 1000 random homologous protein families. If S is greater than Sr by more than 2 standard deviation units σr, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus ΔS = S - (Sr + 2σr) are cut off by two straight lines to locate significant regions. The difference (S - Sr), expressed in standard deviation units σr, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S - Sr)/σr. The significant conservative and variable regions of six homologous sequence families (phospholipase A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of standard substitution overall irregularity for L = 250 is proposed.
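The windowed profile, the area S and the overall irregularity OI = (S - Sr)/σr can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the random families are generated, so the sketch shuffles the order of the per-position variability values, and it approximates the area as the sum of absolute deviations of the profile from its mean. All function names are hypothetical.

```python
import random
import statistics


def window_profile(values, window=10):
    """Mean variability in each run of `window` consecutive positions."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def area_about_mean(profile):
    """Discrete stand-in for the area between the profile and its mean line."""
    mean = sum(profile) / len(profile)
    return sum(abs(v - mean) for v in profile)


def overall_irregularity(values, window=10, n_random=1000, seed=0):
    """OI = (S - S_r) / sigma_r, with S_r and sigma_r taken from shuffled profiles.

    Assumption: a 'random family' is modelled by shuffling position order.
    """
    rng = random.Random(seed)
    s = area_about_mean(window_profile(values, window))
    random_areas = []
    for _ in range(n_random):
        shuffled = list(values)
        rng.shuffle(shuffled)
        random_areas.append(area_about_mean(window_profile(shuffled, window)))
    s_r = statistics.mean(random_areas)
    sigma_r = statistics.stdev(random_areas)
    return (s - s_r) / sigma_r


# Hypothetical variability values: a conserved half followed by a variable half.
clustered = [0.0] * 30 + [1.0] * 30
oi = overall_irregularity(clustered)
```

With substitutions clustered in one half of the sequence, the real profile deviates from its mean far more than shuffled profiles do, so OI comes out well above the threshold of 2 used in the abstract.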

3.
A method for identifying significant conservative and variable regions in homologous protein sequences is presented. A set of aligned homologous sequences is divided into two groups consisting of m and n most related sequences. Each pair of sequences from different groups is compared using a unitary similarity matrix. The superposition of pairwise comparisons scanned by a window of 10 amino acid residues gives the intergroup local variability profile (VP). The area S of the figure between the VP and its mean-value line is compared with the averaged area S(r) of 1000 VPs of artificial homologous protein families. The difference (S - S(r)), expressed in standard deviation units σr, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S - S(r))/σr. If OI is greater than 2, the real VP extrema containing the surplus of area S - (S(r) + 2σr) are cut off. The cut-off stretches are likely to be significant conservative and variable regions. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural proteins, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of standard substitution overall irregularity for L = 250 is proposed.

4.
A set of aligned homologous protein sequences is divided into two groups consisting of m and k most related sequences. The value of the position variability of homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible k × m pairs of amino acid residues in that position, divided by k × m. The position variability value plotted against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area S(r) for 1000 random homologous protein families. If S is greater than S(r) by more than 2 standard deviation units σr, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus ΔS = S - (S(r) + 2σr) are cut off by two straight lines to locate significant regions. A numerical experiment on the family of homologous phospholipases A2 revealed a linear dependence of the values S(r) and σr on the position variability standard deviation σv of the homologous sequences. Furthermore, it was shown for protein families of various lengths (rhodopsins, aspartate aminotransferases, cytochromes b, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and alpha-subunits of Na,K-ATPase) that ΔS = S - n(S'r + 2σr), where S is the area of the local variability profile and n = L/l (L is the length of the given protein family and l is the length of the hypothetical protein domain). If l = 250, then S'r = -1.42 + 62.56σv and σ'r = -0.14 + 7.46σv.

5.
Coordinated amino acid changes in homologous protein families   Total citations: 4, self-citations: 0, citations by others: 4
In the tobamovirus coat protein family, amino acid residues at some spatially close positions are found to be substituted in a coordinated manner [Altschuh et al. (1987) J. Mol. Biol., 193, 693]. Therefore, these positions show an identical pattern of amino acid substitutions when amino acid sequences of these homologous proteins are aligned. Based on this principle, coordinated substitutions have been searched for in three additional protein families: serine proteases, cysteine proteases and the haemoglobins. Coordinated changes have been found in all three protein families mostly within structurally constrained regions. This method works with a varying degree of success depending on the function of the proteins, the range of sequence similarities and the number of sequences considered. By relaxing the criteria for residue selection, the method was adapted to cover a broader range of protein families and to study regions of the proteins having weaker structural constraints. The information derived by these methods provides a general guide for engineering of a large variety of proteins to analyse structure-function relationships.
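The principle described here, that coordinated positions show an identical pattern of substitutions across the aligned sequences, can be sketched in Python. This is an illustrative reading of the strict criterion only, not the authors' implementation; the function names and the toy alignment are hypothetical, and the relaxed criteria mentioned in the abstract are not modelled.

```python
from collections import defaultdict


def substitution_pattern(column):
    """Relabel residues by order of first appearance, e.g. A,A,G,G -> (0, 0, 1, 1)."""
    labels = {}
    return tuple(labels.setdefault(res, len(labels)) for res in column)


def coordinated_positions(alignment):
    """Group variable alignment columns that share exactly the same pattern."""
    groups = defaultdict(list)
    for i in range(len(alignment[0])):
        pattern = substitution_pattern([seq[i] for seq in alignment])
        if len(set(pattern)) > 1:  # skip fully conserved columns
            groups[pattern].append(i)
    return [positions for positions in groups.values() if len(positions) > 1]


# Hypothetical toy alignment of four sequences.
alignment = ["AKLE", "AKLE", "GRLD", "GRLE"]
coordinated = coordinated_positions(alignment)
```

Here columns 0 and 1 change together (A/K in the first two sequences, G/R in the last two), column 2 is conserved, and column 3 varies with a different pattern, so only positions 0 and 1 are reported as coordinated.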

6.
To elucidate the evolutionary mechanisms of the human immunodeficiency virus type 1 gp120 envelope glycoprotein at the single-site level, the degree of amino acid variation and the numbers of synonymous and nonsynonymous substitutions were examined in 186 nucleotide sequences for gp120 (subtype B). Analyses of amino acid variabilities showed that the level of variability was very different from site to site in both conserved (C1 to C5) and variable (V1 to V5) regions previously assigned. To examine the relative importance of positive and negative selection for each amino acid position, the numbers of synonymous and nonsynonymous substitutions that occurred at each codon position were estimated by taking phylogenetic relationships into account. Among the 414 codon positions examined, we identified 33 positions where nonsynonymous substitutions were significantly predominant. These positions where positive selection may be operating, which we call putative positive selection (PS) sites, were found not only in the variable loops but also in the conserved regions (C1 to C4). In particular, we found seven PS sites at the surface positions of the alpha-helix (positions 335 to 347 in the C3 region) in the opposite face for CD4 binding. Furthermore, two PS sites in the C2 region and four PS sites in the C4 region were detected in the same face of the protein. The PS sites found in the C2, C3, and C4 regions were separated in the amino acid sequence but close together in the three-dimensional structure. This observation suggests the existence of discontinuous epitopes in the protein's surface including this alpha-helix, although the antigenicity of this area has not been reported yet.

7.
8.
Summary: A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented. This method is applied to genes of the φX174 and G4 genomes, histone genes and β-globin genes, for which homologous nucleotide sequences are available for comparison. It is shown that the rates of synonymous substitutions are quite uniform among the non-overlapping genes of φX174 and G4 and among histone genes H4, H2B, H3 and H2A. A comparison between φX174 and G4 reveals that, in the overlapping segments of the A-gene, the rate of synonymous substitution is reduced more significantly than the rate of amino acid substitution relative to the corresponding rate in the non-overlapping segment. It is also suggested that, in the coding regions surrounding the splicing points of intervening sequences of β-globin genes, there exist rigid secondary structures. It is only in these regions that the β-globin genes show a slowing down of the evolutionary rates of both synonymous and amino acid substitutions in the primate line.

9.
The 140 × 10^3 base late chorion locus of Bombyx mori contains two 15-member multigene families arranged in tightly linked pairs, which are divergently transcribed (the high-cysteine A (HcA) and the high-cysteine B (HcB) families). Previous DNA hybridization experiments have indicated that all members of these gene families contain a complex pattern of shared sequence variation. The sequence analysis in this paper involving all 15 gene pairs allows a comprehensive examination of the nature of this variation. Average sequence homology between gene pairs is: 95% for the protein-encoding regions; 93% for the common 272 base-pair 5' flanking region; 87% for the introns; and 88% for the 3' untranslated regions. Considering the great degree of sequence homology in the coding regions, an unexpectedly high level of variation is found in the deduced protein sequences. Over 50% of the nucleotide substitutions in the protein-encoding regions lead to amino acid replacements, most of which involve a change in charge or affect the secondary structure of the protein. In addition, significant differences in length between the proteins occur in the carboxyl-terminal arm. In both families, the major portion of this arm is composed of Cys-Gly-Gly and Cys-Gly subrepeats forming a (Cys-Gly-Gly)2-(Cys-Gly)2 major repeat. Differences in the number of complete and partial repeats result in deduced protein sequences that contain arms varying from 32 to 54 amino acid residues for members of the HcA family and 14 to 88 residues for the HcB family. The high level of variation in protein composition indicates a lack of strong selective pressure. We suggest the high level of DNA sequence homology maintained by these genes in the coding as well as in the non-coding regions is the result of sequence exchange between family members.

10.
Ciliates provide a powerful system to analyze the evolution of duplicated alpha-tubulin genes in the context of single-celled organisms. Genealogical analyses of ciliate alpha-tubulin sequences reveal five apparently recent gene duplications. Comparisons of paralogs in different ciliates implicate differing patterns of substitutions (e.g., ratios of replacement/synonymous nucleotides and radical/conservative amino acids) following duplication. Most substitutions between paralogs in Euplotes crassus, Halteria grandinella and Paramecium tetraurelia are synonymous. In contrast, alpha-tubulin paralogs within Stylonychia lemnae and Chilodonella uncinata are evolving at significantly different rates and have higher ratios of both replacement substitutions to synonymous substitutions and radical amino acid changes to conservative amino acid changes. Moreover, the amino acid substitutions in C. uncinata and S. lemnae paralogs are limited to short stretches that correspond to functionally important regions of the alpha-tubulin protein. The topology of ciliate alpha-tubulin genealogies is inconsistent with taxonomy based on morphology and other molecular markers, which may be due to taxonomic sampling, gene conversion, unequal rates of evolution, or asymmetric patterns of gene duplication and loss.

11.
Sequence Evolution of Drosophila Mitochondrial DNA   Total citations: 15, self-citations: 3, citations by others: 15
We have compared nucleotide sequences of corresponding segments of the mitochondrial DNA (mtDNA) molecules of Drosophila yakuba and Drosophila melanogaster, which contain the genes for six proteins and seven tRNAs. The overall frequency of substitution between the nucleotide sequences of these protein genes is 7.2%. As was found for mtDNAs from closely related mammals, most substitutions (86%) in Drosophila mitochondrial protein genes do not result in an amino acid replacement. However, the frequencies of transitions and transversions are approximately equal in Drosophila mtDNAs, which is in contrast to the vast excess of transitions over transversions in mammalian mtDNAs. In Drosophila mtDNAs the frequency of C↔T substitutions per codon in the third position is 2.5 times greater among codons of two-codon families than among codons of four-codon families; this is contrary to the hypothesis that third position silent substitutions are neutral in regard to selection. In the third position of codons of four-codon families transversions are 4.6 times more frequent than transitions and A↔T substitutions account for 86% of all transversions. Ninety-four percent of all codons in the Drosophila mtDNA segments analyzed end in A or T. However, as this alone cannot account for the observed high frequency of A↔T substitutions there must be either a disproportionately high rate of A↔T mutation in Drosophila mtDNA or selection bias for the products of A↔T mutation. Consideration of the frequencies of interchange of AGA and AGT codons in the corresponding D. yakuba and D. melanogaster mitochondrial protein genes provides strong support for the view that AGA specifies serine in the Drosophila mitochondrial genetic code.
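The transition/transversion distinction used throughout this abstract can be made concrete with a short Python sketch. A transition swaps two purines (A, G) or two pyrimidines (C, T), and a transversion swaps a purine for a pyrimidine. The function names and the two toy sequences are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None for identical or non-ACGT bases."""
    bases = {a, b}
    if a == b or not bases <= (PURINES | PYRIMIDINES):
        return None
    if bases <= PURINES or bases <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


# Hypothetical aligned fragments, not taken from the Drosophila data.
counts = count_substitutions("ACGTAC", "GCTTAT")
```

In the toy pair, A→G and C→T are transitions and G→T is a transversion, so the counts are 2 and 1. The A↔T changes highlighted in the abstract fall into the transversion class.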

12.
Hu L, Cui W, He Z, Shi X, Feng K, Ma B, Cai YD. PLoS ONE 2012, 7(6): e39369
Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and obtaining potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method discovered that windows consisting of long amino acid segments of ~30 residues, instead of the commonly used short hexapeptides, provided the highest accuracy. Weighted contributions of an amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif in Aβ peptide amyloid and the β-solenoid structure of the HET-s(218-289) prion (C). Using an in-house energy evaluation algorithm, the interaction energy between two short stretches in a long segment is computed and incorporated as an additional feature. The algorithm successfully predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments depend not only on short high-propensity stretches, but also on nearby residues.

13.
Evolution of the Borrelia burgdorferi outer surface protein OspC.   Total citations: 1, self-citations: 0, citations by others: 1
The genes coding for outer surface protein OspC from 22 Borrelia burgdorferi strains isolated from patients with Lyme borreliosis were cloned and sequenced. For reference purposes, the 16S rRNA genes from 17 of these strains were sequenced after being cloned. The deduced OspC amino acid sequences were aligned with 12 published OspC sequences and revealed the presence of 48 conserved amino acids. On the basis of the alignment, OspC could be divided into an amino-terminal relatively conserved region and a relatively variable region in the central portion. The distance tree obtained divided the ospC sequences into three groups. The first group contained ospC alleles from all (n = 13) sensu stricto strains, the second group contained ospC alleles from seven Borrelia afzelii strains, and the third group contained ospC alleles from five B. afzelii and all (n = 9) Borrelia garinii strains. The ratio of the mean number of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site calculated for B. burgdorferi sensu stricto, B. garinii, and B. afzelii ospC alleles suggested that the polymorphism of OspC is due to positive selection favoring diversity at the amino acid level in the relatively variable region. On the basis of the comparison of 16S rRNA gene sequences, Borrelia hermsii is more closely related to B. afzelii than to B. burgdorferi sensu stricto and B. garinii. In contrast, the phylogenetic tree obtained for the B. hermsii variable major protein, Vmp33, and 18 OspC amino acid sequences suggested that Vmp33 and OspC from B. burgdorferi sensu stricto strains share a common evolutionary origin.

14.
As part of a study of protein folding and stability, the three-dimensional structures of yeast iso-2-cytochrome c and a composite protein (B-2036) composed of primary sequences of both iso-1 and iso-2-cytochromes c have been solved to 1.9 Å and 1.95 Å resolutions, respectively, using X-ray diffraction techniques. The sequences of iso-1 and iso-2-cytochrome c share approximately 84% identity and the B-2036 composite protein has residues 15 to 63 from iso-2-cytochrome c with the rest being derived from the iso-1 protein. Comparison of these structures reveals that amino acid substitutions result in alterations in the details of intramolecular interactions. Specifically, the substitution Leu98Met results in the filling of an internal cavity present in iso-1-cytochrome c. Further substitutions of Val20Ile and Cys102Ala alter the packing of secondary structure elements in the iso-2 protein. Blending the isozymic amino acid sequences in this latter area results in the expansion of the volume of an internal cavity in the B-2036 structure to relieve a steric clash between Ile20 and Cys102. Modification of hydrogen bonding and protein packing without disrupting the protein fold is illustrated by the His26Asn and Asn63Ser substitutions between iso-1 and iso-2-cytochromes c. Alternatively, a change in main-chain fold is observed at Gly37 apparently due to a remote amino acid substitution. Further structural changes occur at Phe82 and the amino terminus where a four residue extension is present in yeast iso-2-cytochrome c. An additional comparison with all other eukaryotic cytochrome c structures determined to date is presented, along with an analysis of conserved water molecules. Also determined are the midpoint reduction potentials of iso-2 and B-2036 cytochromes c using direct electrochemistry. The values obtained are 286 and 288 mV, respectively, indicating that the amino acid substitutions present have had only a small impact on the heme reduction potential in comparison to iso-1-cytochrome c, which has a reduction potential of 290 mV.

15.
The amino acid sequence of positions 1-150 of a light chain, isolated from another monoclonal rabbit anti-streptococcal group A-variant polysaccharide antibody, was determined. The analysis was performed with 2 μmol of polypeptide chain, using a grossly modified Beckman 890B sequenator. This sequence stretch accounts for the whole variable region and a considerable part of the constant region at a total length of 218 amino acids. This allotype b4 light chain was isolated from a non-precipitating, end-group-specific antibody with a KD = 1.3 × 10^-5 M. This brings the present number of totally known rabbit VL sequences of antigen-elicited antibodies to 21. A comparison of these 21 sequences reveals a building plan of rabbit VL homologous to that of human and murine VL regions. The observed variability follows a pattern of linked amino acid substitutions, indicating that this information must be contained in the germ-line of the rabbit in the form of multiple VL region genes. This conclusion, however, does not rule out the occasional variant being due to somatic rearrangement. Finally, this comparison reveals that the joining peptide between positions 96-110 is also a separate entity in rabbit VL region sequences.

16.
The presented software package supports: 1) input and editing of amino acid sequences; 2) listing of aligned amino acid sequences of homologous proteins; 3) pairwise comparison of homologous sequences; 4) construction of phylogenetic trees; 5) comparison of two groups of protein sequences from the same family of homologous proteins; 6) graphic identification of conservative and variable regions of homologous sequences. Applying the programs stepwise makes it possible to study the accumulation of amino acid replacements during given intervals of species evolution.

17.
The authors established the amino acid substitutions determining G3m(s) and G3m(t) specificities, which characterize Mongoloid populations, by sequence analysis of the Fc region of a myeloma protein (Jir). By comparing the amino acid sequences of the IgG3 (Jir) and the other IgG subclasses analyzed to date, it was found that G3m(s) was an isoallotype specified by an amino acid substitution at position 435; i.e., whereas the subclasses IgG1, IgG2, and IgG4 had histidine in common, G3m(s-) had arginine in this position. This was also confirmed by the observation that the Fc fragment in question bound to protein A. It was also established that the amino acid at position 379 of G3m(t-) IgG3 and the other subclasses was valine, whereas methionine in this position was specific for G3m(t+). In addition, the amino acid at position 339 of G3m(u-) IgG3 Jir was threonine, and that at position 296 of G3m(g-) IgG3 Jir was tyrosine. These findings are not in accord with the hitherto postulated relations of alanine and phenylalanine to G3m(u-) and G3m(g-), respectively. Finally, this study showed that a large number of substitutions occurred at positions 384 through 389, which suggests that many specificities of the G3m(b) group occur on IgG3 proteins.

18.
Mesophilic cytochrome c(551) of Pseudomonas aeruginosa (PA c(551)) became as stable as its thermophilic counterpart, Hydrogenobacter thermophilus cytochrome c(552) (HT c(552)), through only five amino acid substitutions. The five residues, distributed in three spatially separated regions, were selected and mutated with reference to the corresponding residues in HT c(552) through careful structure comparison. Thermodynamic analysis indicated that the stability of the quintuple mutant of PA c(551) could be partly attained through an enthalpic factor. The solution structure of the mutant showed that, as in HT c(552), there were tighter side chain packings in the mutated regions. Furthermore, the mutant had an increased total accessible surface area, resulting in great negative hydration free energy. Our results provide a novel example of protein stabilization in that limited amino acid substitutions can confer the overall stability of a natural highly thermophilic protein upon a mesophilic molecule.

19.
Homologous amino acid sequences of phospholipases A2 of snakes belonging to the families Elapidae, Viperidae and Colubridae were considered in order to study the location of conservative and variable regions. To identify significant conservative and variable regions, a comparison between two groups of aligned sequences of snake phospholipases A2 was successfully applied. The phospholipase A2 sequences were divided into two groups (taxa) according to the phylogenetic tree reconstructed from the pairwise distance matrix. Results of the comparison were plotted to facilitate the identification of significant conservative and variable regions. It was shown that the results of the comparison between two phylogenetic groups of snake phospholipases A2 did not depend much on the number of representatives in each group, and the location of conservative and variable regions did not change significantly if one of the groups was represented by a single sequence. The greater the phylogenetic difference between the groups of phospholipases A2, the greater the number of significant conservative and variable regions. Knowledge of the number and location of conservative and variable regions, and of their dependence on the phylogenetic relations between the compared taxa, can be used to predict synthetic peptide structures for obtaining antibodies of various specificities. These antibodies may have either a wide range of cross-reactivity against all phospholipases A2 or a limited range of cross-reactivity against the phospholipases A2 of one taxon.

20.
Family profile analysis (FPA), described in this paper, compares all available homologous amino acid sequences of a target family with the profile of a probe family while conventional sequence profile analysis (Gribskov M, Lüthy R, Eisenberg D. Meth Enzymol 1990;183:146-159) considers only a single target sequence in comparison with the probe family. The increased input of sequence information in FPA expands the range for sequence-based recognition of structural relationships. In the FPA algorithm, Zscores of each of the target sequences, obtained from a probe profile search over all known amino acid sequences, are averaged and then compared with the scores for sequences of 100 reference families in the same probe family search. The resulting F-Zscore of the target family, expressed in "effective standard deviations" of the mean Zscores of the reference families, with value above a threshold of 3.5 indicates a statistically significant evolutionary relationship between the target and probe families. The sensitivity of FPA to sequence information was tested with several protein families where distant relationships have been verified from known tertiary protein architectures, which included vitamin B6-dependent enzymes, (beta/alpha)8-barrel proteins, beta-trefoil proteins, and globins. In comparison to other methods, FPA proved to be significantly more sensitive, finding numerous new homologies. The FPA technique is not only useful to test a suspected relationship between probe and target families but also identifies possible target families in profile searches over all known primary structures.
