Similar literature
1.
A set of aligned homologous protein sequences is divided into two groups of m and n sequences, each containing sequences from the most closely related organisms. The position dissimilarity of proteins from the two groups is defined as the number of mismatches among all possible m × n pairs of amino acid residues at that position (one residue from each group), divided by m × n. The ten-position average of the dissimilarity values is plotted against the first position number of each window. The area of the figure between the dissimilarity profile and its mean-value line characterizes the overall irregularity of amino acid substitutions along the protein sequences. If the area exceeds the average area for 1000 random profiles by more than two standard deviation units, the profile extrema containing the "surplus" area are cut off. The cut-off stretches are likely to be variable and constant regions. If necessary, each stretch may be tested separately and evaluated statistically using a standard-size sample of artificial protein families. Intergroup comparison of protein sequences reveals high overall irregularity of amino acid substitutions and identifies variable and conservative regions in all protein families considered: phospholipases A2, aspartate aminotransferases, alpha-subunits of Na+,K+-ATPase, L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and human rhodopsins.
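The two definitions in this abstract (per-position dissimilarity over all m × n intergroup residue pairs, then a ten-position average) can be sketched as follows. This is only an illustration under stated assumptions: the function names and toy sequences are hypothetical, and gap characters are compared like any other symbol, which the abstract does not specify.

```python
from itertools import product


def position_dissimilarity(group_a, group_b):
    """Fraction of mismatching intergroup residue pairs at each aligned position."""
    length = len(group_a[0])
    pairs = len(group_a) * len(group_b)  # m x n pairs per position
    values = []
    for pos in range(length):
        mismatches = sum(a[pos] != b[pos] for a, b in product(group_a, group_b))
        values.append(mismatches / pairs)
    return values


def window_profile(values, window=10):
    """Average of each run of `window` positions, indexed by its first position."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical toy alignment: two groups of already aligned sequences.
group_a = ["ACDEFGHIKLMN", "ACDEFGHIKLMQ"]
group_b = ["ACDEYGHIKLMN", "ACDEYGHIKLMN", "ACDEFGHIKLMN"]

dissimilarity = position_dissimilarity(group_a, group_b)
profile = window_profile(dissimilarity, window=10)
```

Indexing each window by its first position mirrors the abstract's "plotted vs. the first position number".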

2.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and n most closely related sequences. The position variability of homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible m × n pairs of amino acid residues at that position, divided by m × n. The position variability plotted against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability is compared with the average area Sr for 1000 random homologous protein families. If S exceeds Sr by more than 2 standard deviation units σr, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus δS = S − (Sr + 2σr) are cut off by two straight lines to locate the significant regions. The difference (S − Sr) expressed in standard deviation units σr is taken as the overall irregularity of amino acid substitutions along the homologous protein sequences, OI = (S − Sr)/σr. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, a standard overall substitution irregularity for L = 250 is proposed.
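A minimal sketch of the overall irregularity OI = (S − Sr)/σr is given below. Two points are assumptions rather than details from the abstract: the area is approximated as the sum of absolute deviations of the windowed profile from its mean, and the "random homologous protein families" are stood in for by shuffling the order of the per-position variability values. All names and the toy data are hypothetical.

```python
import random
import statistics


def smooth(values, window):
    """Sliding-window mean of the per-position variability values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def profile_area(profile):
    """Area between the profile and its mean line (unit spacing assumed)."""
    mean = statistics.fmean(profile)
    return sum(abs(v - mean) for v in profile)


def overall_irregularity(variability, window=10, n_random=1000, seed=0):
    """Return (S, Sr, sigma_r, OI) using a position-shuffling null (an assumption)."""
    rng = random.Random(seed)
    s = profile_area(smooth(variability, window))
    null_areas = []
    for _ in range(n_random):
        shuffled = variability[:]
        rng.shuffle(shuffled)  # destroys clustering, keeps the value distribution
        null_areas.append(profile_area(smooth(shuffled, window)))
    s_r = statistics.fmean(null_areas)
    sigma_r = statistics.stdev(null_areas)
    return s, s_r, sigma_r, (s - s_r) / sigma_r


# Hypothetical variability values with one strongly variable block in the middle.
variability = [0.0] * 30 + [1.0] * 30 + [0.0] * 30
s, s_r, sigma_r, oi = overall_irregularity(variability)
```

With a clustered block like this one, S lies well above the shuffled areas, so OI passes the 2σr threshold described in the abstract.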

3.
A set of aligned homologous protein sequences is divided into two groups consisting of the k and m most closely related sequences. The position variability of homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible k × m pairs of amino acid residues at that position, divided by k × m. The position variability plotted against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability is compared with the average area S(r) for 1000 random homologous protein families. If S exceeds S(r) by more than 2 standard deviation units σr, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus δS = S − (S(r) + 2σr) are cut off by two straight lines to locate the significant regions. A numerical experiment on the family of homologous phospholipases A2 revealed a linear dependence of S(r) and σr on the standard deviation σv of the position variability of the homologous sequences. Furthermore, it was shown for protein families of various lengths (rhodopsins, aspartate aminotransferases, cytochromes b, L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and alpha-subunits of Na,K-ATPase) that δS = S − n(S'r + 2σr), where S is the area of the local variability profile and n = L/l (L is the length of the given protein family and l is the length of the hypothetical protein domain). If l = 250, then S'r = −1.42 + 62.56 σv and σ'r = −0.14 + 7.46 σv.
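The regression coefficients quoted for l = 250 can be turned into a small calculator for the area surplus. This is a sketch, not the authors' code: the function names and input values are hypothetical, and it assumes that the standard deviation entering the surplus formula is the regression estimate σ'r, which the abstract does not state explicitly.

```python
def expected_random_area(sigma_v):
    """S'r for a hypothetical domain of length l = 250 (coefficients from the abstract)."""
    return -1.42 + 62.56 * sigma_v


def expected_random_sigma(sigma_v):
    """sigma'r for a hypothetical domain of length l = 250 (coefficients from the abstract)."""
    return -0.14 + 7.46 * sigma_v


def area_surplus(s, family_length, sigma_v, domain_length=250):
    """delta S = S - n * (S'r + 2 * sigma'r), with n = L / l."""
    n = family_length / domain_length
    threshold = expected_random_area(sigma_v) + 2 * expected_random_sigma(sigma_v)
    return s - n * threshold


# Hypothetical inputs: profile area 40, family length 500, sigma_v 0.25.
surplus = area_surplus(s=40.0, family_length=500, sigma_v=0.25)
```

A positive surplus means the profile contains more area than expected by chance at the two-standard-deviation level, so its extrema would be cut off as significant regions.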

4.
A method for identifying significant conservative and variable regions in homologous protein sequences is presented. A set of aligned homologous sequences is divided into two groups consisting of the m and n most closely related sequences. Each pair of sequences from different groups is compared using a unitary similarity matrix. The superposition of pairwise comparisons scanned with a window of 10 amino acid residues gives the intergroup local variability profile (VP). The area S of the figure between the VP and its mean-value line is compared with the average area S(r) of 1000 VPs of artificial homologous protein families. The difference (S − S(r)) expressed in standard deviation units σr is taken as the overall irregularity of amino acid substitutions along the homologous protein sequences, OI = (S − S(r))/σr. If OI is greater than 2, the extrema of the real VP containing the area surplus S − (S(r) + 2σr) are cut off. The cut-off stretches are likely to be significant conservative and variable regions. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of the photoreaction centre of photosynthetic bacteria, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural proteins, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, a standard overall substitution irregularity for L = 250 is proposed.
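The abstract does not give the formula for the standard value at L = 250. One plausible reading, stated here as an assumption, follows from the square-root dependence on length: if OI grows as the square root of k, an OI measured at length L can be rescaled by the square root of 250/L. The function name and inputs are hypothetical.

```python
import math


def standardized_oi(oi, length, standard_length=250):
    """Rescale OI to a reference length, assuming OI grows as sqrt(length)."""
    return oi * math.sqrt(standard_length / length)


# Hypothetical example: OI of 6.0 measured on a family of length 1000.
oi_250 = standardized_oi(6.0, 1000)
```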

5.
In vitro evolution is used to study protein sequences, structures, and interactions and to obtain proteins with new properties. To analyze the specific features of this process in phage display experiments, we studied the amino acid composition of selected sequences, constructed a matrix of amino acid substitutions, and identified pairs of coadaptive substitutions. The amino acid frequency proved to be tightly associated with the number of corresponding codons; numerous correlated substitutions were found.
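The reported association between amino acid frequency and codon number can be checked with a short calculation. The sketch below builds the standard genetic code, counts codons per amino acid, and correlates those counts with the composition of a set of peptides. The peptides are hypothetical placeholders, not data from the study, and the Pearson coefficient is an assumed choice of statistic.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (stop codons excluded).
codon_counts = Counter(aa for aa in codon_table.values() if aa != "*")


def composition(peptides):
    """Count how often each residue occurs across all peptides."""
    return Counter("".join(peptides))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical selected peptides.
peptides = ["LSRLSA", "RLGSLV", "MLSRPA"]
observed = composition(peptides)
amino_acids = sorted(codon_counts)
r = pearson(
    [observed[a] for a in amino_acids],
    [codon_counts[a] for a in amino_acids],
)
```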

6.
The authors established the amino acid substitutions determining the G3m(s) and G3m(t) specificities, which characterize Mongoloid populations, by sequence analysis of the Fc region of a myeloma protein (Jir). By comparing the amino acid sequences of the IgG3 (Jir) and the other IgG subclasses analyzed to date, it was found that G3m(s) was an isoallotype specified by an amino acid substitution at position 435; i.e., whereas the subclasses IgG1, IgG2, and IgG4 had histidine in common, G3m(s-) had arginine in this position. This was also confirmed by the observation that the Fc fragment in question bound to protein A. It was also established that the amino acid at position 379 of G3m(t-) IgG3 and the other subclasses was valine, whereas methionine in this position was specific for G3m(t+). In addition, the amino acid at position 339 of G3m(u-) IgG3 Jir was threonine, and that at position 296 of G3m(g-) IgG3 Jir was tyrosine. These findings are not in accord with the hitherto postulated relations of alanine and phenylalanine to G3m(u-) and G3m(g-), respectively. Finally, this study showed that a large number of substitutions occurred at positions 384 through 389, which suggests that many specificities of the G3m(b) group occur on IgG3 proteins.

7.
In the analysis of protein-coding nucleotide sequences, the ratio of the number of nonsynonymous substitutions to that of synonymous substitutions (d(N)/d(S)) is used as an indicator of the direction and magnitude of natural selection operating at the amino acid sequence level. The d(S) and d(N) values are estimated from the comparison of homologous codons, which are often identified by converting (reverse-translating) aligned amino acid sequences into codon sequences. In this method, however, homologous codons may be mis-identified when frame-shifts have occurred or amino acid sequences have been mis-aligned, which may lead to overestimation of the d(N)/d(S) ratio. Here the effect of reverse-translating aligned amino acid sequences on the estimation of the d(N)/d(S) ratio was examined through a large-scale analysis of protein-coding nucleotide sequences from vertebrate species. Apparently, 1-9% of codon sites that were identified as homologous with reverse-translation contained non-homologous codons, where the d(N)/d(S) ratio was unduly high. By correcting the d(N)/d(S) ratio for these codon sites, it was inferred that the ratio was overestimated by 5-43% with reverse-translation. These results suggest that caution should be exercised in studying natural selection using the d(N)/d(S) ratio obtained by reverse-translating aligned amino acid sequences.
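The reverse-translation step discussed here amounts to threading each ungapped coding sequence onto its gapped protein alignment row. A minimal sketch, with a hypothetical function name and toy input:

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto a gapped protein alignment row."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = aligned_protein.replace("-", "")
    if len(codons) != len(residues):
        raise ValueError("coding sequence does not match the protein length")
    remaining = iter(codons)
    # Each residue consumes one codon; each protein gap becomes a three-base gap.
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


# Toy example: protein row "M-KV" with coding sequence ATG AAA GTT.
codon_row = thread_codons("M-KV", "ATGAAAGTT")
```

The sketch never checks that a codon actually encodes the residue it is placed under. A codon column is therefore only as homologous as the protein alignment behind it, which is the source of error the abstract quantifies.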

8.
Suggestions for "safe" residue substitutions in site-directed mutagenesis (total citations: 25; self-citations: 0; citations by others: 25)
The conserved topological structure observed in various molecular families such as globins or cytochromes c allows structural equivalencing of residues in every homologous structure and defines in a coherent way a global alignment in each sequence family. A search was performed for equivalent residue pairs in various topological families that were buried in protein cores or exposed at the protein surface and that had mutated but maintained similar unmutated environments. Amino acid residues with atoms in contact with the mutated residue pairs defined the environment. Matrices of preferred amino acid exchanges were then constructed and preferred or avoided amino acid substitutions deduced. Given the conserved atomic neighborhoods, such natural in vivo substitutions are subject to constraints similar to those on point mutations performed in site-directed mutagenesis experiments. The exchange matrices should provide guidelines for "safe" amino acid substitutions least likely to disturb the protein structure, either locally or in its overall folding pathway, and most likely to allow probing the structural and functional significance of the substituted site.

9.
It has long been known that amino acid substitutions in proteins of organisms living at moderate and high temperatures (mesophiles and thermophiles, respectively) are not all symmetrical; for example, more aligned sites have lysine in mesophiles and arginine in thermophiles than have the opposite pattern. This is generally taken to indicate that certain amino acids are favored over others by selection at different temperatures. Previous comparisons of protein sequences from mesophiles and thermophiles have used relatively small numbers of sequences from a diverse array of species, meaning that only the most common amino acid substitutions could be examined and any taxon-specific patterns would be obscured. Here, we compare a large number of proteins between mesophiles and thermophiles in the archaeal genus Methanococcus and the bacterial genus Bacillus. Each genus exhibits dramatically asymmetrical substitution patterns for many pairs of amino acids. There are several pairs of amino acids for which one amino acid is favored in thermophilic Bacillus and the other is favored in thermophilic Methanococcus; this appears to result from the higher G + C content of the DNA of thermophilic Bacillus, a complication not seen in Methanococcus.
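The asymmetry described here (for instance, lysine in the mesophile aligned with arginine in the thermophile more often than the reverse) can be tallied directly from a pairwise alignment. The sketch below uses hypothetical function names and toy sequences, and skips gapped positions as an assumption.

```python
from collections import Counter


def directional_counts(mesophile, thermophile):
    """Count (mesophile residue, thermophile residue) pairs at differing aligned sites."""
    counts = Counter()
    for a, b in zip(mesophile, thermophile):
        if a != b and a != "-" and b != "-":
            counts[(a, b)] += 1
    return counts


def asymmetry(counts, x, y):
    """Return (number of x->y sites, number of y->x sites)."""
    return counts[(x, y)], counts[(y, x)]


# Hypothetical aligned pair: three K->R sites and one R->K site.
mesophile = "MKKLEKAR"
thermophile = "MRRLERAK"
counts = directional_counts(mesophile, thermophile)
k_to_r, r_to_k = asymmetry(counts, "K", "R")
```

Whether an imbalance such as 3 versus 1 is significant would need a statistical test over many proteins, which this sketch does not attempt.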

10.
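To further clarify the origin and phylogenetic position of the paramyxovirus Tianjin strain and to explore the mechanism of its high pathogenicity, a bioinformatic analysis of the NP, P, M, and L proteins of the Tianjin strain was performed. The phylogenetic trees show that the Tianjin strain belongs to the genus Respirovirus in the subfamily Paramyxovirinae and is very likely a new genotype of Sendai virus. Similarity comparison shows that the P protein is the most variable, with a similarity of only 78.7% to 91.9%, whereas the L protein has the highest similarity, 96.0% to 98.0%. Sequence alignment shows that there are 15 unique variant sites in the amino acid sequence of the NP protein, 29 in the P protein, 6 in the M protein, and 29 in the L protein. These unique variant sites are very likely the reason why the Tianjin strain differs considerably from known Sendai virus strains in host origin and pathogenic characteristics.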

11.
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
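The idea of scoring a variant by the change in similarity to a homolog before and after the variation can be illustrated with a toy global alignment. This is not the PROVEAN implementation: the match, mismatch, and gap values are arbitrary assumptions, only one homolog is used, and all names and sequences are hypothetical.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to the homolog caused by the variation."""
    return global_score(variant, homolog) - global_score(query, homolog)


# Hypothetical query, homolog, and two variants of the query.
query = "MKTAYIAK"
homolog = "MKTAYIAK"
substitution = "MKTAWIAK"  # Y -> W at one position
deletion = "MKTAIAK"       # in-frame deletion of Y

delta_sub = delta_score(query, substitution, homolog)
delta_del = delta_score(query, deletion, homolog)
```

Because the score is defined on whole-sequence alignments, the same function handles substitutions and in-frame deletions, which is the generality the abstract emphasizes. A more negative delta indicates a larger loss of similarity.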

12.
It has long been suspected that analysis of correlated amino acid substitutions should uncover pairs or clusters of sites that are spatially proximal in mature protein structures. Accordingly, methods based on different mathematical principles such as information theory, correlation coefficients and maximum likelihood have been developed to identify co-evolving amino acids from multiple sequence alignments. Sets of pairs of sites whose behaviour is identified by these methods as correlated are often significantly enriched in pairs of spatially proximal residues. However, relatively high levels of false-positive predictions typically render such methods, in isolation, of little use in the ab initio prediction of protein structure. Misleading signal (or problems with the estimation of significance levels) can be caused by phylogenetic correlations between homologous sequences and by correlation due to factors other than spatial proximity (for example, correlation of sites which are not spatially close but which are involved in common functional properties of the protein). In recent years, several workers have suggested that information from correlated substitutions should be combined with other sources of information (secondary structure, solvent accessibility, evolutionary rates) in an attempt to reduce the proportion of false-positive predictions. We review methods for the detection of correlated amino acid substitutions, compare their relative performance in contact prediction and predict future directions in the field.
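Of the families of methods mentioned, the information-theoretic one is the simplest to sketch: mutual information between two alignment columns. The example below is a bare illustration with hypothetical names and a toy alignment. It applies no correction for phylogenetic correlation, which the abstract identifies as a major source of false positives.

```python
import math
from collections import Counter


def column(alignment, index):
    """Extract one column from a list of equally long aligned sequences."""
    return [seq[index] for seq in alignment]


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is invariant.
alignment = ["AKD", "AKD", "GED", "GED"]
mi_covarying = mutual_information(column(alignment, 0), column(alignment, 1))
mi_invariant = mutual_information(column(alignment, 0), column(alignment, 2))
```

An invariant column carries no information about any other column, so its mutual information with every site is zero. Fully conserved positions therefore cannot be detected as co-evolving by this measure.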

13.
Snakes are equipped with a venom armory to tackle different prey and predators in an adverse natural world. Snake venom is a mix of biologically active proteins and polypeptides. Among its components, cytotoxins and short neurotoxins are non-enzymatic polypeptides. Both structurally resemble the scaffold characteristic of the three-finger protein superfamily, whose non-toxin members are involved in various biological roles. In the present study we analyzed snake venom cytotoxins, short neurotoxins, and related non-toxin proteins of different chordates in terms of their amino acid sequence diversification profile, the polarity profile of the amino acid sequences, the conserved pattern of amino acids, and the phylogenetic relationships of these toxin and non-toxin protein sequences. Sequence alignment analysis demonstrates a polarity-specific molecular enrichment strategy for better system adaptivity. Amino acid substitutions are frequent in toxin sequences and less frequent in non-toxin body proteins. Conserved residues allow these proteins to maintain the three-finger protein scaffold. Owing to system-specific adaptation, toxin and non-toxin proteins exhibit varied distributions of amino acid residues along the sequence. Understanding nature's invention scheme (the recruitment of venom proteins from normal body proteins) may help us to develop future engineered biomolecules with remedial properties.

14.
We have recently described a novel hemagglutinin (HA) conformational change inhibitor of human influenza virus, Stachyflin (Yoshimoto et al., Arch. Virol., 144, 1-14, 1999). Stachyflin-resistant variants of human influenza A/WSN/33 (H1N1) virus were isolated in vitro and the nucleotide sequences of their HA genes were determined. The relation between amino acid substitutions and Stachyflin resistance was analyzed by in vitro membrane fusion between HA-expressing cells and octadecylrhodamine (R18)-labelled chick erythrocytes (RBC). Either amino acid substitution, lysine to arginine at position 51 or lysine to glutamic acid at position 121 of the HA2 subunit of the HA protein, was enough to confer a Stachyflin-resistant phenotype on the HA protein. The molecular mechanism of the anti-HA conformational change activity of Stachyflin is discussed.

15.
Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.
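The abstract does not spell out how the codon model is transformed into an amino acid model. One common way to collapse a Markov chain onto groups of states, shown here purely as an assumption, is to sum the equilibrium-weighted flux between groups and divide by the equilibrium weight of the source group. The three-state chain below is a toy stand-in for a 61-codon model, and all names are hypothetical.

```python
def aggregate_rates(q, pi, state_class):
    """Collapse a rate matrix onto classes of states by equilibrium-weighted flux.

    q[i][j] is the rate from state i to state j, pi[i] the equilibrium
    frequency of state i, and state_class[i] the class (for example the
    encoded amino acid) of state i.
    """
    classes = sorted(set(state_class))
    weight = {
        c: sum(p for p, s in zip(pi, state_class) if s == c) for c in classes
    }
    rates = {}
    for ci in classes:
        for cj in classes:
            if ci == cj:
                continue
            flux = sum(
                pi[i] * q[i][j]
                for i in range(len(q)) if state_class[i] == ci
                for j in range(len(q)) if state_class[j] == cj
            )
            rates[(ci, cj)] = flux / weight[ci]
    return rates


# Toy reversible chain: states 0 and 1 are "synonymous" (class X), state 2 is class Y.
pi = [0.25, 0.25, 0.5]
q = [
    [-0.75, 0.25, 0.5],
    [0.25, -0.75, 0.5],
    [0.25, 0.25, -0.5],
]
rates = aggregate_rates(q, pi, ["X", "X", "Y"])
```

In this toy case the aggregated rates satisfy detailed balance between the two classes, as expected when the underlying chain is reversible.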

16.
Homologous proteins, which possess similar shapes, functions, and amino acid sequences, are encoded by homologous messenger ribonucleic acids whose codon sequences tend to be similar. It is proposed that helical configurons are generated when certain pairs of contiguous codons are translated, and that non-helical configurons appear when other specific pairs of codons are read off. The resulting sequence of configurons comprises the polyconfiguron, which forms the native structure of the protein.

17.
Rapid evolution of mammalian X-linked testis-expressed homeobox genes (total citations: 5; self-citations: 0; citations by others: 5)
Wang X, Zhang J. Genetics, 2004, 167(2): 879-888

18.
19.
Two different states of human immunodeficiency virus type 1 are apparent in the asymptomatic and late stages of infection. Important determinants associated with these two states have been found within the V3 loop of the viral Env protein. In this study, two large data sets of published V3 sequences were analyzed to identify patterns of sequence variability that would correspond to these two states of the virus. We were especially interested in the pattern of basic amino acid substitutions, since the presence of basic amino acids in V3 has been shown to change virus tropism in cell culture. Four features of the sequence heterogeneity in V3 were observed: (i) approximately 70% of all nonconservative basic substitutions occur at four positions in V3, and V3 sequences with a basic substitution in at least one of these four positions contain approximately 95% of all nonconservative basic substitutions; (ii) substitution patterns within V3 are influenced by the identity of the amino acid at position 25; (iii) sequence polymorphisms account for a significant fraction of uncharged amino acid substitutions at several positions in V3, and sequence heterogeneity other than these polymorphisms is most significant at two positions near the tip of V3; and (iv) sequence heterogeneity in V3 (in addition to the basic amino acid substitutions) is approximately twofold greater in V3 sequences that contain basic amino acid substitutions. By using this sequence analysis, we were able to identify distinct groups of V3 sequences in infected patients that appear to correspond to these two virus states. The identification of these discrete sequence patterns in vivo demonstrates how the V3 sequence can be used as a genetic marker for studying the two states of human immunodeficiency virus type 1.

20.
In vitro evolution is used to study protein sequences, structures, and interactions and to obtain proteins with new properties. To analyze the specific features of this process in experiments with phage display, we studied the amino acid composition of selected sequences, constructed a matrix of amino acid substitutions, and identified pairs of coadaptive substitutions. Amino acid frequency proved to be tightly associated with the number of corresponding codons; numerous correlated substitutions were found.
