Similar literature
 Found 20 similar documents (search time: 0 ms)
1.
Summary: A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.

2.
Given a text of length n, a pattern of length m and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k substitutions. The algorithm runs in O(k(m log m + n)) time, and requires O(nk) space. This algorithm has direct implications for nucleotide and amino acid sequence comparisons.
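
To make the problem statement concrete, here is a minimal Python sketch of matching with at most k substitutions. It is a naive O(nm) scan, not the O(k(m log m + n)) algorithm the abstract describes, and the function name `find_with_mismatches` is a hypothetical choice for illustration.

```python
def find_with_mismatches(text, pattern, k):
    """Return start positions where pattern occurs in text with at most k substitutions.

    Naive illustration only: compares the pattern at every alignment.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # too many substitutions at this alignment
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits
```

For example, `find_with_mismatches("ACGTACGA", "ACGT", 1)` reports positions 0 (exact match) and 4 (one substitution).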

3.
By means of reverse-phase HPLC, 2 different proteins were obtained from apparently purified guinea pig eosinophil major basic protein (MBP), and these proteins were named GMBP1 and GMBP2. It was revealed that these 2 components of MBP have similar molecular weights and pI values, although their amino acid compositions were slightly different. In a previous study, we cloned and sequenced GMBP1 cDNA. Here we obtained another clone by plaque hybridization using a screening probe synthesized by means of polymerase chain reaction. After sequencing, it became apparent that this clone corresponded to GMBP2. As in the case of GMBP1, the cDNA of GMBP2 encoded pre-proGMBP2 with 3 domains: signal peptide, acidic pro-portion, and mature GMBP2. By comparing the sequences of GMBP1 and GMBP2, it was revealed that the proteins were quite similar to each other. In addition, their sequences also resembled those of human MBP, especially in the basic domain of the mature protein, but no such similarity existed in the pro-portion. Although the molecular weights determined by SDS-PAGE of guinea pig and human MBPs were 11,000 and 9,300, respectively, the calculated molecular weights of these 3 MBPs were all 13.8 kDa. The calculated pI values of GMBP1, GMBP2 and human MBP were 11.7, 11.3 and 11.6, respectively. By means of Harr plot analysis, it was revealed that the amino acid sequences, not only in signal peptides but also in the basic domains of mature proteins, were well conserved between guinea pig and human MBPs.

4.
The nucleotide sequence of the Escherichia coli colicin I receptor gene (cir) has been determined. The predicted mature protein consists of 599 amino acids and has a molecular weight of 67,169. Several previously noted characteristics of other E. coli outer membrane protein sequences were also identified in the sequence of Cir. These include an overall acidic nature, the absence of long hydrophobic stretches of amino acids, and a lack of predicted alpha-helical secondary structure. Because two classes of outer membrane proteins (the TonB-dependent transport proteins and the porins) share some structural features, protein sequences from both of these groups were aligned pairwise and scored for sequence similarity. Statistical evidence suggested that the porins were not related to the proteins in the TonB-dependent group; however, there was a significant relationship between the proteins in the TonB-dependent group. On the basis of the multiple progressive sequence alignment and the similarity scores derived from it, a tree representing evolutionary distance between five TonB-dependent outer membrane transport proteins was generated.

5.
Summary: It is known that globin genes contain three exons with the middle exon coding for a four-helical supersecondary structure responsible for heme binding. Since this portion of the globin peptide chain can be structurally superimposed onto the cytochrome c and cytochrome b5 chains (Argos and Rossmann 1979), it can be inferred that the cytochrome c gene will contain only one coding sequence while the cytochrome b5 gene will be composed of three exons as found in the globin gene.

6.
When two strings of symbols are aligned, it is important to know whether the observed number of matches is better than that expected between two independent sequences with the same frequency of symbols. When strings are of different lengths, nulls need to be inserted in order to align the sequences. One approach is to use simple approximations of sampling with replacement. We describe an algorithm for exactly determining the frequencies of given numbers of matches, sampling without replacement. This does not lead to a simple closed-form expression. However, we show examples where sampling with, or without, replacement gives very similar results, and the simple approach may be adequate for all but the smallest cases.
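
The "simple approach" mentioned above (sampling with replacement) can be sketched in Python as follows. This is only an illustration under the assumption that aligned positions are independent, so that the number of matches is binomially distributed; it is not the exact without-replacement algorithm the abstract describes, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def match_probability(seq_a, seq_b):
    """Chance that one symbol drawn from each sequence's composition is identical."""
    freq_a, freq_b = Counter(seq_a), Counter(seq_b)
    return sum(
        (freq_a[s] / len(seq_a)) * (freq_b[s] / len(seq_b)) for s in freq_a
    )


def tail_probability(observed, length, p):
    """P(matches >= observed) over `length` aligned positions, binomial model."""
    return sum(
        comb(length, x) * p**x * (1 - p) ** (length - x)
        for x in range(observed, length + 1)
    )
```

A small tail probability suggests that the observed number of matches exceeds what two independent sequences of the same composition would typically produce.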

7.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and n most related sequences. The position variability for homologous protein sequences is defined as the number of failures to coincide in the intergroup comparison of all possible m*n pairs of amino acid residues in that position, divided by m*n. The position variability value plotted versus the sequence position number with a window of 10 positions gives the intergroup local variability profile. Area S of the figure included between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area Sr for 1000 random homologous protein families. If S is greater than Sr by more than 2 standard deviation units sigma r, the local variability profile is assumed to contain peaks and hollows corresponding to significant variable and conservative regions of the sequences. The profile extrema containing the area surplus delta S = S - (Sr + 2 sigma r) are cut off by two straight lines to locate significant regions. The difference (S - Sr) given in standard deviation units sigma r is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S - Sr)/sigma r. The significant conservative and variable regions of six homologous sequence families (phospholipase A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of photosynthetic bacteria photoreaction centre and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of standard substitution overall irregularity for L = 250 is proposed.
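
The position variability defined above can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and treating the "window of 10 positions" as a plain moving average is an assumption, since the abstract does not spell out the smoothing.

```python
def position_variability(group_a, group_b):
    """Per-position fraction of mismatching residue pairs between two aligned groups.

    group_a holds m aligned sequences, group_b holds n; all have equal length.
    """
    length = len(group_a[0])
    pairs = len(group_a) * len(group_b)  # m * n intergroup pairs
    profile = []
    for pos in range(length):
        mismatches = sum(a[pos] != b[pos] for a in group_a for b in group_b)
        profile.append(mismatches / pairs)
    return profile


def local_profile(profile, window=10):
    """Moving average over `window` positions (assumed smoothing scheme)."""
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]
```

The area S and the comparison with random families (Sr, sigma r) are not reproduced here because the abstract does not describe how the random families are generated.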

8.
Replacement substitutions of mitochondrial cytochrome c and α- and β-chains of haemoglobin have been studied by considering the structural similarity among amino acid residues at the secondary and tertiary structural levels. Secondary structural similarity explains ~70% while tertiary structural similarity explains ~50% of observed replacements for most of the cases. These structural similarities could not account for all the replacement substitutions. The study was extended to consider the composition of codons, and the chemical nature and polarity of the replacing and replaced residues. These also could not individually account for all the affected replacements. In general, no property of amino acid residues is conserved for substitutions occurring at any single position during evolution of proteins.

9.
The homologous genomic region that contains two paralogous genes, Adh and Adh-dup, was compared in several Drosophila species. Sequences were analyzed as follows: a) At the nucleotide level, Ka and Ks values were determined for each pair of species. Ka-Adh and Ka-Adh-dup are not significantly different. However, Ks-Adh values are significantly lower than Ks-Adh-dup, which are more variable. In agreement with other reports, lower Ks values for Adh correlate with a high level of gene expression and relatively high percentage of G+C content in the third codon position, while the opposite applies to Adh-dup. b) At the protein level, amino acid comparisons reveal conserved regions shared by ADH and ADH-DUP, which have been assigned to known functional domains. Key residues for dehydrogenasic function are also found in ADH-DUP, thus pointing to a dehydrogenase activity for ADH-DUP, albeit very different from that of ADH.

10.
Much attention is being paid to protein databases as an important information source for proteome research. Although used extensively for similarity searches, protein databases themselves have not fully been characterized. In a systematic attempt to reveal protein-database characters that could contribute to revealing how protein chains are constructed, frequency distributions of all possible combinatorial sets of three, four, and five amino acids ("triplets," "quartets," and "pentats"; collectively called constituent sequences) have been examined in the nonredundant (nr) protein database, demonstrating the existence of nonrandom bias in their "availability" at the population level. Nonexistent short sequences of pentats were found that showed low availability in biological proteins against their expected probabilities of occurrence. Among them, six representative ones were successfully synthesized as peptides with reasonably high yields in a conventional Fmoc method, excluding the possibility that a putative physicochemical energy barrier in forming them could be a direct cause for the low availability. They were also expressed as soluble fusion proteins in a conventional Escherichia coli BL21Star(DE3) system with reasonably high yield, again excluding a possible difficulty in their biological synthesis. Together, these results suggest that information on three-dimensional structures and functions of proteins exists in the context of connections of short constituent sequences, and that proteins are composed of evolutionarily selected constituent sequences, which are reflected in their availability differences in the database. These results may have biological implications for protein structural studies.
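
Counting constituent sequences and comparing them with an expected count can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the expected count assumes residues occur independently at their overall frequencies, which is only one possible null model.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping run of k residues across the given sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def expected_count(kmer, sequences):
    """Expected occurrences of kmer if residues were independent (assumed null model)."""
    residues = Counter("".join(sequences))
    total = sum(residues.values())
    # Number of windows of this length available across all sequences.
    windows = sum(max(len(s) - len(kmer) + 1, 0) for s in sequences)
    prob = 1.0
    for aa in kmer:
        prob *= residues[aa] / total
    return windows * prob
```

A constituent sequence whose observed count falls far below its expected count would be a candidate for "low availability" in the sense used above.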

11.
A method for optimally locating gaps in the amino acid sequences of homologous proteins is presented. The method involves three steps: (1) demonstration that the sequences are indeed homologous, (2) location of regions where the homologous pairing is reasonably certain, and (3) location of gaps between these regions so as to minimize the total number of mutations required to account for the differences between the two sequences. The major virtues of this procedure are that the assertion of homology does not depend upon the prior introduction of gaps and that a genetic rather than a chemical test is the basis for asserting a genetic relationship. This project received support from grants from NSF (GB-7486) and NIH (NB 04545-06).

12.
A cDNA clone for porcine liver proline-beta-naphthylamidase was isolated and sequenced. The deduced amino acid sequence of 567 residues was highly homologous with those of carboxylesterases (EC 3.1.1.1) previously reported for other species. In addition, proline-beta-naphthylamidase purified from porcine liver was shown to have strong activity towards p-nitrophenylacetate, a representative substrate for carboxylesterases. These results suggest that proline-beta-naphthylamidase is identical with carboxylesterase.

13.
The evolution of protein folds is under strong constraints from their surrounding environment. Although folding in water‐soluble proteins is driven primarily by hydrophobic forces, the nature of the forces that determine the folding and stability of transmembrane proteins is still not fully understood. Furthermore, the chemically heterogeneous lipid bilayer has a non‐uniform effect on protein structure. In this article, we attempt to gain insight into the nature of this effect by examining the impact of various types of local structure environment on amino acid substitution, based on alignments of high‐resolution structures of polytopic helical transmembrane proteins combined with sequences of close homologs. Compared to globular proteins, burying amino acid sidechains, especially hydrophilic ones, led to a lower increase in conservation in both the lipid‐water interface region and the hydrocarbon core region. This observation is due to surface residues in HTM proteins, especially in the HC region, being relatively highly conserved, suggesting higher evolutionary constraints from their specific interactions with the surrounding lipid molecules. Polar and small residues, particularly Pro and Gly, show a noticeable increase in conservation as they are positioned more towards the centre of the membrane, which is consistent with their recognized key roles in structural stability. In addition, the examination of hydrogen bonds in the membrane environment identified some exposed hydrophilic residues being better conserved when not hydrogen‐bonded to other residues, supporting the importance of lipid‐protein sidechain interactions. The conclusions presented in this study highlight the distinct features of substitution matrices that take into account the membrane environment, and their potential role in improving sequence‐structure alignments of transmembrane proteins. Proteins 2010; © 2010 Wiley‐Liss, Inc.

14.
The proteolytic action of trypsin, chymotrypsin, submaxillary gland proteinases, Lys-C, Staphylococcus aureus st. V8, Armillaria mellea, Myxobacter AL-2 proteinase II, thermolysin and alpha-lytic proteinase is elucidated from the analysis of the data available on the amino acid sequence studies for above 70 proteins. Properties of a series of commercial enzymic preparations and the way of preferential application of proteinases for studying the amino acid sequence are discussed.

15.
L-Arginine is a source of nitrogen oxide and plays a great role in a number of other biochemical processes. Functions and prospects for practical application of five groups of arginine-containing amino acid sequences and synthetic polyarginine sequences are considered. The physiological characteristics of well-known arginine-containing peptides, such as RGD-containing peptides, kyotorphin, and tuftsin, are described in detail.

16.
L-arginine is a source of nitrogen oxide and plays a great role in a number of other biochemical processes. Functions and prospects for practical application of five groups of arginine-containing amino acid sequences and synthetic polyarginine sequences are considered. The physiological characteristics of well-known arginine-containing peptides, such as RGD peptides, kyotorphin, and tuftsin, are described in detail. The English version of the paper: Russian Journal of Bioorganic Chemistry, 2008, vol. 34, no. 2; see also http://www.maik.ru

17.
18.
Analogous amino acid sequences in myelin proteolipid and viral proteins   (total citations: 4; self-citations: 0; citations by others: 4)
S Y Shaw, R A Laursen, M B Lees. FEBS Letters, 1986, 207(2): 266-270
Computer analysis of the intrinsic membrane protein, myelin proteolipid, shows strong sequence similarities between the putative extramembrane segments of the proteolipid protein and a number of viral proteins, several of which infect humans. These similarities are even more striking than those reported previously between viral proteins and the encephalitogenic myelin basic protein (MBP). These findings, along with other reports of molecular mimicry by viruses, suggest that immunological cross-reactions between virus-induced antibodies or T-cells and analogous antigenic determinants (epitopes) in myelin proteolipid could be involved in the pathophysiology of multiple sclerosis or post-infectious demyelinating syndromes.

19.

Background  

The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries.

20.
Summary: A discriminant analysis on the basis of the physicochemical properties of amino acid residues is developed to investigate the accumulation pattern of amino acid substitutions in a family of proteins. The application of this analysis to vertebrate hemoglobins reveals the following new results. (1) The major components of teleost fish and amphibian hemoglobins showing the Root effect are sharply discriminated from mammalian hemoglobins in several regions of the and chains, whereas shark, minor components of teleost fish and amphibian, reptile, and bird hemoglobins showing no Root effect exhibit a gradual change to mammalian hemoglobin in a straightforward way. This result suggests at least two lines of molecular evolution in vertebrate hemoglobins. (2) The nonadult hemoglobin chains are allocated to the latter line, i.e., tadpole, , and chains are similar to shark and trout I chains, and and chains are similar to some of the reptile chains. (3) In any case, most of the amino acid residues causing the discrimination are located near the sites that carry the amino acid residues conserved well throughout all classes of vertebrates, suggesting that modifications adapting to the respective living conditions or respiratory organs have taken place effectively near the amino acid residues essential for the manifestation of cooperative oxygen binding. (4) The amino acid residues at other sites are changed from one to another species even within the same class, showing a constant substitution rate as a whole. These amino acid substitutions may be nearly neutral, being under a weak functional constraint. The number of sites allowing such neutral substitutions is rather small, less than one-half of all the sites in the adult hemoglobins of bird and mammal, whereas it amounts to two-thirds in teleost fish hemoglobins.
