Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
2.
Liu H  Han H  Li J  Wong L 《In silico biology》2004,4(3):255-269
The translation initiation site (TIS) prediction problem is about how to correctly identify TISs in mRNA, cDNA, or other types of genomic sequences. High prediction accuracy helps in understanding protein coding from nucleotide sequences, and TIS identification is an important step in genomic analysis. In this paper, we present an in silico method to predict translation initiation sites in vertebrate cDNA or mRNA sequences. The method consists of three sequential steps. In the first step, candidate features are generated using k-gram amino acid patterns. In the second step, a small number of top-ranked features are selected by an entropy-based algorithm. In the third step, a classification model is built to recognize true TISs by applying support vector machines or ensembles of decision trees to the selected features. We have tested our method on several independent data sets, including two public ones and our own extracted sequences. The experimental results are better than those reported previously on the same data sets. Our high accuracy not only demonstrates the feasibility of our method, but also indicates that there might be "amino acid" patterns around TISs in cDNA and mRNA sequences.
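
The first step described above (generating candidate features from k-gram amino acid patterns) can be illustrated with a minimal Python sketch. It only counts overlapping k-grams in an already translated amino acid string; the function names `kgram_counts` and `kgram_features`, the choice of k = 1 and 2, and the toy input are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def kgram_counts(protein, k):
    """Count every overlapping k-gram (length-k substring) in an amino acid string."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def kgram_features(protein, ks=(1, 2)):
    """Merge k-gram counts for several k into one feature dictionary.

    The default ks=(1, 2) is an illustrative assumption, not the paper's setting.
    """
    features = {}
    for k in ks:
        features.update(kgram_counts(protein, k))
    return features


if __name__ == "__main__":
    # Hypothetical toy sequence, not real data.
    print(kgram_features("MKKL"))
```

A feature selection step and a classifier would then operate on such dictionaries; neither is sketched here.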

3.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the post-genome age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization of the flood of newly found proteins. Here, a novel form of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are input into multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
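
The segment-wise composition described above can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the integer-division rule for placing segment boundaries when the length is not divisible, and the handling of empty segments are assumptions, since the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(segment):
    """Return the 20 relative residue frequencies of one segment."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)  # assumed convention for an empty segment
    return [segment.count(a) / len(segment) for a in AMINO_ACIDS]


def aacd_vector(sequence, n_segments):
    """Split a sequence into n_segments near-equal parts and concatenate
    the amino acid composition of each part into one feature vector."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    n = len(sequence)
    # Segment boundaries by integer division (an assumption of this sketch).
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    vector = []
    for start, end in zip(bounds, bounds[1:]):
        vector.extend(composition(sequence[start:end]))
    return vector
```

With this layout the feature vector has 20 × `n_segments` entries, which could then be passed to a multi-class classifier.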

4.
MOTIVATION: With protein sequences entering databanks at an explosive pace, the early determination of the family or subfamily class of a newly found enzyme molecule becomes important because it is directly related to detailed information about which specific target the enzyme acts on, as well as to its catalytic process and biological function. Unfortunately, it is both time-consuming and costly to do so by experiments alone. In a previous study, the covariant-discriminant algorithm was introduced to identify the 16 subfamily classes of oxidoreductases. Although the results were quite encouraging, the entire prediction process was based on the amino acid composition alone, without any sequence-order information. Therefore, it is worthy of further investigation. RESULTS: To incorporate sequence-order effects into the predictor, the 'amphiphilic pseudo amino acid composition' is introduced to represent the statistical sample of a protein. The novel representation contains 20 + 2λ discrete numbers: the first 20 numbers are the components of the conventional amino acid composition; the next 2λ numbers are a set of correlation factors that reflect different hydrophobicity and hydrophilicity distribution patterns along a protein chain. Based on this concept and formulation scheme, a new predictor is developed. The self-consistency test, jackknife test, and independent dataset tests show that the success rates obtained by the new predictor are all significantly higher than those of the previous predictors. The significant enhancement in success rates also implies that the distribution of hydrophobicity and hydrophilicity of the amino acid residues along a protein chain plays a very important role in its structure and function.
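
To make the "20 + 2λ" layout concrete, here is a rough Python sketch of one common way to build such a vector: 20 composition terms followed by 2λ correlation factors, computed as averaged products of hydrophobicity and hydrophilicity values for residues k positions apart. The abstract does not give the formula, so the exact correlation definition, the weight `w`, the normalization, and the requirement that the two scales are supplied by the caller are all assumptions of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amphiphilic_pseaa(seq, hydrophobicity, hydrophilicity, lam, w=0.5):
    """Return a (20 + 2*lam)-dimensional vector.

    hydrophobicity / hydrophilicity: dicts mapping each residue to a
    (preferably normalized) numeric value; supplied by the caller.
    lam: number of sequence-order tiers; w: assumed weight factor.
    """
    n = len(seq)
    if not 0 <= lam < n:
        raise ValueError("lam must satisfy 0 <= lam < len(seq)")
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    taus = []
    for k in range(1, lam + 1):
        for scale in (hydrophobicity, hydrophilicity):
            # Average product of the scale values of residues k apart.
            total = sum(scale[seq[i]] * scale[seq[i + k]] for i in range(n - k))
            taus.append(total / (n - k))
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]
```

Larger λ captures longer-range order at the cost of more dimensions; in this sketch λ must be smaller than the sequence length.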

5.
6.
Two missense mutations have been identified in the phenylalanine hydroxylase (PAH) genes of an Italian phenylketonuria (PKU) patient. Both mutations occurred in exon 7 of the PAH gene, resulting in the substitution of Trp for Arg at amino acid 252 (R252W) and of Leu for Pro (P281L) at amino acid 281 of the protein. Expression vectors containing either the normal human PAH cDNA or mutant cDNAs were constructed and transfected into cultured mammalian cells. Extracts from cells transfected with either mutant construct showed negligible enzyme activity and undetectable levels of immunoreactive PAH protein as compared to the normal construct. These results are compatible with the severe classical PKU phenotype observed in this patient. Population genetic studies in the Italian population revealed that both the R252W and the P281L mutations are in linkage disequilibrium with mutant restriction fragment length polymorphism (RFLP) haplotype 1, which is the most prevalent RFLP haplotype in this population. The R252W mutation is present in 10% and the P281L mutation is present in 20% of haplotype 1 mutant chromosomes. These mutations are both very rare among other European populations, suggesting a Mediterranean origin for these mutant chromosomes.

7.
Functional consequences of PRODH missense mutations
PRODH maps to 22q11 in the region deleted in the velocardiofacial syndrome/DiGeorge syndrome (VCFS/DGS) and encodes proline oxidase (POX), a mitochondrial inner-membrane enzyme that catalyzes the first step in the proline degradation pathway. At least 16 PRODH missense mutations have been identified in studies of type I hyperprolinemia (HPI) and schizophrenia, 10 of which are present at polymorphic frequencies. The functional consequences of these missense mutations have been inferred by evolutionary conservation, but none have been tested directly. Here, we report the effects of these mutations on POX activity. We find that four alleles (R185Q, L289M, A455S, and A472T) result in mild (<30%), six (Q19P, A167V, R185W, D426N, V427M, and R431H) in moderate (30%-70%), and five (P406L, L441P, R453C, T466M, and Q521E) in severe (>70%) reduction in POX activity, whereas one (Q521R) increases POX activity. The POX encoded by one severe allele (T466M) shows in vitro responsiveness to high cofactor (flavin adenine dinucleotide) concentrations. Although there is limited information on plasma proline levels in individuals of known PRODH genotype, extant data suggest that severe hyperprolinemia (>800 µM) occurs in individuals with large deletions and/or PRODH missense mutations with the most severe effect on function (L441P and R453C), whereas modest hyperprolinemia (300-500 µM) is associated with PRODH alleles with a moderate reduction in activity. Interestingly, three of the four alleles associated with or found in schizophrenia (V427M, L441P, and R453C) resulted in severe reduction of POX activity and hyperprolinemia. These observations plus the high degree of polymorphism at the PRODH locus are consistent with the hypothesis that reduction in POX function is a risk factor for schizophrenia.
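
The severity bands quoted in this abstract (mild <30%, moderate 30%-70%, severe >70% reduction) can be written as a small Python helper, together with a tally of the alleles the abstract assigns to each class. The function name, the treatment of the exact boundary values 30 and 70 as moderate, and the use of a negative reduction to represent increased activity are assumptions of this sketch; no measured activity values are given in the abstract, so none are included.

```python
def classify_pox_reduction(reduction_percent):
    """Map a percent reduction in POX activity to the abstract's severity bands.

    A negative value stands for increased activity (an assumed convention).
    Boundary values of exactly 30 and 70 are treated as moderate (assumed).
    """
    if reduction_percent < 0:
        return "increased"
    if reduction_percent < 30:
        return "mild"
    if reduction_percent <= 70:
        return "moderate"
    return "severe"


# Allele assignments exactly as listed in the abstract.
REPORTED_ALLELES = {
    "mild": ["R185Q", "L289M", "A455S", "A472T"],
    "moderate": ["Q19P", "A167V", "R185W", "D426N", "V427M", "R431H"],
    "severe": ["P406L", "L441P", "R453C", "T466M", "Q521E"],
    "increased": ["Q521R"],
}

if __name__ == "__main__":
    total = sum(len(alleles) for alleles in REPORTED_ALLELES.values())
    print(total)  # the four classes together cover the 16 reported alleles
```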

8.
A systematic study of single amino acid substitutions in bacteriophage T4 lysozyme permitted a test of the concept that conserved amino acid residues are more functionally important than nonconserved residues. Substitutions of amino acid residues that are conserved among five bacteriophage-encoded lysozymes were found to lead more frequently to loss of function than substitutions of nonconserved residues. Of 163 residues tested, only 74 (45%) are sensitive to at least one substitution; however, all 14 residues that are fully conserved are sensitive to substitutions.

9.
Previous results from this laboratory indicated that, in Escherichia coli K12, a new class of missense suppressors, which read the lysine codons AAA and AAG, may be misacylated lysine transfer RNAs. We therefore isolated and determined the nucleotide sequence of the lysine tRNA from two of the suppressor strains. In each case, we found both wild-type and mutant species of lysine tRNA, a result consistent with evidence that there are two genes for lysine tRNA in the E. coli genome. The wild-type sequence was essentially identical to that reported for lysine tRNA from E. coli B. The mutant species isolated from each suppressor strain had a U for C70 nucleotide substitution, demonstrating that the AAG suppressor is a mutant lysine tRNA. The nucleotide substitution in the amino acid acceptor stem is consistent with the in vivo evidence that the suppressor corrects AAA and AAG missense mutations by inserting an amino acid other than lysine during polypeptide synthesis. This report represents the first verification of missense suppression caused by misacylation of a mutant tRNA.

10.

Background  

O-linked glycosylation, which occurs enzymatically in biological systems, affects protein folding, localization and trafficking, protein solubility, antigenicity, and biological activity, as well as cell-cell interactions on membrane proteins. The catalytic enzymes involved include glycotransferases, sugar-transferring enzymes and glycosidases that trim specific monosaccharides from precursors to form intermediate structures. Because experimental identification is difficult, several studies have used computational methods to identify glycosylation sites.

11.
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein coding-genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.

12.
Xiao X  Shao S  Ding Y  Huang Z  Chou KC 《Amino acids》2006,30(1):49-54
Summary. The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the localization of an uncharacterized protein in cells, because the knowledge thus obtained can greatly speed up the process of finding its biological functions. However, it is very difficult to establish such a predictor by acquiring the key statistical information buried in a pile of extremely complicated and highly variable sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the cellular automata image approach is introduced to cope with this problem. Many important features, which are originally hidden in the long amino acid sequences, can be clearly displayed through their cellular automata images. One of the remarkable merits of doing so is that many image recognition tools can be straightforwardly applied to the target aimed at here. High success rates were observed in the self-consistency, jackknife, and independent dataset tests.

13.
Li FM  Li QZ 《Amino acids》2008,34(1):119-125
Summary. The subnuclear localization of nuclear proteins is very important for an in-depth understanding of the construction and function of the nucleus. The pseudo amino acid composition (PseAA), as originally introduced by K. C. Chou, can incorporate much more information about a protein sequence than the classical amino acid composition and so significantly enhances the power of a discrete model to predict various attributes of a protein. Based on the amino acid composition and PseAA, an algorithm that combines the increment of diversity with an improved quadratic discriminant analysis is proposed to predict protein subnuclear location. The overall predictive success rate and correlation coefficient are 75.4% and 0.629 for 504 single localization proteins in the jackknife test, and the success rate is 80.4% for an independent set of 92 multi-localization proteins. For 406 single localization nuclear proteins with ≤25% sequence identity, the jackknife test shows an overall prediction accuracy of 77.1%. Authors’ address: Qian-Zhong Li, Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, China

14.
Wu G  Yan S 《Peptides》2003,24(12):1837-1845
In this study, we analyzed the amino acid pairs affected by mutations in two spike proteins from human coronavirus strains 229E and OC43 by means of random analysis in order to gain some insight into the possible mutations in the spike protein from SARS-CoV. The results demonstrate that the randomly unpredictable amino acid pairs are more sensitive to mutations. The larger the difference between actual and predicted frequencies, the higher the chance of a mutation occurring. The effect induced by mutations is to reduce the difference between actual and predicted frequencies. The amino acid pairs whose actual frequencies are larger than their predicted frequencies are more likely to be targeted by mutations, whereas the amino acid pairs whose actual frequencies are smaller than their predicted frequencies are more likely to be formed after mutations. These findings agree with those of several of our recent studies, i.e. the mutations represent a process of degeneration inducing human diseases.
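
The comparison of actual and predicted amino acid pair frequencies can be illustrated with a short Python sketch. It uses a simple independence estimate, expected count = (N − 1) × p(first) × p(second), for adjacent pairs in a sequence of length N. This estimate is a simplification chosen for illustration and is not necessarily the authors' exact formula; the function name and the toy sequence are also assumptions.

```python
from collections import Counter


def pair_frequency_gaps(seq):
    """Return {pair: actual - expected} for all ordered pairs of residue
    types that occur in seq, using an independence-based expectation."""
    n = len(seq)
    if n < 2:
        return {}
    residues = Counter(seq)
    actual = Counter(seq[i:i + 2] for i in range(n - 1))
    gaps = {}
    for first in residues:
        for second in residues:
            expected = (n - 1) * (residues[first] / n) * (residues[second] / n)
            gaps[first + second] = actual[first + second] - expected
    return gaps


if __name__ == "__main__":
    # Hypothetical toy sequence: positive gaps mark over-represented pairs,
    # negative gaps mark pairs seen less often than the estimate predicts.
    for pair, gap in sorted(pair_frequency_gaps("ABAB").items()):
        print(pair, gap)
```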

15.
We have examined the biosynthesis of normal and mutant forms of myeloperoxidase (MPO) in order to gain insights into the critical features of normal biogenesis of MPO. The expression of wild-type and mutant forms of MPO in a stably transfected cell line devoid of endogenous MPO as well as in established human promyelocytic cell lines has allowed understanding of several features of MPO biosynthesis. It is clear that heme insertion into apoproMPO is necessary for proper folding, egress from the endoplasmic reticulum (ER), and eventual entry into the maturation pathway. In addition, molecular chaperones calreticulin and calnexin interact with normal MPO precursors in a sequential and regulated fashion. Studies of naturally occurring mutants, specifically missense mutations underlying inherited MPO deficiency, and mutations in putatively important residues in MPO have highlighted special features of the ER quality control system in the context of MPO biosynthesis. With identification of additional genotypes of MPO deficiency and the recent solution of MPO crystal structure at 1.8 Å, this approach provides a powerful technique to assess structure-function relationships in MPO that are likely applicable to other members of the family of animal peroxidases.

16.
Abstract

We have examined the biosynthesis of normal and mutant forms of myeloperoxidase (MPO) in order to gain insights into the critical features of normal biogenesis of MPO. The expression of wild-type and mutant forms of MPO in a stably transfected cell line devoid of endogenous MPO as well as in established human promyelocytic cell lines has allowed understanding of several features of MPO biosynthesis. It is clear that heme insertion into apoproMPO is necessary for proper folding, egress from the endoplasmic reticulum (ER), and eventual entry into the maturation pathway. In addition, molecular chaperones calreticulin and calnexin interact with normal MPO precursors in a sequential and regulated fashion. Studies of naturally occurring mutants, specifically missense mutations underlying inherited MPO deficiency, and mutations in putatively important residues in MPO have highlighted special features of the ER quality control system in the context of MPO biosynthesis. With identification of additional genotypes of MPO deficiency and the recent solution of MPO crystal structure at 1.8 Å, this approach provides a powerful technique to assess structure-function relationships in MPO that are likely applicable to other members of the family of animal peroxidases.

17.
Zhang TL  Ding YS 《Amino acids》2007,33(4):623-629
Compared with the conventional amino acid composition (AA), the pseudo amino acid composition (PseAA) as originally introduced by Chou can incorporate much more information about a protein sequence; this remarkably enhances the power of a discrete model to predict various attributes of a protein. In this study, based on the concept of Chou's PseAA, a 46-D (dimensional) PseAA was formulated to represent the sample of a protein, and a new approach based on binary-tree support vector machines (BTSVMs) was proposed to predict the protein structural class. The BTSVM algorithm can resolve the problem of unclassifiable data points in multi-class SVMs. The results of both the 10-fold cross-validation and jackknife tests demonstrate that the predictive performance using the new PseAA (46-D) is better than that of AA (20-D), which is widely used in many algorithms for protein structural class prediction. The results obtained by the new approach are quite encouraging, indicating that it can at least play a complementary role to many of the existing methods and is a useful tool for predicting many other protein attributes as well.

18.
19.
The hemolytic lectin CEL-III forms transmembrane pores in the membranes of target cells. A study on the effect of site-directed mutation at Lys405 in domain 3 of CEL-III indicated that replacements of this residue by relatively smaller residues lead to a marked increase in hemolytic activity, suggesting that moderately destabilizing domain 3 facilitates formation of transmembrane pores through conformational changes.

20.
The rotavirus spike protein VP4 mediates attachment to host cells and subsequent membrane penetration. The VP8* domain of VP4 forms the spike tips and is proposed to recognize host-cell surface glycans. For sialidase-sensitive rotaviruses such as rhesus (RRV), this recognition involves terminal sialic acids. We show here that the RRV VP8*(64-224) protein competes with RRV infection of host cells, demonstrating its relevance to infection. In addition, we observe that the amino acids revealed by X-ray crystallography to be in direct contact with the bound sialic acid derivative methyl α-D-N-acetylneuraminide, and that are highly conserved amongst sialidase-sensitive rotaviruses, are residues that are also important in interactions with host-cell carbohydrates. Residues Arg101 and Ser190 of the RRV VP8* carbohydrate-binding site were mutated to assess their importance for binding to the sialic acid derivative and their competition with RRV infection of host cells. The crystallographic structure of the Arg101Ala mutant crystallized in the presence of the sialic acid derivative was determined at 295 K to a resolution of 1.9 Å. Our multidisciplinary study using X-ray crystallography, saturation transfer difference nuclear magnetic resonance spectroscopy, isothermal titration calorimetry, and competitive virus infectivity assays to investigate RRV wild-type and mutant VP8* proteins has provided the first evidence that the carbohydrate-binding cavity in RRV VP8* is used for host-cell recognition, and this interaction is not only with the sialic acid portion but also with other parts of the glycan structure.
