Similar Documents
1.
2.
Liu H, Han H, Li J, Wong L. In Silico Biology, 2004, 4(3): 255-269
The translation initiation site (TIS) prediction problem is how to correctly identify TISs in mRNA, cDNA, or other types of genomic sequences. High prediction accuracy helps determine the protein-coding content of nucleotide sequences, an important step in genomic analysis. In this paper, we present an in silico method to predict translation initiation sites in vertebrate cDNA or mRNA sequences. The method consists of three sequential steps. In the first step, candidate features are generated using k-gram amino acid patterns. In the second step, a small number of top-ranked features are selected by an entropy-based algorithm. In the third step, a classification model is built to recognize true TISs by applying support vector machines or ensembles of decision trees to the selected features. We have tested our method on several independent data sets, including two public ones and our own extracted sequences. The results are better than those previously reported on the same data sets. The high accuracy not only demonstrates the feasibility of the method but also suggests that there may be "amino acid" patterns around TISs in cDNA and mRNA sequences.
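The first two steps above (k-gram amino acid features, then entropy-based ranking) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the window size, the UP_/DOWN_ feature naming, and the use of plain information gain as the entropy-based ranking criterion are all assumptions. The third step would pass the selected features to an off-the-shelf classifier such as an SVM.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def kgram_features(seq, atg_pos, window=99, ks=(1, 2)):
    """Count amino-acid k-grams in the reading frame of a candidate ATG,
    separately upstream and downstream. window (nt) must be a multiple of 3."""
    assert window % 3 == 0
    start = atg_pos - window
    while start < 0:  # trim the upstream window while staying in frame
        start += 3
    feats = Counter()
    for side, aa in (("UP", translate(seq[start:atg_pos])),
                     ("DOWN", translate(seq[atg_pos:atg_pos + window]))):
        for k in ks:
            for i in range(len(aa) - k + 1):
                feats[f"{side}_{aa[i:i + k]}"] += 1
    return feats


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, name):
    """Information gain of splitting samples on 'feature present or absent'."""
    present = [y for r, y in zip(rows, labels) if r.get(name, 0) > 0]
    absent = [y for r, y in zip(rows, labels) if r.get(name, 0) == 0]
    cond = sum(len(p) / len(labels) * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - cond


def top_features(rows, labels, m):
    """Rank every observed feature by information gain (ties broken by name)."""
    names = set().union(*rows)
    return sorted(names, key=lambda f: (-info_gain(rows, labels, f), f))[:m]
```

For example, `kgram_features("GCCGCCATGAAATAA", 6, window=6)` counts the upstream peptide "AA" and the downstream peptide "MK".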

3.
Shi JY, Zhang SW, Pan Q, Zhou GP. Amino Acids, 2008, 35(2): 321-327
In the post-genome era, there is an urgent need for reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly found proteins. Here, a novel form of pseudo amino acid (PseAA) composition, the so-called "amino acid composition distribution" (AACD), is introduced. First, a protein sequence is divided into multiple segments of equal length. Then, the amino acid composition of each segment is calculated in turn. After that, each protein sequence is represented by a feature vector. Finally, the feature vectors of all sequences are input into multi-class support vector machines to predict subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
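The AACD encoding described above can be sketched as a concatenation of per-segment compositions. This is a minimal sketch under assumptions: the abstract does not say how segment boundaries are rounded when the length is not divisible by the segment count, so the even-as-possible split used here is hypothetical, as is the default of three segments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aacd(seq, n_segments=3):
    """Concatenate the amino-acid composition of each of n_segments
    near-equal segments, giving a 20 * n_segments feature vector."""
    seq = seq.upper()
    L = len(seq)
    if L < n_segments:
        raise ValueError("sequence shorter than the number of segments")
    vec = []
    for s in range(n_segments):
        # Integer boundaries split the sequence as evenly as possible (assumed rule).
        seg = seq[s * L // n_segments:(s + 1) * L // n_segments]
        vec.extend(seg.count(a) / len(seg) for a in AMINO_ACIDS)
    return vec
```

The resulting vectors would then be fed to a multi-class SVM, as the abstract describes.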

4.
MOTIVATION: With protein sequences entering databanks at an explosive pace, early determination of the family or subfamily class of a newly found enzyme molecule becomes important, because this is directly related to which specific target it acts on, as well as to its catalytic process and biological function. Unfortunately, it is both time-consuming and costly to do so by experiments alone. In a previous study, the covariant-discriminant algorithm was introduced to identify the 16 subfamily classes of oxidoreductases. Although the results were quite encouraging, the entire prediction process was based on amino acid composition alone, without any sequence-order information. Therefore, it merits further investigation. RESULTS: To incorporate sequence-order effects into the predictor, the 'amphiphilic pseudo amino acid composition' is introduced to represent the statistical sample of a protein. The novel representation contains 20 + 2λ discrete numbers: the first 20 are the components of the conventional amino acid composition; the next 2λ are a set of correlation factors that reflect different hydrophobicity and hydrophilicity distribution patterns along a protein chain. Based on this concept and formulation scheme, a new predictor is developed. Self-consistency, jackknife and independent dataset tests show that the success rates of the new predictor are all significantly higher than those of the previous predictors. The significant enhancement in success rates also implies that the distribution of hydrophobicity and hydrophilicity of the amino acid residues along a protein chain plays a very important role in its structure and function.
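A 20 + 2λ vector of this kind can be sketched as below, assuming the form commonly described for amphiphilic PseAAC: each property scale is standardized over the 20 residues, the tier-j correlation factor is the mean product of standardized values for residues j positions apart (one factor for hydrophobicity, one for hydrophilicity), and all 20 + 2λ terms are divided by the sum of the composition plus w times the correlation factors. The weight w = 0.5 and the property scales are user inputs here, not values taken from the paper, and the exact normalization in the paper may differ.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Shift and scale a 20-residue property table to mean 0 and population SD 1."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mu, sd = statistics.fmean(vals), statistics.pstdev(vals)
    return {a: (scale[a] - mu) / sd for a in AMINO_ACIDS}


def amphiphilic_pseaac(seq, hydrophobicity, hydrophilicity, lam=2, w=0.5):
    """20 composition terms followed by 2*lam sequence-order correlation terms.
    Assumes only the 20 standard residues; w is a user-chosen weight."""
    seq = seq.upper()
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    h1, h2 = standardize(hydrophobicity), standardize(hydrophilicity)
    freqs = [seq.count(a) / L for a in AMINO_ACIDS]
    taus = []
    for j in range(1, lam + 1):  # tier j: residue pairs j positions apart
        pairs = [(seq[i], seq[i + j]) for i in range(L - j)]
        taus.append(sum(h1[a] * h1[b] for a, b in pairs) / (L - j))  # hydrophobicity
        taus.append(sum(h2[a] * h2[b] for a, b in pairs) / (L - j))  # hydrophilicity
    # Correlation factors can be negative; the denominator is assumed positive.
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]
```

By construction the 20 + 2λ components sum to 1, so composition and sequence-order terms share one normalized vector.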

5.
6.
Two missense mutations have been identified in the phenylalanine hydroxylase (PAH) genes of an Italian phenylketonuria (PKU) patient. Both mutations occurred in exon 7 of the PAH gene, resulting in the substitution of Trp for Arg at amino acid 252 (R252W) and of Leu for Pro at amino acid 281 (P281L) of the protein. Expression vectors containing either the normal human PAH cDNA or a mutant cDNA were constructed and transfected into cultured mammalian cells. Extracts from cells transfected with either mutant construct showed negligible enzyme activity and undetectable levels of immunoreactive PAH protein compared with the normal construct. These results are compatible with the severe classical PKU phenotype observed in this patient. Population genetic studies in the Italian population revealed that both the R252W and P281L mutations are in linkage disequilibrium with mutant restriction fragment length polymorphism (RFLP) haplotype 1, the most prevalent RFLP haplotype in this population. The R252W mutation is present on 10% and the P281L mutation on 20% of haplotype 1 mutant chromosomes. Both mutations are very rare in other European populations, suggesting a Mediterranean origin for these mutant chromosomes.

7.
Functional consequences of PRODH missense mutations (cited 5 times: 0 self-citations, 5 by others)
PRODH maps to 22q11, in the region deleted in velocardiofacial syndrome/DiGeorge syndrome (VCFS/DGS), and encodes proline oxidase (POX), a mitochondrial inner-membrane enzyme that catalyzes the first step in the proline degradation pathway. At least 16 PRODH missense mutations have been identified in studies of type I hyperprolinemia (HPI) and schizophrenia, 10 of which are present at polymorphic frequencies. The functional consequences of these missense mutations have been inferred from evolutionary conservation, but none have been tested directly. Here, we report the effects of these mutations on POX activity. We find that four alleles (R185Q, L289M, A455S, and A472T) result in a mild (<30%), six (Q19P, A167V, R185W, D426N, V427M, and R431H) in a moderate (30%-70%), and five (P406L, L441P, R453C, T466M, and Q521E) in a severe (>70%) reduction in POX activity, whereas one (Q521R) increases POX activity. The POX encoded by one severe allele (T466M) shows in vitro responsiveness to high concentrations of the cofactor flavin adenine dinucleotide. Although there is limited information on plasma proline levels in individuals of known PRODH genotype, extant data suggest that severe hyperprolinemia (>800 µM) occurs in individuals with large deletions and/or PRODH missense mutations with the most severe effect on function (L441P and R453C), whereas modest hyperprolinemia (300-500 µM) is associated with PRODH alleles with a moderate reduction in activity. Interestingly, three of the four alleles associated with or found in schizophrenia (V427M, L441P, and R453C) resulted in severe reduction of POX activity and hyperprolinemia. These observations, plus the high degree of polymorphism at the PRODH locus, are consistent with the hypothesis that reduced POX function is a risk factor for schizophrenia.
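The allele groupings and activity cut-offs above can be captured in a small lookup, for example to tabulate new measurements the same way. The allele lists are copied from the abstract. The `severity` helper is illustrative: how exactly 30% and 70% are assigned is an assumption, because the abstract gives only "<30%", "30%-70%" and ">70%".

```python
import re

# Allele groupings exactly as listed in the abstract.
ALLELES = {
    "mild": ["R185Q", "L289M", "A455S", "A472T"],
    "moderate": ["Q19P", "A167V", "R185W", "D426N", "V427M", "R431H"],
    "severe": ["P406L", "L441P", "R453C", "T466M", "Q521E"],
    "increased": ["Q521R"],
}

MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_missense(name):
    """Split one-letter missense notation such as 'R185Q' into (ref, position, alt)."""
    m = MISSENSE.match(name)
    if m is None:
        raise ValueError(f"not a one-letter missense allele: {name}")
    return m.group(1), int(m.group(2)), m.group(3)


def severity(percent_reduction):
    """Bin a % reduction in POX activity; negative values mean activity increased.
    Boundary handling at exactly 30 and 70 is an assumption."""
    if percent_reduction < 0:
        return "increased"
    if percent_reduction < 30:
        return "mild"
    if percent_reduction <= 70:
        return "moderate"
    return "severe"
```

Summing the group sizes reproduces the 16 missense alleles cited in the abstract, and parsing shows that Q521E and Q521R hit the same residue with opposite effects.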

8.
Amino acid mutations that cause partial or total unfolding of a protein can lead to disease states and to failure to produce mutants. It is therefore very useful to be able to predict which mutations retain the conformation of the wild-type protein and which lead to local or global unfolding of the protein. We have developed a fast and reasonably accurate method, based on a backbone-dependent side-chain rotamer library, to predict the (folded or unfolded) conformation of a protein upon mutation. This method has been tested on proteins whose wild-type 3D structures are known and whose mutant conformations have been experimentally characterized as folded or unfolded. Furthermore, for the cases studied here, the predicted partially folded or denatured mutant conformations correlate with a decrease in the stability of the mutant relative to the wild-type protein. The key advantage of our method is that it is very fast and predicts locally or globally unfolded states fairly accurately. Hence, it may prove useful in designing site-directed mutagenesis, X-ray crystallography and drug design experiments, as well as in free energy simulations, by helping to ascertain whether a mutation will alter or retain the wild-type conformation.

9.
Harmful mutations are ubiquitous and inevitable, and the rate at which these mutations are removed from populations is a critical determinant of evolutionary fate. Closely related sexual and asexual taxa provide a particularly powerful setting to study deleterious mutation elimination because sexual reproduction should facilitate mutational clearance by reducing selective interference between sites and by allowing the production of offspring with different mutational complements than their parents. Here, we compared the rate of removal of conservative (i.e., similar biochemical properties) and radical (i.e., distinct biochemical properties) nonsynonymous mutations from mitochondrial genomes of sexual versus asexual Potamopyrgus antipodarum, a New Zealand freshwater snail characterized by coexisting and ecologically similar sexual and asexual lineages. Our analyses revealed that radical nonsynonymous mutations are cleared at higher rates than conservative changes and that sexual lineages eliminate radical changes more rapidly than asexual counterparts. These results are consistent with reduced efficacy of purifying selection in asexual lineages allowing harmful mutations to remain polymorphic longer than in sexual lineages. Together, these data illuminate some of the population‐level processes contributing to mitochondrial mutation accumulation and suggest that mutation accumulation could influence the outcome of competition between sexual and asexual lineages.
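The conservative/radical distinction above depends on which biochemical property is used to group amino acids. The sketch below uses side-chain charge as one hypothetical grouping; the study may have used other properties (such as polarity or volume) or several at once.

```python
# Hypothetical grouping by side-chain charge; not necessarily the study's scheme.
CHARGE_CLASS = {**{a: "positive" for a in "KRH"},
                **{a: "negative" for a in "DE"},
                **{a: "neutral" for a in "ACFGILMNPQSTVWY"}}


def classify_change(ref, alt):
    """Label an amino-acid change as 'none', 'conservative' or 'radical'."""
    if ref == alt:
        return "none"  # no amino-acid change (synonymous at the DNA level)
    return "conservative" if CHARGE_CLASS[ref] == CHARGE_CLASS[alt] else "radical"


def radical_fraction(changes):
    """Share of nonsynonymous (ref, alt) changes that are radical."""
    kinds = [classify_change(r, a) for r, a in changes]
    nonsyn = [k for k in kinds if k != "none"]
    return nonsyn.count("radical") / len(nonsyn) if nonsyn else float("nan")
```

Comparing this fraction between polymorphic and fixed differences, or between sexual and asexual lineages, is one way to look for the different clearance rates the abstract reports.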

10.
We develop a new population-scale model incorporating diapause induction and termination that allows multi-year predictions of pest dynamics. In addition to predicting phenology and voltinism, the model also allows us to study the degree of overlap among life stages across time, a quantity not generally predicted by previous models yet a key determinant of how frequently management must be applied to maintain control. The model is a physiological, stage-structured population model that includes temperature-dependent vital rates, diapause processes, and plasticity in development. The model is statistically fitted to a 33-year weekly time series of Cydia pomonella adults captured in pheromone-baited traps in a research orchard in southern Pennsylvania. The multiannual model allows investigation of within-season control strategies as well as the likely consequences of climate change for this important agricultural pest. The model predicts that warming temperatures will cause earlier spring emergence, additional generations, and increased overall abundance. Most importantly, by calculating the circular variance, we find that warmer temperatures are associated with increased overlap among life stages, especially at the beginning of the growing season. Our findings highlight the importance of modeling diapause to fully understand the C. pomonella life cycle and to better inform management for effectively controlling this pest in a warmer future.
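Circular variance, the overlap measure mentioned above, treats times as angles on a cycle and is one minus the length of their mean resultant vector. The sketch below computes it for (optionally weighted) event times. How the model maps life stages onto the circle is not stated in the abstract, so the inputs here are generic.

```python
import math


def circular_variance(times, period, weights=None):
    """1 - mean resultant length of times mapped onto a circle of the given period.
    0 means all mass at one phase; values near 1 mean mass spread around the cycle."""
    if weights is None:
        weights = [1.0] * len(times)
    total = sum(weights)
    angles = [2 * math.pi * t / period for t in times]
    c = sum(w * math.cos(a) for a, w in zip(angles, weights)) / total
    s = sum(w * math.sin(a) for a, w in zip(angles, weights)) / total
    return 1.0 - math.hypot(c, s)
```

Weights could be, for instance, abundances at each time point, so that a broad, flat distribution of stages gives a higher circular variance than a single synchronized pulse.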

11.
Previous results from this laboratory indicated that, in Escherichia coli K12, a new class of missense suppressors, which read the lysine codons AAA and AAG, may be misacylated lysine transfer RNAs. We therefore isolated the lysine tRNA from two of the suppressor strains and determined its nucleotide sequence. In each case, we found both wild-type and mutant species of lysine tRNA, a result consistent with evidence that there are two genes for lysine tRNA in the E. coli genome. The wild-type sequence was essentially identical to that reported for lysine tRNA from E. coli B. The mutant species isolated from each suppressor strain carried a C-to-U substitution at nucleotide 70, demonstrating that the AAG suppressor is a mutant lysine tRNA. This substitution in the amino acid acceptor stem is consistent with the in vivo evidence that the suppressor corrects AAA and AAG missense mutations by inserting an amino acid other than lysine during polypeptide synthesis. This report represents the first verification of missense suppression caused by misacylation of a mutant tRNA.

12.
A systematic study of single amino acid substitutions in bacteriophage T4 lysozyme permitted a test of the concept that conserved amino acid residues are more functionally important than nonconserved residues. Substitutions of residues that are conserved among five bacteriophage-encoded lysozymes were found to cause loss of function more frequently than substitutions of nonconserved residues. Of 163 residues tested, only 74 (45%) are sensitive to at least one substitution; however, all 14 fully conserved residues are sensitive to substitution.

13.
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using single nucleotide polymorphism (SNP) data from a population with a non-stationary demographic history (such as that of modern humans). Applying our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contributions of demographic and selective effects to the patterning of amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence of great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict that 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s| < 0.01%), 30–42% are moderately deleterious (0.01% < |s| < 1%), and nearly all the remainder are highly deleterious or lethal (|s| > 1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection, with the remaining differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotypes may be missed by this widely used approach for mapping genes underlying complex traits.
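To see how a leptokurtic distribution of selection coefficients translates into the three |s| bins quoted above, one can draw |s| from a gamma distribution and count the draws per bin. This is only a Monte Carlo illustration: the shape and mean arguments are hypothetical, not the fitted values from the study, and the study's demographic inference is not reproduced. An exact answer would use a gamma CDF (for example `scipy.stats.gamma.cdf`).

```python
import random


def dfe_bins(shape, mean_s, n=200_000, seed=1):
    """Fraction of gamma-distributed |s| values in the abstract's three bins:
    |s| < 0.01%, 0.01% <= |s| < 1%, and |s| >= 1%."""
    rng = random.Random(seed)  # seeded for reproducibility
    scale = mean_s / shape     # gamma mean = shape * scale
    counts = {"nearly neutral": 0, "moderately deleterious": 0, "strongly deleterious": 0}
    for _ in range(n):
        s = rng.gammavariate(shape, scale)
        if s < 1e-4:
            counts["nearly neutral"] += 1
        elif s < 1e-2:
            counts["moderately deleterious"] += 1
        else:
            counts["strongly deleterious"] += 1
    return {k: v / n for k, v in counts.items()}
```

A small shape parameter (well below 1) gives the sharp peak near zero plus a long tail that the abstract describes as leptokurtic.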

14.

Background  

O-linked glycosylation, which occurs enzymatically in biological systems, affects protein folding, localization and trafficking, protein solubility, antigenicity and biological activity, as well as cell-cell interactions involving membrane proteins. The enzymes involved include glycosyltransferases (sugar-transferring enzymes) and glycosidases, which trim specific monosaccharides from precursors to form intermediate structures. Because glycosylation sites are difficult to identify experimentally, several studies have used computational methods to identify them.
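Sequence-based predictors of O-linked sites commonly start by extracting a fixed-length window around each candidate serine or threonine. The sketch below does only that step. The half-width of 7 and the "-" padding character are hypothetical choices, and the abstract does not say which features the cited works derive from such windows.

```python
def st_windows(seq, half=7, pad="-"):
    """Return (1-based position, window) for every Ser/Thr in seq.
    Each window has 2*half + 1 residues centred on the S/T, padded at the ends."""
    seq = seq.upper()
    padded = pad * half + seq + pad * half
    out = []
    for i, aa in enumerate(seq):
        if aa in "ST":
            # residue i sits at padded[i + half], so this slice is centred on it
            out.append((i + 1, padded[i:i + 2 * half + 1]))
    return out
```

Each window can then be encoded (for example as residue counts or one-hot vectors) and labeled as glycosylated or not for training a classifier.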

15.
Xiao X, Shao S, Ding Y, Huang Z, Chou KC. Amino Acids, 2006, 30(1): 49-54
Summary. The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the localization of an uncharacterized protein in cells, because the knowledge thus obtained can greatly speed up the process of finding its biological functions. However, it is very difficult to build such a predictor, because the key statistical information is buried in a huge number of extremely complicated and highly variable sequences. In this paper, based on the concept of pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the cellular automata image approach is introduced to address this problem. Many important features that are hidden in long amino acid sequences can be clearly displayed through their cellular automata images. One remarkable merit of doing so is that many image recognition tools can be applied directly to the problem. High success rates were observed in the self-consistency, jackknife, and independent dataset tests.
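The cellular automata image idea above can be illustrated with an elementary (one-dimensional, two-state) cellular automaton whose successive generations form the rows of a binary image. The residue-to-bit encoding (a 5-bit index into the alphabet), the rule number and the number of steps are all hypothetical; the paper's own encoding and rule are not given in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_bits(seq):
    """Hypothetical encoding: each residue becomes its 5-bit index in AMINO_ACIDS."""
    bits = []
    for aa in seq.upper():
        bits.extend(int(b) for b in format(AMINO_ACIDS.index(aa), "05b"))
    return bits


def ca_image(bits, rule=30, steps=10):
    """Evolve an elementary CA with periodic boundaries (Wolfram rule numbering)
    and return every generation as one row of a binary image."""
    table = [(rule >> k) & 1 for k in range(8)]  # new state for each 3-cell pattern
    rows = [list(bits)]
    n = len(bits)
    for _ in range(steps):
        prev = rows[-1]
        rows.append([table[(prev[i - 1] << 2) | (prev[i] << 1) | prev[(i + 1) % n]]
                     for i in range(n)])
    return rows
```

The resulting 0/1 grid is what would be handed to image-recognition or texture-descriptor tools.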

16.
Li FM, Li QZ. Amino Acids, 2008, 34(1): 119-125
Summary. The subnuclear localization of nuclear proteins is very important for an in-depth understanding of the structure and function of the nucleus. Pseudo amino acid (PseAA) composition, as originally introduced by K. C. Chou, can incorporate much more information about a protein sequence than the classical amino acid composition, and so significantly enhances the power of discrete models for predicting various protein attributes. Based on amino acid and PseAA composition, an algorithm combining increment of diversity with an improved quadratic discriminant analysis is proposed to predict protein subnuclear location. In the jackknife test, the overall success rate and correlation coefficient are 75.4% and 0.629, respectively, for 504 single-localization proteins; the success rate is 80.4% for an independent set of 92 multi-localization proteins. For 406 single-localization nuclear proteins with ≤25% sequence identity, the jackknife test gives an overall accuracy of 77.1%. Authors' address: Qian-Zhong Li, Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, China
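Increment of diversity compares two composition "sources" through the diversity measure D(N) = N ln N - Σ nᵢ ln nᵢ. The increment is ID(X, Y) = D(X + Y) - D(X) - D(Y), which is zero when the two count vectors have identical proportions and grows as they diverge. The sketch below implements only this measure. The quadratic discriminant step, and the exact features the authors count, are not reproduced.

```python
import math


def _xlogx(x):
    return x * math.log(x) if x > 0 else 0.0  # convention: 0 ln 0 = 0


def diversity(counts):
    """Diversity measure D = N ln N - sum(n_i ln n_i) of a count vector."""
    return _xlogx(sum(counts)) - sum(_xlogx(n) for n in counts)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two equal-length count vectors.
    It is >= 0 and equals 0 when X and Y have the same proportions."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```

In a classifier, a query protein's counts would be compared with each location's pooled counts, and a smaller increment indicates a closer match.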

17.
Almost 90% of nephrogenic diabetes insipidus (NDI) cases are due to mutations in the arginine-vasopressin receptor 2 gene (AVPR2). We retrospectively examined all published mutations/variants in AVPR2, aiming to perform a comprehensive review of them and to test whether any amino acid change causing a missense mutation is significantly more or less common than others. We performed a Medline search and collected detailed information on all AVPR2 mutations and variants. We compared frequencies between mutated and wild-type amino acids and codons. We predicted the effect of each mutation or reported it based on published in vitro studies. We also reported the ethnicity of each mutation/variant carrier. In summary, we identified 211 AVPR2 mutations that cause NDI in 326 families and 21 variants that do not cause NDI in 71 NDI families. We described 15 different types of mutations, including missense, frameshift, in-frame deletion, deletion, insertion, nonsense, duplication, splicing and combined mutations. Missense mutations account for 55.83% of all published NDI families. Arginine and tyrosine are the most commonly mutated amino acids in AVPR2 (P = 4.07E-08 and P = 3.27E-04, respectively). Alanine and glutamate are the least commonly mutated AVPR2 amino acids (P = 0.009 and P = 0.019, respectively). The spectrum ranges from rare gene variants or polymorphisms that do not cause NDI to rare mutations that cause NDI, among which arginine and tyrosine substitutions are the most common missense changes. AVPR2 mutations are spread worldwide. Our study may serve as an updated review covering all AVPR2 variants and their specific gene locations. J. Cell. Physiol. 217: 605-617, 2008. (c) 2008 Wiley-Liss, Inc.
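One way to ask whether a residue type is mutated more often than expected is an exact one-sided binomial test: if a residue type makes up a fraction p of the wild-type protein, how surprising is it that at least k of n missense mutations fall on it? The abstract does not name the test actually used, so this is an assumed stand-in, and any counts passed to it are the caller's.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def overmutation_p(mutated_at_aa, total_mutations, aa_count, protein_length):
    """One-sided p-value that missense mutations hit this residue type more often
    than its share of the wild-type sequence (aa_count / protein_length) predicts."""
    p = aa_count / protein_length
    return binom_upper_tail(mutated_at_aa, total_mutations, p)
```

Testing for under-mutation (as for alanine and glutamate) would use the lower tail, P(X ≤ k), instead.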

18.
Wu G, Yan S. Peptides, 2003, 24(12): 1837-1845
In this study, we analyzed the amino acid pairs affected by mutations in two spike proteins from human coronavirus strains 229E and OC43 by means of random analysis, in order to gain insight into possible mutations in the spike protein of SARS-CoV. The results demonstrate that randomly unpredictable amino acid pairs are more sensitive to mutations. The larger the difference between actual and predicted frequencies, the higher the chance that a mutation occurs. The effect of mutations is to reduce the difference between actual and predicted frequencies. Amino acid pairs whose actual frequencies are larger than their predicted frequencies are more likely to be targeted by mutations, whereas pairs whose actual frequencies are smaller than their predicted frequencies are more likely to be formed after mutations. These findings are consistent with several of our recent studies, i.e., mutations represent a process of degeneration that induces human diseases.
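The comparison of actual and predicted pair frequencies above can be sketched by counting adjacent amino acid pairs and comparing each count with what the sequence composition alone would predict. The independence formula used here, expected(AB) = (L - 1)(n_A / L)(n_B / L), is a simple assumed stand-in; the study's own predicted-frequency formula may differ.

```python
from collections import Counter


def pair_deviation(seq):
    """Actual minus predicted count for every adjacent amino-acid pair AB,
    with predicted(AB) = (L - 1) * (n_A / L) * (n_B / L) under independence."""
    L = len(seq)
    comp = Counter(seq)
    actual = Counter(seq[i:i + 2] for i in range(L - 1))
    dev = {}
    for a in comp:
        for b in comp:
            expected = (L - 1) * comp[a] * comp[b] / L**2
            dev[a + b] = actual[a + b] - expected
    return dev
```

Positive deviations flag over-represented pairs (those the abstract says are more likely to be targeted by mutations); negative ones flag under-represented pairs. The deviations sum to zero because the expected counts add up to the L - 1 observed pairs.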

19.
In disease screening and prognosis studies, an important task is to determine useful markers for identifying high-risk subgroups. Once such markers are established, they can be incorporated into public health practice to provide appropriate treatment or disease-monitoring strategies based on each individual's predicted risk. In recent years, genetic and biological markers have been examined extensively for their potential to signal disease progression or risk. In addition to these markers, it has often been argued that short-term outcomes may help predict disease outcomes more accurately in clinical practice. In this paper, we propose model-free nonparametric procedures that incorporate short-term event information to improve the prediction of a long-term terminal event. We include the optional availability of a single discrete marker measurement and assess the additional information gained by including the short-term outcome. We focus on the semi-competing risks setting, where the short-term event is an intermediate event that may be censored by the terminal event, while the terminal event is subject only to administrative censoring. Simulation studies suggest that the proposed procedures perform well in finite samples. Our procedures are illustrated using a data set of post-dialysis patients with end-stage renal disease.

20.
Lisewski AM. PLoS ONE, 2008, 3(9): e3110
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
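The capacity of a discrete memoryless channel with transition probabilities W[x][y] = P(y | x) can be computed with the Blahut-Arimoto algorithm, sketched below. It is a generic illustration of the capacity concept used above. How the paper builds its sequence-to-structure channel from atomic coordinates is not described in the abstract, so any transition matrix passed in is the caller's assumption.

```python
import math


def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Capacity (bits per symbol) of a discrete memoryless channel.
    W[x][y] = P(output y | input x); every row must sum to 1."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]  # output marginal
        # D[x]: KL divergence (nats) between row x of W and the output marginal q
        D = [sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        r = [p[x] * math.exp(D[x]) for x in range(nx)]
        total = sum(r)
        lower, upper = math.log(total), max(D)  # lower <= capacity <= upper
        p = [v / total for v in r]              # re-weighted input distribution
        if upper - lower < tol:
            break
    return lower / math.log(2)
```

For a binary symmetric channel with error probability 0.11, this returns about 0.5 bits per symbol (1 - H(0.11)). Raising the error probability toward 0.5 drives the capacity toward zero, which mirrors the breakdown the abstract describes.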

