Similar literature
1.
Single-point mutations are one of the most frequent causes of genetic variability in both humans and closely related species. The recent availability of different bioinformatics tools for annotating human single nucleotide polymorphisms (SNPs) has opened the possibility of using them to score SNPs from species of biomedical interest, in particular from mice and other models of human disease. This ability to predict the pathogenicity of single-point mutations in one species from data on another also opens the possibility of predicting the pathological character of single-point mutations in humans using data from well-characterized model systems of human disease. This could provide a valuable alternative to the more traditional population genetics approaches. However, the transfer of prediction tools may be limited by different factors, from a species bias in the training set to a large sequence divergence between the proteomes of the training and the target species. Here we study the conditions under which prediction tools can be transferred among species, concentrating on the case of mice. We find that for the majority of human-mouse homolog pairs, the sequence similarity is generally large enough to preserve the pathological character of mutations among species. We then establish that prediction/annotation tools developed for one organism can be used to predict the neutral/pathological character of mutations/SNPs in the other organism.

2.
While genome-era technologies focused on complete genome sequencing in various organisms, post-genome technologies aim at the understanding of the mechanisms of genetic information processing and elucidation of within-species variation. Single nucleotide polymorphisms (SNPs) are the most common source of genome variation in the human population. Nonsynonymous SNPs that occur in coding gene regions and result in amino acid substitutions are of particular interest. It is thought that such SNPs are responsible for phenotypic variation, quantitative traits, and the etiology of common diseases. PolyPhen is a computational tool for the prediction of putatively functional nonsynonymous SNPs by combining information of various types. The application areas of PolyPhen and similar methods include the genetics of complex diseases and congenital defects, the identification of functional mutations in model organisms, and evolutionary genetics.
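The distinction this abstract draws between nonsynonymous SNPs and other coding changes can be illustrated with a short sketch. It assumes the standard nuclear genetic code and a single-codon comparison; the function name `classify_snp` and the label `stop_change` are illustrative choices and are not taken from PolyPhen or any other tool mentioned here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third position varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Classify a coding change by comparing the encoded amino acids.

    Returns (kind, ref_aa, alt_aa), where kind is "synonymous",
    "nonsynonymous", or "stop_change" (hypothetical label used here
    for any change that creates or removes a stop codon).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif "*" in (ref_aa, alt_aa):
        kind = "stop_change"
    else:
        kind = "nonsynonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_snp("TTT", "TTC"))  # same amino acid (Phe)
    print(classify_snp("TTT", "TGT"))  # Phe to Cys substitution
```

A sketch like this only separates the classes; predicting whether a given nonsynonymous change is functionally damaging requires the additional sequence and structure information that the tools in these abstracts combine.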

3.
A fundamental goal of medical genetics is the accurate prediction of genotype–phenotype correlations. As an approach to develop more accurate in silico tools for prediction of disease-causing mutations of structural proteins, we present a gene- and disease-specific prediction tool based on a large systematic analysis of missense mutations from hemophilia A (HA) patients. Our HA-specific prediction tool, HApredictor, showed disease prediction accuracy comparable to other publicly available prediction software. In contrast to those methods, its performance is not limited to non-synonymous mutations. Given the role of synonymous mutations in disease and drug codon optimization, we propose that utilizing a gene- and disease-specific method can be highly useful to make functional predictions possible even for synonymous mutations. Incorporating computational metrics at both nucleotide and amino acid levels along with multiple protein sequence/structure alignment significantly improved the predictive performance of our tool. HApredictor is freely available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/HA_Predict/index.htm.

4.
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more detail but suffer from high computational cost, since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict: a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/ . This new tool should be particularly useful to predict details of protein–NA interactions for large protein families and proteomes.

5.
Cheng J, Randall A, Baldi P. Proteins, 2006, 62(4): 1125-1132
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.

6.
Single nucleotide polymorphisms (SNPs) are being intensively studied to understand the biological basis of complex traits and diseases. The genetics of human phenotype variation could be understood by knowing the functions of SNPs. In this study, using computational methods, we analyzed the genetic variations that can alter the expression and function of the CFTR gene, a candidate gene responsible for cystic fibrosis. We applied an evolutionary perspective to screen the SNPs using the sequence homology-based SIFT tool, which suggested that 17 nsSNPs (44%) were deleterious. The structure-based PolyPhen server suggested that 26 nsSNPs (66%) may disrupt protein function and structure. The PupaSuite tool predicted the phenotypic effect of SNPs on the structure and function of the affected protein. Structure analysis was carried out for the major mutation in the native protein encoded by the CFTR gene, F508C (nsSNP rs1800093). The amino acid residues in the native and mutant modeled proteins were further analyzed for solvent accessibility, secondary structure and stabilizing residues to check the stability of the proteins. The SNPs were further subjected to iHAP analysis to identify htSNPs, and we report potential candidates for future studies on CFTR mutations.

7.
We studied 10 protein-coding mitochondrial genes from 19 mammalian species to evaluate the effects of 10 amino acid properties on the evolution of the genetic code, the amino acid composition of proteins, and the pattern of nonsynonymous substitutions. The 10 amino acid properties studied are the chemical composition of the side chain, two polarity measures, hydropathy, isoelectric point, volume, aromaticity, aliphaticity, hydrogenation, and hydroxythiolation. The genetic code appears to have evolved toward minimizing polarity and hydropathy but not the other seven properties. This can be explained by our finding that the presumably primitive amino acids differed much only in polarity and hydropathy, but little in the other properties. Only the chemical composition (C) and isoelectric point (IE) appear to have affected the amino acid composition of the proteins studied, that is, these proteins tend to have more amino acids with typical C and IE values, so that nonsynonymous mutations tend to result in small differences in C and IE. All properties, except for hydroxythiolation, affect the rate of nonsynonymous substitution, with the observed amino acid changes having only small differences in these properties, relative to the spectrum of all possible nonsynonymous mutations. Received: 2 January 1998 / Accepted: 25 April 1998

8.
Single-nucleotide polymorphisms (SNPs) play a major role in the understanding of the genetic basis of many complex human diseases. Also, the genetics of human phenotype variation could be understood by knowing the functions of these SNPs. It is still a major challenge to identify the functional SNPs in a disease-related gene. In this work, we have analyzed the genetic variation that can alter the expression and the function of the BRCA1 gene using computational methods. Of the total 477 SNPs, 65 were found to be nonsynonymous (ns) SNPs. Among the 14 SNPs in the untranslated region (UTR), 4 were found in the 5' UTR and 10 were found in the 3' UTR. Both the SIFT and the PolyPhen servers found 16.9% of the nsSNPs to be damaging. The UTR Resource tool suggested that 2 of 4 SNPs in the 5' UTR and 3 of 10 SNPs in the 3' UTR might change the protein expression levels. We identified major mutations from proline to serine at positions 1776 and 1812 of the native protein of the BRCA1 gene. From a comparison of the stabilizing residues of the native and mutant proteins, we propose that an nsSNP (rs1800751) could be an important candidate for breast cancer caused by the BRCA1 gene.

9.
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within +/-20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple, but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. 
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Domain meta-predictors could yield improved results by using Armadillo as a first-line predictor.
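The procedure outlined in this abstract (map each residue to an amino acid index value, smooth the resulting profile, then flag positions by Z-score) can be sketched roughly as follows. This is not Armadillo's implementation: the index values in `TOY_INDEX`, the window size, and the threshold are all hypothetical placeholders, and the published DLI values are not reproduced here.

```python
from statistics import mean, pstdev

# Hypothetical linker-propensity values for illustration only.
# These are NOT the published domain linker propensity index (DLI).
TOY_INDEX = {"P": 1.0, "G": 1.0}


def smoothed_profile(seq, index, window=5):
    """Convert a sequence to index values and apply a centered moving average.

    Residues missing from the index are assigned 0.0 (an assumption);
    windows are truncated at the sequence ends.
    """
    values = [index.get(residue, 0.0) for residue in seq]
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def z_scores(profile):
    """Standardize a profile against its own mean and standard deviation."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sigma for x in profile]


def candidate_linker_positions(seq, index, window=5, threshold=1.0):
    """Return 0-based positions whose smoothed Z-score meets the threshold."""
    scores = z_scores(smoothed_profile(seq, index, window))
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    toy_sequence = "AAAAAAAAAAPGPGPAAAAAAAAAA"
    print(candidate_linker_positions(toy_sequence, TOY_INDEX))
```

In this toy sequence the Pro/Gly-rich stretch in the middle stands out from the flat background, which is the kind of signal the abstract describes; real sequences would need a calibrated index and threshold.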

10.
We use a recently developed coarse-grained computational model to investigate the relative stability of two different sets of de novo designed four-helix bundle proteins. Our simulations suggest a possible explanation for the experimentally observed increase in stability of the four-helix bundles with increasing sequence length. In detail, we show that both short subsequences composed only of polar residues and additional nonpolar residues inserted, via different point mutations in ad hoc positions, seem to play a significant role in stabilizing the four-helix bundle conformation in the longer sequences. Finally, we propose an additional mutation that rescues a short amino acid sequence that would otherwise adopt a compact misfolded state. Our work suggests that simple computational models can be used as a complementary tool in the design process of de novo proteins.

11.
Shen J, Deininger PL, Zhao H. Cytokine, 2006, 35(1-2): 62-66
Understanding the functions of single nucleotide polymorphisms (SNPs) can greatly help to understand the genetics of human phenotype variation and especially the genetic basis of complex human diseases. However, identifying functional SNPs from a pool containing both functional and neutral SNPs is challenging. In this study, we analyzed the genetic variations that can alter the expression and function of a group of cytokine proteins using computational tools. We extracted 4552 SNPs from 45 cytokine proteins from the SNPper database. Of particular interest, 828 SNPs were in the 5' UTR, 961 SNPs were in the 3' UTR, and 85 SNPs were non-synonymous SNPs (nsSNPs), which cause amino acid changes. Evolutionary conservation analysis using the SIFT tool suggested that 8 nsSNPs may disrupt protein function. Protein structure analysis using the PolyPhen tool suggested that 5 nsSNPs might alter protein structure. Binding motif analysis using the UTResource tool suggested that 27 SNPs in the 5' or 3' UTR might change protein expression levels. Our study demonstrates the presence of naturally occurring genetic variations in the cytokine proteins that may affect their expression and function, with possible roles in complex human diseases such as immune diseases.

12.
Several studies have shown that immune system proteins have on average a higher rate of amino acid evolution between different species of mammals than do most other proteins. To test whether immune-system-expressed loci show a correspondingly elevated rate of within-species nonsynonymous (amino acid altering) polymorphism, we examined gene diversity (heterozygosity) at 4,911 single nucleotide polymorphism (SNP) sites at 481 protein-coding loci. At loci with nonimmune functions, gene diversity at nonsynonymous SNP sites was typically lower than that at silent SNP sites (those not altering the amino acid sequence) in the same gene, a pattern that is evidence of purifying selection acting to eliminate slightly deleterious variants. However, this pattern was not seen at nonsynonymous SNPs causing conservative amino acid replacements in immune system proteins, indicating that the latter are subject to a reduced level of functional constraint. Similarly, immune system genes showed higher gene diversities in their 5′ noncoding regions than did other proteins. These results identified certain immune system loci that are likely to be subject to balancing selection that acts to maintain polymorphism in either coding or regulatory regions. Electronic Supplementary Material Supplementary material is available for this article at .
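The gene diversity (heterozygosity) statistic mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which estimator was used; the version below assumes Nei's unbiased estimator, H = n/(n-1) × (1 − Σ pᵢ²), applied to the alleles sampled at one SNP site. The function name `gene_diversity` and the toy samples are illustrative.

```python
from collections import Counter


def gene_diversity(alleles):
    """Estimate gene diversity at one site from a sample of alleles.

    Uses Nei's unbiased estimator (an assumption; the abstract does not
    specify the estimator): H = n / (n - 1) * (1 - sum(p_i ** 2)),
    where p_i are the sample allele frequencies and n is the sample size.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two sampled alleles are required")
    counts = Counter(alleles)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy samples: a monomorphic site and a site with two equally common alleles.
    print(gene_diversity(list("AAAA")))
    print(gene_diversity(list("AATT")))
```

Comparing this quantity between nonsynonymous and silent SNP sites within the same gene is the kind of contrast the abstract uses as evidence of purifying selection.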

13.
Previous studies on human mitochondrial genomes showed that the ratio of intraspecific diversities at nonsynonymous-to-synonymous positions was two to ten times higher than the ratio of interspecific divergences at these positions, suggesting an excess of slightly deleterious nonsynonymous polymorphisms. However, such an overabundance of nonsynonymous single nucleotide polymorphisms (SNPs) was not found in human nuclear genomes. Here, genome-wide estimates using >14,000 human-chimp nuclear genes and 1 million SNPs from four human genomes showed a significant proportion of deleterious nonsynonymous SNPs (~15%). Importantly, this study reveals a negative correlation between the magnitude of selection pressure and the proportion of deleterious SNPs on human genes. The proportion of deleterious amino acid replacement polymorphisms is 3.5 times higher in genes under high purifying selection compared with that in less constrained genes (28% vs. 8%). These results are explained by differences in the extent of contribution of mildly deleterious mutations to diversity and substitution.

14.
One of the major bottlenecks in many ab initio protein structure prediction methods is currently the selection of a small number of candidate structures for high‐resolution refinement from large sets of low‐resolution decoys. This step often includes a scoring by low‐resolution energy functions and a clustering of conformations by their pairwise root mean square deviations (RMSDs). As an efficient selection is crucial to reduce the overall computational cost of the predictions, any improvement in this direction can increase the overall performance of the predictions and the range of protein structures that can be predicted. We show here that the use of structural profiles, which can be predicted with good accuracy from the amino acid sequences of proteins, provides an efficient means to identify good candidate structures. Proteins 2010. © 2009 Wiley‐Liss, Inc.

15.
Deleterious mutations associated with human diseases are predominantly found in conserved positions and positions that are essential for the structure and/or function of proteins. However, these mutations are purged from the human population over time and prevented from being fixed. Contrary to this belief, here I show that high proportions of deleterious amino acid changing mutations are fixed at positions critical for the structure and/or function of proteins. Similarly, a high rate of fixation of deleterious mutations was observed in slow-evolving amino acid positions of human proteins. The fraction of deleterious substitutions was found to be two times higher in relatively conserved amino acid positions than in highly variable positions. This study also found fixation of a much higher proportion of radical amino acid changes in primates compared with rodents and artiodactyls in slow-evolving positions. Previous studies observed a higher proportion of nonsynonymous substitutions in humans compared with other mammals, which was taken as indirect evidence for the fixation of deleterious mutations in humans. However, the results of this investigation provide direct evidence for this prediction by suggesting that the excess nonsynonymous mutations fixed in humans are indeed deleterious in nature. Furthermore, these results suggest that studies on disease-associated mutations should consider that a significant fraction of such deleterious mutations has already been fixed in the human genome, and thus, the effects of new mutations at those amino acid positions may not necessarily be deleterious and might even result in reversion to benign phenotypes.

16.
Journal of Molecular Biology, 2019, 431(19): 3933-3942
The molecular mechanisms of pathological non-synonymous single-nucleotide polymorphisms are still the object of intensive research. To this end, we explore here whether non-synonymous single-nucleotide polymorphisms can work via allosteric mechanisms. Using a structure-based statistical mechanical model of allostery and analyzing the energetics of the effects of mutations in a set of 27 proteins with at least 50 pathological SNPs in each molecule, we found that, indeed, some SNPs can work allosterically. We illustrate the molecular basis of disease phenotypes caused by allosteric SNPs with the case studies of human galactose 1-phosphate uridyltransferase (GALT) and glucose-6-phosphate dehydrogenase (G6PD). We also found that mutations of a number of other residues in the protein may cause modulation comparable to that observed for known pathological SNPs. In order to explain this, we propose a notion of allosteric polymorphism, which implies the presence of a number of critical positions in the protein sequence, whose mutations can allosterically disrupt the protein function and result in a disease phenotype. We conclude that the emerging importance of allosteric polymorphism calls for the development of a computational framework for analyzing the allosteric effects of mutations and their role in the modulation of protein activity.

17.
Structural location of disease-associated single-nucleotide polymorphisms
Non-synonymous single-nucleotide polymorphisms (nsSNPs) of genes introduce amino acid changes to proteins, and play an important role in providing genetic functional diversity. To understand the structural characteristics of disease-associated SNPs, we have mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database to the structural surfaces of encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP to the structural surfaces of those encoded proteins. Using the alpha shape method from computational geometry, we examine the geometric locations of the structural sites of these nsSNPs. We classify each nsSNP site into one of three categories of geometric locations: those in a pocket or a void (type P); those on a convex region or a shallow depressed region (type S); and those that are buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and they are infrequently observed in the interior of proteins (3.2% in the data set). We find that nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMM) for analyzing sequence homology of SNPs on various geometric sites. For SNPs in a surface pocket or void, we find that there is no strong tendency for them to occur on conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to be conserved. The approach of classifying nsSNPs with alpha shape and HMM developed in this study can be integrated with additional methods to improve the accuracy of predictions of whether a given nsSNP is likely to be disease-associated.

18.
The prediction of the effects of nonsynonymous single nucleotide polymorphisms (nsSNPs) on function depends critically on exploiting all information available on the three-dimensional structures of proteins. We describe software and databases for the analysis of nsSNPs that allow a user to move from SNP to sequence to structure to function. In both structure prediction and the analysis of the effects of nsSNPs, we exploit information about protein evolution, in particular, that derived from investigations on the relation of sequence to structure gained from the study of amino acid substitutions in divergent evolution. The techniques developed in our laboratory have allowed fast and automated sequence-structure homology recognition to identify templates and to perform comparative modeling, as well as simple, robust, and generally applicable algorithms to assess the likely impact of amino acid substitutions on structure and interactions. We describe our strategy for approaching the relationship between SNPs and disease, and the results of benchmarking our approach on human proteins of known structure and recognized mutation.

19.
Single nucleotide polymorphisms (SNPs) are believed to contain relevant information and have therefore been extensively used as genetic markers in population and conservation genetics, and in molecular ecology studies. This study reports on the identification of potential SNPs in a diploid European sea bass Dicentrarchus labrax genome by using reference sequences from three assembled chromosomes and mapping all WGS datasets onto them (3× Sanger, 3× 454 and 20× SOLEXA). A total of 20,779 SNPs were identified over the 1469 gene loci and intergenic space analysed. Within chromosomes, the occurrence of SNPs was lowest in exons and higher in introns and intergenic regions, which may be explained by the fact that coding regions are under strong selective pressure to maintain their biological function. The ratio of nonsynonymous to synonymous mutations was smaller than one for all the chromosomes, suggesting that most deleterious nonsynonymous mutations were eliminated by negative selection. SNPs were not uniformly distributed over the chromosomes. Two chromosomes exhibited large regions with extremely low SNP density, which might represent homozygous regions in the diploid genome. The results of this study show how SNP detection can benefit from sequencing a single diploid individual, but also uncover the limits of such an approach. The SNPs that have been identified will support marker development for genetic linkage mapping, population genetics and aquaculture-related questions in general.

20.
In asexual lineages, both synonymous and nonsynonymous sequence polymorphism may be reduced due to severe founder effects when asexual lineages originate. However, mildly deleterious (nonsynonymous) mutations may accumulate after asexual lineages are formed, because the efficiency of purifying selection is reduced even in the nonrecombining mitochondrial genome. Here we examine patterns of synonymous and nonsynonymous mitochondrial sequence polymorphism in asexual and sexual lineages of the freshwater snail Campeloma. Using clade-specific estimates, we found that synonymous sequence polymorphism was significantly reduced by 75% in asexuals relative to sexuals, whereas nonsynonymous sequence polymorphism did not differ significantly between sexuals and asexuals. Two asexual clades had high negative values for Tajima's D statistic. Coalescent simulations confirmed that various bottleneck scenarios can account for this result. We also used branch-specific estimates of the ratio of amino acid to silent substitutions, K(a)/K(s). Our study revealed that K(a)/K(s) ratios are six times higher in terminal branches of independent asexual lineages compared to sexuals. Coalescent-based reconstruction of gene networks for all sexual and asexual clades indicated that nonsynonymous mutations occurred at a higher frequency in recently derived asexual haplotypes. These findings suggest that patterns of synonymous and nonsynonymous nucleotide polymorphism in asexual snail lineages may be shaped by both severe founder effect and relaxed purifying selection.
