Similar articles
1.

Background  

There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence-based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl.
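As a rough illustration of the undersampling idea in the abstract above, the following minimal sketch randomly discards majority-class samples until the classes are the same size. It assumes that "100% undersampling" means reducing the majority class to the size of the minority class; the abstract does not spell this out. The function name `undersample_majority` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def undersample_majority(samples, labels, seed=0):
    """Randomly drop samples so every class ends up as small as the rarest class.

    Assumption: this is one plausible reading of "100% undersampling";
    the original study may have used a different procedure.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The target size is the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        kept = rng.sample(group, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 neutral nsSNPs and 2 disease-associated nsSNPs.
samples = [f"nsSNP_{i}" for i in range(10)]
labels = ["neutral"] * 8 + ["disease"] * 2

balanced = undersample_majority(samples, labels)
print(Counter(label for _, label in balanced))
```

After balancing, a classifier trained on `balanced` sees each class equally often, which is the effect the abstract credits for the improved prediction.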

2.
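Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms at the genome level caused by variation of a single nucleotide, that is, variation of a single base in the DNA sequence; they are the most common type of human genomic variation. The main aim of SNP research is to understand the genetics of human phenotypic variation, particularly in relation to human genetic diseases. Non-synonymous single nucleotide polymorphisms (nsSNPs) are a class of SNPs: single-nucleotide mutations in coding regions that change the corresponding amino acid sequence after translation. Because nsSNPs may affect protein function, they are considered a major cause of human genetic disease. It is therefore important to distinguish disease-associated nsSNPs from neutral ones. Drawing on domestic and international research on the prediction of disease-associated nsSNPs, this paper analyzes the feature attributes used in prediction, summarizes the feature selection methods used to optimize these features, and outlines the various classifiers used in the prediction process.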

3.
The prediction of the effects of nonsynonymous single nucleotide polymorphisms (nsSNPs) on function depends critically on exploiting all information available on the three-dimensional structures of proteins. We describe software and databases for the analysis of nsSNPs that allow a user to move from SNP to sequence to structure to function. In both structure prediction and the analysis of the effects of nsSNPs, we exploit information about protein evolution, in particular, that derived from investigations on the relation of sequence to structure gained from the study of amino acid substitutions in divergent evolution. The techniques developed in our laboratory have allowed fast and automated sequence-structure homology recognition to identify templates and to perform comparative modeling, as well as simple, robust, and generally applicable algorithms to assess the likely impact of amino acid substitutions on structure and interactions. We describe our strategy for approaching the relationship between SNPs and disease, and the results of benchmarking our approach on human proteins of known structure and recognized mutation.

4.
Single-nucleotide polymorphisms (SNPs) are the most frequent form of genetic variation. Non-synonymous SNPs (nsSNPs) occurring in coding regions result in single amino acid substitutions that are associated with human hereditary diseases. Many approaches have been designed for distinguishing deleterious from neutral nsSNPs based on sequence-level information. Novel in this work, combinations of protein–protein interaction (PPI) network topological features were introduced in predicting disease-related nsSNPs. Based on a dataset compiled from Swiss-Prot, a random forest model was constructed with an average accuracy of 80.43% and an MCC of 0.60 in a rigorous tenfold cross-validation test. For an independent dataset, our model achieved an accuracy of 88.05% and an MCC of 0.67. Compared with previous studies, our approach presented superior prediction ability. Results showed that the incorporated PPI network topological features outperform conventional features. Our further analysis indicated that disease-related proteins are topologically different from other proteins. This study suggested that nsSNPs may share some topological information of proteins and that the change of topological attributes could provide clues in illustrating functional shift due to nsSNPs.
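For readers unfamiliar with the two metrics reported in the abstract above, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix, using the standard definitions. The counts are hypothetical and chosen only for illustration; they are not the paper's data, and the function names are assumptions of this sketch.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts (not from the paper): 100 variants in total.
tp, tn, fp, fn = 40, 40, 10, 10

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the disease and neutral classes differ in size.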

5.
Chen R  Davydov EV  Sirota M  Butte AJ 《PloS one》2010,5(10):e13574
Many DNA variants have been identified for more than 300 diseases and traits using Genome-Wide Association Studies (GWASs). Some have been validated using deep sequencing, but many fewer have been validated functionally, primarily focused on non-synonymous coding SNPs (nsSNPs). It is an open question whether synonymous coding SNPs (sSNPs) and other non-coding SNPs can lead to odds ratios as high as those of nsSNPs. We conducted a broad survey across 21,429 disease-SNP associations curated from 2,113 publications studying human genetic association, and found that nsSNPs and sSNPs shared similar likelihood and effect size for disease association. The enrichment of disease-associated SNPs around the 80th base in the first introns might provide an effective way to prioritize intronic SNPs for functional studies. We further found that the likelihood of disease association was positively associated with the effect size across different types of SNPs, and SNPs in the 3' untranslated regions, such as the microRNA binding sites, might be under-investigated. Our results suggest that sSNPs are just as likely to be involved in disease mechanisms, so we recommend that sSNPs discovered from GWAS should also be examined with functional studies.

6.
Human non-synonymous SNPs: server and survey

7.
Torkamani A  Schork NJ 《Genomics》2007,90(1):49-58
The human kinase gene family is composed of 518 genes that are involved in a diverse spectrum of physiological functions. They are also implicated in a number of diseases and encompass 10% of current drug targets. Contemporary, high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs) in kinases, a subset of which occur in the coding region of genes (cSNPs) and result in a change in the encoded amino acid sequence (nonsynonymous coding SNP; nscSNPs). What fraction of this naturally occurring variation underlies human disease is largely unknown (uDC), and much of it is assumed not to be disease causing (DC). We pursued a comprehensive computational analysis of the distribution of 1463 nscSNPs and 999 DC nscSNPs within the kinase gene family and have found that DCs are overrepresented in the kinase catalytic domain and in receptor structures. In addition, the frequencies with which specific amino acid changes occur differ between the DCs and the uDCs, implying different biological characteristics for the two sets of human polymorphisms. Our results provide insights into the sequence and structural phenomena associated with naturally occurring kinase nscSNPs that contribute to human diseases.

8.

Background

Understanding and predicting the molecular basis of disease is one of the major challenges in modern biology and medicine. SNPs associated with complex disorders can create, destroy, or modify protein coding sites. Single amino acid substitutions in the ATM gene are the most common forms of genetic variation that account for various forms of cancer. However, the extent to which SNPs interfere with gene regulation and affect cancer susceptibility remains largely unknown.

Principal findings

We analyzed the deleterious nsSNPs associated with the ATM gene using different computational methods. An integrative scoring system and sequence conservation of amino acid residues were adapted for a priori nsSNP analysis of variants associated with cancer. We further extended our approach to SNPs that could potentially influence protein post-translational modifications in the ATM gene.

Significance

In the absence of adequate prior reports on the possible deleterious effects of nsSNPs, we have systematically analyzed and characterized the functional variants in both coding and non-coding regions that can alter the expression and function of the ATM gene. In silico characterization of nsSNPs affecting ATM gene function can aid in better understanding of genetic differences in disease susceptibility.

9.
MOTIVATION: Human single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in the human population. One of the most important goals of SNP projects is to understand which human genotype variations are related to Mendelian and complex diseases. Great interest is focused on non-synonymous coding SNPs (nsSNPs) that are responsible for protein single point mutations. nsSNPs can be neutral or disease associated. It is known that the mutation of only one residue in a protein sequence can be related to a number of pathological conditions of dramatic social impact such as Alzheimer's, Parkinson's and Creutzfeldt-Jakob's diseases. The quality and completeness of presently available SNP databases allow the application of machine learning techniques to predict the onset of human diseases due to single point protein mutations starting from the protein sequence. RESULTS: In this paper, we develop a method based on support vector machines (SVMs) that, starting from the protein sequence information, can predict whether a new phenotype derived from an nsSNP can be related to a genetic disease in humans. Using a dataset of 21 185 single point mutations, 61% of which are disease-related, out of 3587 proteins, we show that our predictor can reach more than 74% accuracy in the specific task of predicting whether a single point mutation can be disease related or not. Our method, although based on less information, outperforms other web-available predictors implementing different approaches. AVAILABILITY: A beta version of the web tool is available at http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi

10.
Recent technological progress has permitted the efficient performance of genome-wide association studies (GWAS) to map genetic variants associated with common diseases. Here, we analyzed, with respect to their genomic location, 2,893 single nucleotide polymorphisms (SNPs) that have been identified in 593 published GWAS as associated with a disease phenotype. In absolute numbers, most significant SNPs are located in intergenic regions and introns. When compared to their representation on the chips, there is essentially an overrepresentation of nonsynonymous coding SNPs (nsSNPs), synonymous coding SNPs, and SNPs in untranscribed regions upstream of genes among the disease-associated SNPs. A Gene Ontology term analysis showed that genes putatively causing a phenotype often code for membrane-associated proteins or signal transduction genes.

11.
Human genetic variations primarily result from single nucleotide polymorphisms (SNPs) that occur approximately every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases and cancer. One of the main problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. An attempt was made to develop a new approach to predict such nsSNPs. This would enhance our understanding of genetic diseases and help to predict the disease. We detect nsSNPs and all possible and reliable alleles by ANN, a soft computing model using potential SNP information. Reliable nsSNPs are identified based on the reconstructed alleles and on sequence redundancy. The model gives good results with mean specificity (95.85%), sensitivity (97.40%) and accuracy (96.25%). Our results indicate that ANNs can serve as a useful method to analyze the quantitative effect of nsSNPs on protein function and would be useful for large-scale analysis of genomic nsSNP data. AVAILABILITY: The database is available for free at http://www.snp.mirworks.in.

12.
MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. RESULTS: We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf.

13.
Recent analyses of human genome sequences have given rise to impressive advances in identifying non-synonymous single nucleotide polymorphisms (nsSNPs). By contrast, the annotation of nsSNPs and their links to diseases are progressing at a much slower pace. Many of the current approaches to analysing disease-associated nsSNPs use primarily sequence and evolutionary information, while structural information is relatively less exploited. In order to explore the potential of such information, we developed a structure-based approach, Bongo (Bonds ON Graph), to predict structural effects of nsSNPs. Bongo considers protein structures as residue-residue interaction networks and applies graph theoretical measures to identify the residues that are critical for maintaining structural stability by assessing the consequences on the interaction network of single point mutations. Our results show that Bongo is able to identify mutations that cause both local and global structural effects, with a remarkably low false positive rate. Application of the Bongo method to the prediction of 506 disease-associated nsSNPs resulted in a performance (positive predictive value, PPV, 78.5%) similar to that of PolyPhen (PPV, 77.2%) and PANTHER (PPV, 72.2%). As the Bongo method is solely structure-based, our results indicate that the structural changes resulting from nsSNPs are closely associated with their pathological consequences.

14.
Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), are responsible for most human genetic diseases. Discriminating deleterious SAPs from neutral ones can help identify disease genes and understand the mechanisms of diseases. In this work, a method of deleterious SAP prediction at the system level was established. Unlike most existing methods, our method considers not only sequence and structure information but also network information. The integration of network information can improve the performance of deleterious SAP prediction. To make our method available to the public, we developed SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms), an easy-to-use and highly accurate web server. SySAP is freely available at http://www.biosino.org/SySAP/ and http://lifecenter.sgst.cn/SySAP/.

15.
钱旭丽  曹新 《遗传》2015,37(7):664-672
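The coagulation factor C homology gene (COCH), located on human chromosome 14q12-q13, was the first human deafness gene found to be accompanied by vestibular dysfunction. To date, mutations at 16 sites in the COCH gene have been found to cause the autosomal dominant non-syndromic deafness DFNA9, including 13 non-synonymous single nucleotide polymorphism (nsSNP) sites. Because the genotype-phenotype relationships of the other nsSNPs in this gene remain unclear, this study used bioinformatics methods to screen all SNPs of the COCH gene in a stepwise manner and, combined with information on known pathogenic nsSNPs and verification against three-dimensional protein structures, predicted for the first time 8 high-risk pathogenic nsSNPs (I176T, R180Q, G265E, V269L, I368N, I372T, R416C and Y424D) in the vWFA (von Willebrand factor type A) domain of the cochlin protein encoded by COCH. In addition, three-dimensional structural modeling was performed for 6 known pathogenic nsSNPs (P51S, G87W, I109N, I109T, W117R and F121S) located in the LCCL (Limulus factor C, cochlin, and late gestation lung protein Lgl1) domain, and all mutants were found to show changes in loop or chain structure. This study of genotype-phenotype correlation in the COCH gene provides a theoretical basis for hereditary deafness screening and offers guidance for functional studies of the cochlin protein encoded by this gene.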

16.
Li Y  Wang Y  Li Y  Yang L 《FEBS letters》2006,580(30):6800-6806
The non-synonymous SNPs (nsSNPs) in coding regions, neutral or deleterious, could lead to the alteration of the function or structure of proteins. We have developed computational models to analyze the deleterious nsSNPs in transporters and to predict those in the ABCB (ATP-binding cassette B) transporters of interest. The RPLS (ridge partial least square) and LDA (linear discriminant analysis) methods were applied to the problem by training on a selection of datasets from a specified source, i.e., human transporters. The best combination of datasets and prediction attributes was ascertained. The prediction accuracy of the theoretical RPLS model for the training and testing sets is 84.8% and 80.4%, respectively (LDA: 84.3% and 80.4%), which indicates that the models are reasonable and may be helpful for pharmacogenetics studies.

17.
MOTIVATION: There has been great expectation that the knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs from a large number of neutral nsSNPs, it is important to develop computational tools to predict the phenotypic effects of nsSNPs. RESULTS: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the 3D homologous structure as well as evolutionary information derived from the multiple sequence alignment were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable with the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (<10 homologous sequences), our method outperforms the SIFT algorithm significantly. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available. AVAILABILITY: The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/

18.
Recent progress in identification and mapping of single nucleotide polymorphisms (SNPs) in the human genome generates an unprecedented opportunity to explore cause-effect relationships between genetic variations and susceptibility to common diseases. For this purpose, one promising strategy would be to select a set of SNPs that potentially alter the function of proteins involved in the pathogenesis of the diseases and compare their frequencies in the affected individuals and the healthy population. In this respect, SNPs that change amino acid sequences (nonsynonymous SNPs; nsSNPs) are of particular interest, since they are more likely to affect protein functions. In this study, we have constructed a catalog of nsSNPs (PicSNP), whose unique features are (i) nsSNPs are classified according to the functions of the affected genes and are searchable under the guidance of hierarchical lists of protein functions and (ii) nsSNPs that lead to amino acid changes in the known functional sites and domains of proteins are highlighted. Out of 1,190,295 SNPs extracted from a public database, we identified 3793 nsSNPs and classified them in 1247 categories of protein functions. 495 sites and domains annotated in the Swiss-Prot database were found to include nsSNPs, including 2 nsSNPs in disulfide-binding sites and 38 nsSNPs in transmembrane regions. PicSNP is available via the World Wide Web (http://picsnp.org) and would support research searching for SNPs involved in common diseases.

19.

Background  

Human genetic variations primarily result from single nucleotide polymorphisms (SNPs) that occur approximately every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. As such, the development of computational tools that can identify such nsSNPs would enhance our understanding of genetic diseases and help predict the disease.

20.
Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, substantial interest exists in determining whether a non-synonymous SNP (nsSNP), leading to a single residue replacement in the translated protein product, is neutral or disease-related. The nature of protein structure-function relationships suggests that nsSNP effects, either benign or leading to aberrant protein function possibly associated with disease, are dependent on relative structural changes introduced upon mutation. In this study, we characterize a representative sampling of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures, by quantifying environmental perturbations in the associated proteins with the use of a computational mutagenesis methodology that relies on a four-body, knowledge-based, statistical contact potential. These structural change data are used as attributes to generate a vector representation for each nsSNP, in combination with additional features reflecting sequence and structure of the corresponding protein. A trained model based on the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger datasets of nsSNPs for model training, and the novelty of our attributes differentiates the model as an orthogonal approach that can be utilized in conjunction with other techniques. A dedicated server for obtaining predictions, as well as supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.
