Similar articles
1.
Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers that separate disease-associated nsSNPs from neutral ones. The continuous accumulation of nsSNP data allows us to explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either the original or the substituted amino acid type at the nsSNP site. Using a support vector machine (SVM), training a classification model on each subset gave an overall accuracy of 76.3% or 74.9%, depending on which of the two partition criteria was used, whereas training on the whole dataset gave an accuracy of only 72.6%. For comparison, the dataset was also randomly divided into 20 subsets, which yielded an accuracy of only 73.2%. Our results demonstrate that properly partitioning the training dataset into subsets, i.e., according to the residue type at the nsSNP site, significantly improves the performance of the trained classifiers, which should be valuable in developing better tools for predicting the disease association of nsSNPs.
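The residue-type partitioning described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: each training record is assumed to be a `(residue, feature_vector, label)` tuple, and because an SVM is too long to write out without a library, a hypothetical nearest-centroid classifier (`NearestCentroid`) stands in for it. The point is the routing: one model per residue type, selected by the residue at the nsSNP site.

```python
from collections import defaultdict
from statistics import mean


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in classifier (NOT an SVM): predicts the label whose
    class centroid is closest to the query vector."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for x, label in zip(X, y):
            by_label[label].append(x)
        self.centroids = {lab: centroid(vs) for lab, vs in by_label.items()}
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda lab: squared_distance(x, self.centroids[lab]))


class PartitionedClassifier:
    """Trains one model per residue type at the nsSNP site
    (here: a one-letter amino acid code supplied with each record)."""

    def __init__(self, make_model=NearestCentroid):
        self.make_model = make_model
        self.models = {}

    def fit(self, records):
        # records: iterable of (residue, feature_vector, label)
        groups = defaultdict(lambda: ([], []))
        for residue, x, label in records:
            groups[residue][0].append(x)
            groups[residue][1].append(label)
        for residue, (X, y) in groups.items():
            self.models[residue] = self.make_model().fit(X, y)
        return self

    def predict(self, residue, x):
        # Raises KeyError for a residue type never seen in training.
        return self.models[residue].predict(x)


# Toy usage with made-up feature values (illustration only).
train = [
    ("R", [0.9, 0.1], "disease"),
    ("R", [0.1, 0.8], "neutral"),
    ("G", [0.2, 0.2], "disease"),
    ("G", [0.9, 0.9], "neutral"),
]
clf = PartitionedClassifier().fit(train)
print(clf.predict("R", [0.5, 0.5]), clf.predict("G", [0.5, 0.5]))  # neutral disease
```

Because each residue type has its own model, the same feature vector can receive different labels depending on the residue at the site. A residue type absent from the training data raises `KeyError`, so a practical tool would need a fallback, for example a model trained on the whole dataset.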

2.
In this study, we identified the most deleterious nsSNP in the CDKN2A gene from the structural and functional properties of its protein product (p16INK4A) and investigated its binding affinity with cdk6. Of 118 SNPs, 14 are nsSNPs in the coding region and 17 are in the untranslated regions (UTRs). FastSNP suggested that 7 SNPs in the 5′ UTR might change protein expression levels. Of the 14 nsSNPs investigated, 64% were predicted to be damaging by the PolyPhen server. We then modeled the mutant p16INK4A proteins carrying these deleterious nsSNPs; three of the mutant models had RMSD values greater than 3.00 Å relative to the native protein. By comparing the total energy of these three mutant proteins, we identified the major mutation as the substitution of aspartic acid by tyrosine at residue 84 of p16INK4A. We further compared the binding of native and mutant p16INK4A to cdk6 and found that the mutant has a lower binding affinity for cdk6 than the native protein. This is attributed to the ten hydrogen bonds and eight salt bridges formed between native p16INK4A and cdk6, whereas the mutant forms only nine hydrogen bonds and five salt bridges with cdk6. Based on our investigation, we propose that the SNP rs11552822 could be the most deleterious nsSNP in the CDKN2A gene, causing malignant melanoma, as this prediction correlates well with experimental studies carried out elsewhere.
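The RMSD values quoted in several of these abstracts compare a mutant model with the native structure. A minimal sketch of the underlying formula, RMSD = sqrt((1/N) Σ |a_i - b_i|²), assuming the two structures are already superposed and their atoms are matched one-to-one (e.g. C-alpha atoms). Structural tools normally perform an optimal superposition first (for example with the Kabsch algorithm); this sketch omits that step.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched atom lists.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, assumed
    to be already superposed (no fitting is done here)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


native = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
mutant = [(3.0, 0.0, 0.0), (4.0, 1.0, 1.0)]  # every atom shifted 3 units in x
print(rmsd(native, mutant))  # 3.0
```

The result is in the units of the input coordinates, i.e. Å for PDB files. Entry 9 reports its RMSD in nm; 0.301 nm equals 3.01 Å.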

3.
Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, there is substantial interest in determining whether a non-synonymous SNP (nsSNP), which leads to a single residue replacement in the translated protein, is neutral or disease-related. Protein structure-function relationships suggest that the effects of an nsSNP, whether benign or leading to aberrant protein function possibly associated with disease, depend on the relative structural changes introduced by the mutation. In this study, we characterize a representative sample of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures by quantifying environmental perturbations in the associated proteins, using a computational mutagenesis method based on a four-body, knowledge-based statistical contact potential. These structural-change data are used as attributes to build a vector representation of each nsSNP, combined with additional features reflecting the sequence and structure of the corresponding protein. A model trained with the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger nsSNP training datasets, and the novelty of our attributes makes the model an orthogonal approach that can be used in conjunction with other techniques. A dedicated server for obtaining predictions, together with supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.
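Entry 3 reports a cross-validation accuracy. The sketch below shows how a pooled k-fold cross-validation accuracy is computed; it is a generic illustration, not the authors' random forest setup. `train_and_predict` is a hypothetical callback standing in for any classifier, and `majority_baseline` is a trivial example classifier added only for demonstration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k, train_and_predict, seed=0):
    """Pooled k-fold CV accuracy: every sample is predicted exactly once
    by a model that did not see it during training.

    train_and_predict(train_X, train_y, test_X) must return a list of
    predicted labels for test_X."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        predictions = train_and_predict(train_X, train_y, [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(predictions, fold))
    return correct / len(X)


def majority_baseline(train_X, train_y, test_X):
    """Trivial classifier: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


X = [[i] for i in range(10)]
y = ["disease"] * 6 + ["neutral"] * 4
print(cross_val_accuracy(X, y, k=10, train_and_predict=majority_baseline))  # 0.6
```

With leave-one-out (k equal to the number of samples) on 6 disease and 4 neutral records, the majority baseline scores 0.6, a reminder that accuracy on unbalanced data should be compared against such a baseline.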

4.
Computational approaches for identifying disease-associated non-synonymous single nucleotide polymorphisms (nsSNPs) have evolved very rapidly. A large number of tools for detecting deleterious and disease-associated nsSNPs have been developed in the last decade, with high reported prediction reliability. Despite these efficient tools, we still lack accuracy in determining the genotype–phenotype association of predicted nsSNPs. Furthermore, many questions must still be addressed computationally before prediction accuracy can be meaningfully discussed. We previously incorporated molecular dynamics simulation into the computational nsSNP analysis roadmap to improve its accuracy, which helped us determine the changes in protein phenotype associated with computationally predicted disease-associated mutations. Here we discuss the current state of computational nsSNP characterization techniques and some of the questions that are crucial for properly understanding the pathogenicity of disease-associated mutations.

5.
Human non-synonymous SNPs: server and survey

6.

Background  

There has been an explosion in the number of single nucleotide polymorphisms (SNPs) in public databases. In this study we focused on non-synonymous protein-coding SNPs (nsSNPs), some associated with disease and others thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence-based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of class imbalance in machine learning and show its effect on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learned rules were then applied to predict the function of all nsSNPs in Ensembl.
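Entry 6 reports that undersampling the majority class improves nsSNP function prediction. A minimal sketch of random undersampling follows. It assumes that "100% undersampling" means reducing the majority class until it is the same size as the minority class (a 1:1 ratio); that is our reading, not a definition taken from the paper. `label_of` is a hypothetical accessor that returns a record's class.

```python
import random
from collections import defaultdict


def undersample_majority(records, label_of, seed=0):
    """Randomly drop records from larger classes until every class has as
    many records as the smallest class (1:1 for a two-class problem).
    The smallest class is kept whole."""
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


records = [("neutral", i) for i in range(8)] + [("disease", i) for i in range(3)]
balanced = undersample_majority(records, label_of=lambda r: r[0])
print(len(balanced))  # 6
```

Undersampling should be applied only to the training data (inside each cross-validation fold), never to the records used for evaluation, otherwise the reported accuracy no longer reflects the real class distribution.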

7.
MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. RESULTS: We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf.

8.
Age-related cataract is a clinically and genetically heterogeneous disorder affecting the ocular lens and the leading cause of vision loss and blindness worldwide. Here we screened the non-synonymous single nucleotide polymorphisms (nsSNPs) of EPHA2, a novel gene implicated in age-related cataract. The SNPs were retrieved from dbSNP. Protein stability changes were calculated with I-Mutant. Potentially functional nsSNPs and their effects on the protein were predicted with PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and risk-score estimation. The functional impact on the EPHA2 protein was evaluated with the SWISSPDB viewer and the NOMAD-Ref server. Our analysis identified 16 non-synonymous SNPs, of which 6 (rs11543934, rs2291806, rs1058371, rs1058370, rs79100278 and rs113882203) were predicted by I-Mutant 2.0 to be the least stable, with ΔΔG values > -1.0. Six nsSNPs (rs35903225, rs2291806, rs1058372, rs1058370, rs79100278 and rs113882203) showed a highly deleterious SIFT tolerance index score of 0.00. Four nsSNPs (rs11543934, rs2291806, rs1058370 and rs113882203) were predicted to be probably damaging by the PolyPhen server, with PSIC scores ≥ 2.0. Three nsSNPs (rs11543934, rs2291806 and rs1058370) were predicted by FASTSNP to be highly polymorphic, with a risk score of 3-4 and a possible effect through non-conservative change and splicing regulation. The total energy and RMSD values were higher for the mutant structures than for the native structure. We conclude that rs2291806 is a potentially functional polymorphism that is likely to have a functional impact on the EPHA2 gene.

9.
In this work, we used computational methods to analyze genetic variation that can alter the expression and function of the BRCA2 gene. Of 534 SNPs, 101 were non-synonymous (nsSNPs). Of the 7 SNPs in untranslated regions (UTRs), 3 were in the 5′ UTR and 4 in the 3′ UTR. Of the 101 nsSNPs investigated, 20.7% were predicted to be damaging by both the SIFT and PolyPhen servers. The UTR resource tool suggested that 2 SNPs in the 5′ UTR and 4 SNPs in the 3′ UTR might change protein expression levels. Both SIFT and PolyPhen predicted the substitution of asparagine by isoleucine at position 3124 of the native BRCA2 protein to be the most deleterious. Structural comparison of this mutant protein with the native protein gave an RMSD of 0.301 nm. Based on this work, we propose that this most deleterious nsSNP, rs28897759, is an important candidate cause of breast cancer via the BRCA2 gene.
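Several of these studies (e.g. entries 9, 11 and 14) keep only nsSNPs flagged as deleterious by more than one predictor. A minimal sketch of such a consensus filter, assuming each tool's output has already been converted to a True/False call (SIFT, for example, conventionally treats scores below 0.05 as deleterious; thresholds for other tools are left to the user). The SNP identifiers and calls below are placeholders, not real predictions.

```python
def consensus_deleterious(predictions, tools):
    """Return the sorted IDs of SNPs called deleterious by every tool in `tools`.

    predictions: dict mapping SNP ID -> {tool name: True if deleterious}.
    A missing call counts as 'not deleterious'."""
    return sorted(
        snp_id
        for snp_id, calls in predictions.items()
        if all(calls.get(tool, False) for tool in tools)
    )


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Placeholder calls (NOT real tool output).
predictions = {
    "snp_A": {"SIFT": True, "PolyPhen": True},
    "snp_B": {"SIFT": True, "PolyPhen": False},
    "snp_C": {"SIFT": False, "PolyPhen": True},
    "snp_D": {"SIFT": True},
}
both = consensus_deleterious(predictions, ["SIFT", "PolyPhen"])
print(both, f"{percent(len(both), len(predictions)):.1f}%")  # ['snp_A'] 25.0%
```

Treating a missing call as negative keeps the filter conservative: a SNP that one tool could not score is not counted as a consensus hit.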

10.
Non-synonymous single nucleotide polymorphisms (nsSNPs) are considered biomarkers of disease susceptibility. In the present study, nsSNPs in the CLU, PICALM and BIN1 genes were screened for their functional impact on the encoded proteins and their plausible role in Alzheimer disease (AD) susceptibility. SNPs were first retrieved from the dbSNP database; potentially deleterious nsSNPs were then identified and their effects on the proteins predicted with PolyPhen and SIFT. Protein stability and the probability of mutation occurrence were predicted with I-Mutant and PANTHER, respectively. SNPs3D and FASTSNP were used for functional analysis of the nsSNPs. The functional impact on the 3D structure of the proteins was evaluated with the SWISSPDB viewer and the NOMAD-Ref server. Three nsSNPs, rs12800974 (T158P) of PICALM and rs11554585 (R397C) and rs11554585 (N106D) of BIN1, were predicted to be functionally significant, with higher scores from I-Mutant, SIFT, PolyPhen, PANTHER, FASTSNP and SNPs3D. The mutant models of these nsSNPs also showed much higher energies and RMSD values than their native structures. The study proposes that these three nsSNPs constitute a unique resource of potential genetic factors for AD susceptibility.

11.
In this study, we identified the most deleterious nsSNP in the RB1 gene from the structural and functional properties of its protein product (pRB) and investigated its binding affinity with E2F-2. Of 956 SNPs, we investigated the 12 nsSNPs in the coding region, three of which (rs3092895, rs3092903 and rs3092905) were predicted to be damaging by all of I-Mutant 2.0, SIFT and PolyPhen. We then modeled the mutant pRB proteins carrying these deleterious nsSNPs. By comparing the total energy, stabilizing residues and RMSD of these three mutant proteins with native pRB, we identified the major mutation as the substitution of glutamic acid by glycine at residue 746 of pRB (E746G). We further compared the binding of native and mutant (E746G) pRB to E2F-2 and found that the mutant has a lower binding affinity for E2F-2 than the native protein. This is attributed to the sixteen hydrogen bonds and two salt bridges formed between native pRB and E2F-2, whereas the mutant forms only thirteen hydrogen bonds and one salt bridge with E2F-2. Based on our investigation, we propose that rs3092905 could be the most deleterious nsSNP in the RB1 gene causing retinoblastoma.

12.
Discovery of non-synonymous single nucleotide polymorphisms (nsSNPs), which cause amino acid substitutions, is important because they are more likely to alter protein function than synonymous SNPs (sSNPs), which do not change the amino acid sequence. By changing coding sequences, nsSNPs may contribute to heritable differences between individual organisms. In the chicken and many other vertebrates, the main obstacle to identifying nsSNPs is that there is insufficient protein and mRNA sequence information for same-species referencing, which makes it difficult to determine the correct reading frame of expressed sequence tags (ESTs). Therefore, to estimate the correct reading frame at nsSNP sites in chicken ESTs, a double-screening approach was designed that uses same- or cross-species protein referencing in addition to the ESTScan coding-region estimation program. Starting with 23 427 chicken ESTs, 1210 potential SNPs were discovered with a phred/phrap/polyphred/consed pipeline, and among these, 108 candidate nsSNPs were identified with the double-screening method. A searchable SNP database (chicksnps) of the candidate chicken SNPs, including both nsSNPs and sSNPs, is available at http://chicksnps.afs.udel.edu. The chicken SNP data described in this paper have been submitted to dbSNP at the National Center for Biotechnology Information under assay IDs ss4387050-ss4388259.
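The reading-frame problem in entry 12 matters because whether a SNP is non-synonymous depends on which codon it falls in. A minimal sketch using the standard genetic code, assuming the reading frame (offset 0, 1 or 2 of the first complete codon on the forward strand) has already been estimated, as the paper's double-screening step does. Reverse-strand ESTs, ambiguous bases such as N, and indels are not handled.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(seq, pos, alt, frame=0):
    """Classify a SNP as synonymous or nonsynonymous.

    seq: forward-strand nucleotide sequence; pos: 0-based SNP position;
    alt: alternative base; frame: offset of the first complete codon.
    Returns (kind, reference amino acid, alternative amino acid)."""
    seq = seq.upper()
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    if pos < frame:
        raise ValueError("SNP lies before the first complete codon")
    start = pos - (pos - frame) % 3  # first base of the codon holding the SNP
    ref_codon = seq[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP lies in an incomplete trailing codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# The same T->A change is classified differently under two reading frames.
print(classify_snp("CATGGCTTGA", 6, "A", frame=1))  # ('synonymous', 'A', 'A')
print(classify_snp("CATGGCTTGA", 6, "A", frame=0))  # ('nonsynonymous', 'L', 'M')
```

The two calls show why an incorrect frame estimate can turn a synonymous SNP into an apparent nsSNP, which is the error the double-screening approach is designed to reduce.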

14.
In this work we used computational methods to analyze genetic variation that can alter the expression and function of the VHL gene. Of 110 single nucleotide polymorphisms (SNPs), 33 were non-synonymous (nsSNPs) and 23 were in untranslated regions (UTRs). Of the 33 nsSNPs investigated, 36.3% were predicted to be deleterious by both the SIFT and PolyPhen servers. A UTR resource tool suggested that two SNPs in the 5′ UTR and six SNPs in the 3′ UTR might change protein expression levels. Both SIFT and PolyPhen predicted that the substitution of histidine by arginine at position 115 of the native VHL protein was the most deleterious. Structural comparison of this mutant protein with the native protein gave a root mean square deviation (RMSD) of 2.78 Å. Based on this work, we propose that the nsSNP rs5030812 is an important candidate cause of von Hippel-Lindau syndrome via the VHL gene.

15.
Non-synonymous single nucleotide polymorphisms (nsSNPs) are known to alter protein function, contributing to disease susceptibility. This report explores the nature of nsSNPs in the gene products of the highly conserved mitogen-activated protein kinase (MAPK) signaling pathways, which are already implicated in cancer development. MAPK signaling pathways regulate cellular processes such as proliferation, differentiation, apoptosis and survival through interconnected signaling cascades. Using the dbSNP database, we identified 25 nsSNPs in 17 of the 98 MAPK genes studied. Computational algorithms were used to predict whether the amino acid substitutions were evolutionarily tolerated or affected putative functional units such as phosphorylation sites, protein motifs and domains. This study predicts that 36% of the nsSNPs are likely to have functional consequences based on evolutionary conservation analysis, and 36% based on phosphorylation-site prediction. All such nsSNPs represent potentially functional, disease-causing or disease-modifying alleles. More interestingly, the epistatic relationships discussed in this report represent potential synergistic, antagonistic or additive effects of nsSNP combinations within the same protein, or within members of the same protein complex or cascade. This strategy can effectively determine which nsSNPs potentially alter protein function, and can be used to study the genetic architecture and disease association of other biological protein complexes and networks.

16.
Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), which provides computationally predicted functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, 'human' being the most prevalent. The impact of each SAAS on protein function is predicted with the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. AVAILABILITY: http://www.rostlab.org/services/snpdbe.

17.
Computational prediction of disease-associated non-synonymous polymorphisms (nsSNPs) provides an important platform for filtering pathological mutations out of large SNP datasets at very low cost. Several methodologies and complementary protocols have previously been implemented and have produced significant prediction results. Although these earlier methods could identify the most likely deleterious nsSNPs, their predictions lacked accuracy because they did not include genotype–phenotype association analysis. In this work we computationally characterized protein conformational changes as well as the probable disease-associated phenotypic outcomes. Our results suggest that the E403K mutation in mitotic centromere-associated kinesin is highly damaging, and it showed strong concordance with previously observed colorectal cancer mutations in aggregation tendency and energy value changes. Moreover, molecular dynamics simulations showed a major loss of conformation and stability in the mutant N-terminal kinesin-like domain structure. The results of this study support the future use of computational approaches to identify SNPs that may affect the native conformation of protein structures and lead to cancer-associated disorders.

18.
In the present study, nsSNPs in the EPHX1, GSTT1, GSTM1 and GSTP1 genes were screened for their functional impact on the encoded proteins and their plausible role in breast cancer susceptibility. SNPs were first retrieved from dbSNP, and potentially deleterious nsSNPs were then identified with PolyPhen and SIFT. Functional analysis was performed with the SNPs3D, SNPs&GO and MutPred methods. The functional impact on the 3D structure of the proteins was predicted and evaluated with the Swiss PDB viewer and NOMAD-Ref servers. Of 13 nsSNPs found to be highly deleterious and damaging to protein structure, 6 (rs45549733, rs45506591 and rs4986949 of GSTP1; rs72549341 and rs148240980 of EPHX1; and rs17856199 of GSTT1) were predicted to be potentially polymorphic. We therefore hypothesize that these 6 nsSNPs may alter the detoxification process and increase the accumulation of carcinogenic metabolites, thereby modifying breast cancer susceptibility in a group of women.

19.
Partner and Localizer of BRCA2 (PALB2) is a typical tumor suppressor protein that responds to DNA double-strand breaks through homologous recombination repair. Heterozygous mutations in PALB2 are known to contribute to susceptibility to breast and ovarian cancer. However, no comprehensive study has characterized the structural and functional impacts of SNPs in the PALB2 gene. We therefore present a comprehensive in silico analysis of the coding and non-coding SNPs at the PALB2 locus. Data for 1455 non-synonymous SNPs (nsSNPs) at the PALB2 locus were retrieved from the dbSNP database. Characterization with a combination of in silico tools (SIFT, PROVEAN, PolyPhen, PANTHER, PhD-SNP, Pmut, MutPred 2.0 and SNAP-2) identified 28 functionally important SNPs. Of these, 16 nsSNPs were selected for structural analysis based on conservation profile and protein stability. The most deleterious nsSNPs were located within the WD40 domain of PALB2. An outline of the structural consequences of each variant was developed using data from the HOPE project. The 16 mutant structures were then modelled with SWISS Model, and the three most damaging mutant models (rs78179744, rs180177123 and rs45525135) were identified. Non-coding SNPs in the 3′ UTR of the PALB2 gene were analyzed for altered miRNA target sites. This characterization of coding and non-coding SNPs at the PALB2 locus provides a list of damaging SNPs with potential disease association. Further validation through genetic association studies will reveal their clinical significance.

20.
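Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms caused by the variation of a single nucleotide at the genome level, that is, the variation of a single base in a DNA sequence; they are the most common type of variation in the human genome. The main goal of SNP research is to understand the genetics of human phenotypic variation, particularly human genetic diseases. Non-synonymous SNPs (nsSNPs) are a class of SNPs located in coding regions that change the amino acid sequence of the translated protein. Because nsSNPs may affect protein function, they are considered a major cause of human genetic diseases. Distinguishing disease-associated nsSNPs from neutral nsSNPs is therefore important. Drawing on domestic and international research on predicting disease-associated nsSNPs, this paper analyzes the features used in prediction, summarizes the feature selection methods used to optimize these features, and outlines the various classifiers used in the prediction process.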
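The review in entry 20 surveys feature selection methods for nsSNP prediction without naming a specific one in the abstract. As one common example of a filter method (not necessarily one the review covers), the sketch below ranks features by the two-class Fisher score, (mean_pos - mean_neg)² / (var_pos + var_neg). The feature names and values are hypothetical toy data, and each class must contain at least one record.

```python
from statistics import mean, pvariance


def fisher_scores(X, y, positive):
    """Two-class Fisher score per feature column; higher = more discriminative.

    X: list of feature vectors; y: labels; positive: label of one class
    (every other label is treated as the second class)."""
    pos = [x for x, label in zip(X, y) if label == positive]
    neg = [x for x, label in zip(X, y) if label != positive]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        numerator = (mean(a) - mean(b)) ** 2
        denominator = pvariance(a) + pvariance(b)
        if denominator > 0:
            scores.append(numerator / denominator)
        else:  # both classes constant in this feature
            scores.append(float("inf") if numerator > 0 else 0.0)
    return scores


def rank_features(X, y, positive, names):
    """Feature names sorted from most to least discriminative."""
    scores = fisher_scores(X, y, positive)
    order = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    return [names[j] for j in order]


# Hypothetical features and values (illustration only).
names = ["conservation", "hydrophobicity_change"]
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = ["disease", "disease", "neutral", "neutral"]
print(rank_features(X, y, "disease", names))  # ['conservation', 'hydrophobicity_change']
```

A filter score like this is computed once per feature, independently of the classifier; features are then kept if they fall in the top k or exceed a threshold before any classifier is trained.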
