Similar literature
 20 similar documents found.
1.
Human genetic variations primarily result from single nucleotide polymorphisms (SNPs), which occur approximately every 1000 bases in the overall human population. Non-synonymous SNPs (nsSNPs), which lead to amino acid changes in the protein product, may account for nearly half of the known genetic variations linked to inherited human diseases and cancer. One of the main problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. An attempt was made to develop a new approach to predict such nsSNPs, which would enhance our understanding of genetic diseases and help to predict disease. We detect nsSNPs and all possible and reliable alleles with an artificial neural network (ANN), a soft computing model, using potential SNP information. Reliable nsSNPs are identified based on the reconstructed alleles and on sequence redundancy. The model gives good results, with mean specificity (95.85%), sensitivity (97.40%) and accuracy (96.25%). Our results indicate that ANNs can serve as a useful method to analyze the quantitative effect of nsSNPs on protein function and would be useful for large-scale analysis of genomic nsSNP data. AVAILABILITY: The database is available for free at http://www.snp.mirworks.in.
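The three figures reported in this abstract are standard confusion-matrix metrics. As a minimal sketch of how such values are derived, the snippet below computes them from counts of true/false positives and negatives. The function name `confusion_metrics` and the example counts are illustrative assumptions, not data from the study.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)                 # share of disease nsSNPs correctly flagged
    specificity = tn / (tn + fp)                 # share of neutral nsSNPs correctly passed
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all calls that are correct
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Made-up counts, for illustration only (not taken from the paper).
metrics = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting all three together matters because accuracy alone can look high on a dataset dominated by one class.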

2.
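Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms caused at the genome level by variation of a single nucleotide, that is, variation of a single base in the DNA sequence; they are the most common type of variation in the human genome. The main purpose of SNP research is to understand the genetics of human phenotypic variation, especially with regard to human inherited diseases. Non-synonymous single nucleotide polymorphisms (nsSNPs) are one class of SNPs: single-nucleotide mutations located in coding regions that change the corresponding amino acid sequence after translation. Because nsSNPs may affect protein function, they are considered a major cause of human inherited disease. It is therefore important to distinguish disease-associated nsSNPs from neutral nsSNPs. Drawing on domestic and international research on the prediction of disease-associated nsSNPs, this paper analyzes the feature attributes involved in prediction, summarizes the feature selection methods used to optimize these features, and gives an overview of the various classifiers used in the prediction process.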

3.

Background  

Human genetic variations primarily result from single nucleotide polymorphisms (SNPs) that occur approximately every 1000 bases in the overall human population. The non-synonymous SNPs (nsSNPs) that lead to amino acid changes in the protein product may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify nsSNPs that underlie disease-related phenotypes in humans. As such, the development of computational tools that can identify such nsSNPs would enhance our understanding of genetic diseases and help predict the disease.

4.
MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. RESULTS: We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf.

5.
Shen J, Deininger PL, Zhao H. Cytokine. 2006;35(1-2):62-66.
Understanding the functions of single nucleotide polymorphisms (SNPs) can greatly help to understand the genetics of human phenotypic variation and especially the genetic basis of complex human diseases. However, identifying functional SNPs in a pool containing both functional and neutral SNPs is challenging. In this study, we analyzed the genetic variations that can alter the expression and function of a group of cytokine proteins using computational tools. As a result, we extracted 4552 SNPs from 45 cytokine proteins from the SNPper database. Of particular interest, 828 SNPs were in the 5' UTR region, 961 SNPs were in the 3' UTR region, and 85 SNPs were non-synonymous SNPs (nsSNPs), which cause amino acid changes. Evolutionary conservation analysis using the SIFT tool suggested that 8 nsSNPs may disrupt protein function. Protein structure analysis using the PolyPhen tool suggested that 5 nsSNPs might alter protein structure. Binding motif analysis using the UTResource tool suggested that 27 SNPs in the 5' or 3' UTR might change protein expression levels. Our study demonstrates the presence of naturally occurring genetic variations in the cytokine proteins that may affect their expression and function, with possible roles in complex human diseases such as immune diseases.

6.
Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, substantial interest exists in determining whether a non-synonymous SNP (nsSNP), leading to a single residue replacement in the translated protein product, is neutral or disease-related. The nature of protein structure-function relationships suggests that nsSNP effects, either benign or leading to aberrant protein function possibly associated with disease, are dependent on relative structural changes introduced upon mutation. In this study, we characterize a representative sampling of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures, by quantifying environmental perturbations in the associated proteins with the use of a computational mutagenesis methodology that relies on a four-body, knowledge-based, statistical contact potential. These structural change data are used as attributes to generate a vector representation for each nsSNP, in combination with additional features reflecting sequence and structure of the corresponding protein. A trained model based on the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger datasets of nsSNPs for model training, and the novelty of our attributes differentiates the model as an orthogonal approach that can be utilized in conjunction with other techniques. A dedicated server for obtaining predictions, as well as supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.

7.
MOTIVATION: There has been great expectation that the knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs from a large number of neutral nsSNPs, it is important to develop computational tools to predict the phenotypic effects of nsSNPs. RESULTS: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the 3D homologous structure as well as evolutionary information derived from the multiple sequence alignment were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable with that of the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (<10 homologous sequences), our method outperforms the SIFT algorithm significantly. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available. AVAILABILITY: The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/

8.
Recent progress in the identification and mapping of single nucleotide polymorphisms (SNPs) in the human genome generates an unprecedented opportunity to explore cause-effect relationships between genetic variations and susceptibility to common diseases. For this purpose, one promising strategy would be to select a set of SNPs that potentially alter the function of proteins involved in the pathogenesis of the diseases and compare their frequencies in the affected individuals and the healthy population. In this respect, SNPs that change amino acid sequences (nonsynonymous SNPs; nsSNPs) are of particular interest, since they are more likely to affect protein functions. In this study, we have constructed a catalog of nsSNPs (PicSNP), whose unique features are (i) nsSNPs are classified according to the functions of the affected genes and are searchable under the guidance of hierarchical lists of protein functions and (ii) nsSNPs that lead to amino acid changes in the known functional sites and domains of proteins are highlighted. Out of 1,190,295 SNPs extracted from public databases, we identified 3793 nsSNPs and classified them into 1247 categories of protein functions. A total of 495 sites and domains annotated in the Swiss-Prot database were found to include nsSNPs, including 2 nsSNPs in disulfide-binding sites and 38 nsSNPs in transmembrane regions. PicSNP is available via the World Wide Web (http://picsnp.org) and would support research seeking SNPs involved in common diseases.

9.
MOTIVATION: Contemporary, high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs), a subset of which occur in the coding region of genes and result in a change in the encoded amino acid sequence (non-synonymous coding SNPs or 'nsSNPs'). It is hypothesized that a subset of these nsSNPs may underlie common human disease. Testing all these polymorphisms for disease association would be time consuming and expensive. Thus, computational methods have been developed to both prioritize candidate nsSNPs and make sense of their likely molecular physiologic impact. RESULTS: We have developed a method to prioritize nsSNPs and have applied it to the human protein kinase gene family. The results of our analyses provide high quality predictions and outperform available whole genome prediction methods (74% versus 83% prediction accuracy). Our analyses and methods consider both DNA sequence conservation, which most traditional methods are based on, as well as unique structural and functional features of kinases. We provide a ranked list of common kinase nsSNPs that have a higher probability of impacting human disease based on our analyses.

10.
Human non-synonymous SNPs: server and survey (cited 37 times in total: 0 self-citations, 37 citations by others)

11.

Background  

There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence-based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of class balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl.
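The abstract does not spell out its undersampling procedure. As a hedged sketch, the snippet below reads "100% undersampling of the majority class" as randomly discarding majority-class examples until every class matches the size of the smallest one; that reading is an assumption. The helper `undersample_majority` and the toy labels are hypothetical.

```python
import random


def undersample_majority(samples, labels, seed=0):
    """Randomly drop examples so every class has as many items as the smallest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The smallest class sets the target size for every class.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        kept = rng.sample(items, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


# Toy data: 8 neutral and 2 disease-associated variants (identifiers are made up).
toy_samples = [f"variant_{i}" for i in range(10)]
toy_labels = ["neutral"] * 8 + ["disease"] * 2
balanced = undersample_majority(toy_samples, toy_labels)
print(len(balanced))
```

The trade-off is that discarded majority examples carry information the classifier never sees, so the held-out test set is usually left at its natural class ratio.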

12.
The prediction of the effects of nonsynonymous single nucleotide polymorphisms (nsSNPs) on function depends critically on exploiting all information available on the three-dimensional structures of proteins. We describe software and databases for the analysis of nsSNPs that allow a user to move from SNP to sequence to structure to function. In both structure prediction and the analysis of the effects of nsSNPs, we exploit information about protein evolution, in particular, that derived from investigations on the relation of sequence to structure gained from the study of amino acid substitutions in divergent evolution. The techniques developed in our laboratory have allowed fast and automated sequence-structure homology recognition to identify templates and to perform comparative modeling, as well as simple, robust, and generally applicable algorithms to assess the likely impact of amino acid substitutions on structure and interactions. We describe our strategy for approaching the relationship between SNPs and disease, and the results of benchmarking our approach on human proteins of known structure and recognized mutation.

13.
Partner and Localizer of BRCA2 (PALB2) is a typical tumor suppressor protein that responds to DNA double-stranded breaks through homologous recombination repair. Heterozygous mutations in PALB2 are known to contribute to susceptibility to breast and ovarian cancer. However, there is no comprehensive study characterizing the structural and functional impacts of SNPs located in the PALB2 gene. Therefore, it is of interest to document a comprehensive analysis of coding and non-coding SNPs located at the PALB2 locus using in silico tools. The data for 1455 non-synonymous SNPs (nsSNPs) located in the PALB2 locus were retrieved from the dbSNP database. Comprehensive characterization of the SNPs using a combination of in silico tools such as SIFT, PROVEAN, PolyPhen, PANTHER, PhD-SNP, Pmut, MutPred 2.0 and SNAP-2 identified 28 functionally important SNPs. Among these, 16 nsSNPs were further selected for structural analysis using conservation profile and protein stability. The most deleterious nsSNPs were documented within the WD40 domain of PALB2. A general outline of the structural consequences of each variant was developed using the HOPE project data. These 16 mutant structures were further modelled using SWISS Model, and the three most damaging mutant models (rs78179744, rs180177123 and rs45525135) were identified. The non-coding SNPs in the 3' UTR region of the PALB2 gene were analyzed for altered miRNA target sites. The comprehensive characterization of the coding and non-coding SNPs in the PALB2 locus has provided a list of damaging SNPs with potential disease association. Further validation through genetic association studies will reveal their clinical significance.

14.
Single-nucleotide polymorphisms (SNPs) are the most frequent form of genetic variation. Non-synonymous SNPs (nsSNPs) occurring in coding regions result in single amino acid substitutions that are associated with human hereditary diseases. Many approaches have been designed for distinguishing deleterious from neutral nsSNPs based on sequence-level information. As a novel element of this work, combinations of protein–protein interaction (PPI) network topological features were introduced for predicting disease-related nsSNPs. Based on a dataset compiled from Swiss-Prot, a random forest model was constructed with an average accuracy of 80.43% and an MCC of 0.60 in a rigorous tenfold cross-validation test. For an independent dataset, our model achieved an accuracy of 88.05% and an MCC of 0.67. Compared with previous studies, our approach showed superior prediction ability. Results showed that the incorporated PPI network topological features outperform conventional features. Our further analysis indicated that disease-related proteins are topologically different from other proteins. This study suggests that nsSNPs may share some topological information of proteins and that the change of topological attributes could provide clues for explaining functional shifts due to nsSNPs.
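The abstract reports both accuracy and the Matthews correlation coefficient (MCC). As a small illustration of why the two differ, the sketch below computes MCC from confusion-matrix counts. The counts are invented for the example and chosen so that accuracy is 0.80 and MCC is 0.60; they are not the study's data.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Invented, balanced example: 50 disease and 50 neutral variants.
tp, fp, tn, fn = 40, 10, 40, 10
accuracy = (tp + tn) / (tp + fp + tn + fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"accuracy={accuracy:.2f}, mcc={mcc:.2f}")
```

MCC ranges from -1 to 1 and stays near 0 for a classifier that guesses, which makes it more informative than accuracy when the disease and neutral classes are unevenly represented.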

15.
Savas S, Ahmad MF, Shariff M, Kim DY, Ozcelik H. Proteins. 2005;58(3):697-705.
Nonsynonymous single nucleotide polymorphisms (nsSNPs) alter the encoded amino acid sequence, and are thus likely to affect the function of the proteins, and represent potential disease-modifiers. There is an enormous number of nsSNPs in the human population, and the major challenge lies in distinguishing the functionally significant and potentially disease-related ones from the rest. In this study, we analyzed the genetic variations that can alter the functions and the interactions of a group of cell cycle proteins (n = 60) and the proteins interacting with them (n = 26) using computational tools. As a result, we extracted 249 nsSNPs from 77 cell cycle proteins and their interaction partners from public SNP databases. Only 31 (12.4%) of the nsSNPs were validated. The majority (64.5%) of the validated SNPs were rare (minor allele frequencies < 5%). Evolutionary conservation analysis using the SIFT tool suggested that 16.1% of the validated nsSNPs may disrupt the protein function. In addition, 58% of the validated nsSNPs were located in functional protein domains/motifs, which together with the evolutionary conservation analysis enabled us to infer possible biological consequences of the nsSNPs in our set. Our study strongly suggests the presence of naturally occurring genetic variations in the cell cycle proteins that may affect their interactions and functions with possible roles in complex human diseases, such as cancer.

16.
Barenboim M, Masso M, Vaisman II, Jamison DC. Proteins. 2008;71(4):1930-1939.
There is substantial interest in methods designed to predict the effect of nonsynonymous single nucleotide polymorphisms (nsSNPs) on protein function, given their potential relationship to heritable diseases. Current state-of-the-art supervised machine learning algorithms, such as random forest (RF), train models that classify single amino acid mutations in proteins as either neutral or deleterious to function. However, it is frequently the case that the functional effect of a polymorphism on a protein resides between these two extremes. The utilization of classifiers that incorporate fuzzy logic provides a natural extension in order to account for the spectrum of possible functional consequences. We generated a dataset of single amino acid substitutions in human proteins having known three-dimensional structures. Each variant was uniquely represented as a feature vector that included computational geometry and knowledge-based statistical potential predictors obtained through application of Delaunay tessellation of protein structures. Additional attributes consisted of physicochemical properties of the native and replacement amino acids as well as topological location of the mutated residue position in the solved structure. Classification performance of the RF algorithm was evaluated on a training set consisting of the disease-associated and neutral nsSNPs taken from our dataset, and attributes were ranked according to their relative importance. Similarly, we evaluated the performance of an adaptive neuro-fuzzy inference system (ANFIS). The utility of statistical geometry predictors was compared with that of traditional structural and evolutionary attributes employed by other researchers, revealing an equally effective yet complementary methodology. Among all attributes in our feature set, the statistical geometry predictors were found to be the most highly ranked. On the basis of the AUC (area under the ROC curve) measure of performance, the ANFIS and RF models were equally effective when only statistical geometry features were utilized. Tenfold cross-validation studies evaluating AUC, balanced error rate (BER), and Matthews correlation coefficient (MCC) showed that our RF model was at least comparable with the well-established methods of SIFT and PolyPhen. The trained RF and ANFIS models were each subsequently used to predict the disease potential of human nsSNPs in our dataset that are currently unclassified (http://rna.gmu.edu/FuzzySnps/).

17.
Recent analyses of human genome sequences have given rise to impressive advances in identifying non-synonymous single nucleotide polymorphisms (nsSNPs). By contrast, the annotation of nsSNPs and their links to diseases are progressing at a much slower pace. Many of the current approaches to analysing disease-associated nsSNPs use primarily sequence and evolutionary information, while structural information is relatively less exploited. In order to explore the potential of such information, we developed a structure-based approach, Bongo (Bonds ON Graph), to predict structural effects of nsSNPs. Bongo considers protein structures as residue-residue interaction networks and applies graph theoretical measures to identify the residues that are critical for maintaining structural stability by assessing the consequences of single point mutations on the interaction network. Our results show that Bongo is able to identify mutations that cause both local and global structural effects, with a remarkably low false positive rate. Application of the Bongo method to the prediction of 506 disease-associated nsSNPs resulted in a performance (positive predictive value, PPV, 78.5%) similar to that of PolyPhen (PPV, 77.2%) and PANTHER (PPV, 72.2%). As the Bongo method is solely structure-based, our results indicate that the structural changes resulting from nsSNPs are closely associated with their pathological consequences.

18.
MOTIVATION: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variant in humans. SNPs causing amino acid substitutions are of particular interest as candidates for loci affecting susceptibility to complex diseases, such as diabetes and hypertension. To efficiently screen SNPs for disease association, it is important to distinguish neutral variants from deleterious ones. RESULTS: We describe the use of Pfam protein motif models and the HMMER program to predict whether amino acid changes in conserved domains are likely to affect protein function. We find that the magnitude of the change in the HMMER E-value caused by an amino acid substitution is a good predictor of whether it is deleterious. We provide internet-accessible display tools for a genome-wide collection of SNPs, including 7391 distinct non-synonymous coding region SNPs in 2683 genes. AVAILABILITY: http://lpgws.nci.nih.gov/cgi-bin/GeneViewer.cgi
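The abstract does not say how the "magnitude of the change" in E-value is measured. One plausible way to express it, shown here purely as an assumption, is the log10 ratio of the mutant E-value to the wild-type E-value, where both are taken as already obtained from separate HMMER runs against the same Pfam model. The function `evalue_shift` and the example values are hypothetical.

```python
import math


def evalue_shift(e_wild: float, e_mutant: float) -> float:
    """log10(E_mutant / E_wild); positive values mean the mutant matches the domain model less well."""
    if e_wild <= 0 or e_mutant <= 0:
        raise ValueError("E-values must be positive to take a logarithm")
    # Subtract the logs instead of dividing first, so tiny E-values do not lose precision.
    return math.log10(e_mutant) - math.log10(e_wild)


# Hypothetical E-values for one domain hit before and after a substitution.
shift = evalue_shift(e_wild=1e-30, e_mutant=1e-25)
print(f"E-value shift: {shift:.1f} orders of magnitude")
```

Working on a log scale keeps substitutions in strongly and weakly matching domains comparable, since E-values span many orders of magnitude.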

19.

Background

α-Thalassemia (α-thal) is a genetic disorder caused by single amino acid substitutions or large deletions in the HBA1 and/or HBA2 genes.

Method

Modern bioinformatics tools were used as a systematic in-silico approach to predict the deleterious SNPs in the HBA1 gene and their pathogenic impact on the function and structure of the HBA1 protein.

Results and Discussion

A total of 389 SNPs in HBA1 were retrieved from the dbSNP database, including 201 non-synonymous coding SNPs (nsSNPs), 43 human active SNPs, 16 intronic SNPs, 11 mRNA 3′ UTR SNPs, 9 coding synonymous SNPs, 9 5′ UTR SNPs and other types. A structural homology-based method (PolyPhen), a sequence homology-based tool (SIFT), SNPs&GO, PROVEAN and PANTHER revealed that 2.4% of the nsSNPs are pathogenic.

Conclusions

A total of 5 nsSNPs (G60V, K17M, K17T, L92F and W15R) were predicted to be responsible for structural and functional modifications of the HBA1 protein. The comprehensive in-silico analysis indicates that two of these nsSNPs, G60V and W15R, are highly deleterious. These two pathogenic nsSNPs can be considered for confirmatory wet-lab analysis.

20.

Background

Recent reports suggest a role for nonsynonymous single nucleotide polymorphisms (nsSNPs) in the cyclin-dependent kinase 7 (CDK7) gene in defects of the DNA repair mechanism that may contribute to cancer risk. Among the various inhibitors developed so far, flavopiridol proved to be a potential antitumor drug in the phase-III clinical trial for chronic lymphocytic leukemia. Here, we describe a theoretical assessment for the discovery of new drugs or drug targets in the CDK7 protein owing to the changes caused by deleterious nsSNPs.

Methods

Three nsSNPs (I63R, H135R, and T285M) were predicted to have a functional impact on the protein by SIFT, PolyPhen2, I-Mutant3, PANTHER, SNPs&GO, PhD-SNP, and screening for non-acceptable polymorphisms (SNAP). Furthermore, we analyzed the native and proposed mutant models in a 10 ns atomic-level simulation using the molecular dynamics (MD) approach. Finally, with the aid of Autodock 4.0 and PatchDock, we analyzed the binding efficacy of flavopiridol with the CDK7 protein with respect to the deleterious mutations.

Results

By comparing the results of all seven prediction tools, three nsSNPs (I63R, H135R, and T285M) were predicted to have a functional impact on the protein. The protein stability analysis indicated that I63R and H135R exhibited less deviation in root mean square deviation than the native and T285M proteins. The flexibility of all three mutant models of the CDK7 protein differs from that of the native protein. Subsequently, the docking study revealed a change in the active site residues and a decrease in the binding affinity of flavopiridol for the mutant proteins.
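The stability comparison above rests on root mean square deviation (RMSD). As a minimal sketch of the quantity itself, the snippet below computes RMSD between two equally sized sets of atomic coordinates. It assumes the structures are already superimposed and performs no fitting step, whereas MD analysis tools typically align each frame to a reference first. The function name `rmsd` and the coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of (x, y, z) coordinates, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")

    # Sum the squared distance between each pair of corresponding atoms.
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    return math.sqrt(squared_sum / len(coords_a))


# Two toy "structures": every atom is shifted by one unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(reference, shifted)
print(deviation)
```

In a trajectory analysis this value would be computed for each frame against the starting structure, giving the RMSD-versus-time curves that such stability comparisons are usually based on.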

Conclusion

This theoretical approach is entirely based on computational methods, which has the ability to identify the disease-related SNPs in complex disorders by contrasting their costs and capabilities with those of the experimental methods. The identification of disease related SNPs by computational methods has the potential to create personalized tools for the diagnosis, prognosis, and treatment of diseases.

Lay abstract

The cell cycle regulatory protein CDK7 is linked with the DNA repair mechanism, which can contribute to cancer risk. The main aim of this study is to extrapolate the relationship between nsSNPs and their effects on drug-binding capability. In this work, we propose a new methodology which (1) efficiently identified, by computational tools, the deleterious nsSNPs that tend to have a functional effect on the protein upon mutation, (2) analyzed the native protein and proposed mutant models at the atomic level using the MD approach, and (3) investigated the protein-ligand interactions to analyze the binding ability by docking analysis. This theoretical approach is entirely based on computational methods, which has the ability to identify the disease-related SNPs in complex disorders by contrasting their costs and capabilities with those of the experimental methods. Overall, this approach has the potential to create personalized tools for the diagnosis, prognosis, and treatment of diseases.
