Similar literature
 20 similar documents found.
1.
Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions.

2.
Nonsynonymous single nucleotide polymorphisms (nsSNPs) in coding regions can lead to amino acid changes that might alter the protein’s function and account for susceptibility to disease and altered drug/xenobiotic response. Many nsSNPs have been found in genes encoding human phase II metabolizing enzymes; however, little is known about the relationship between the genotype and phenotype of nsSNPs in these enzymes. We have identified 923 validated nsSNPs in 104 human phase II enzyme genes from the Ensembl genome database and the NCBI SNP database. Using the PolyPhen, Panther, and SNAP algorithms, 44%–59% of nsSNPs in phase II enzyme genes were predicted to affect protein function. Predictions largely agree with the available experimental annotations: 68% of deleterious nsSNPs were correctly predicted as damaging. This study also identified many amino acids that are likely to be functionally critical, but have not yet been studied experimentally. There was significant concordance between the predicted results of Panther and PolyPhen, and between SNAP non-neutral predictions and PolyPhen scores. Evolutionarily non-neutral (destabilizing) amino acid substitutions are thought to be the pathogenetic basis for the alteration of phase II enzyme activity and to be associated with disease susceptibility and drug/xenobiotic toxicity. Furthermore, the molecular evolutionary patterns of phase II enzymes were characterized with regard to the predicted deleterious nsSNPs.

3.
While genome-era technologies focused on complete genome sequencing in various organisms, post-genome technologies aim at the understanding of the mechanisms of genetic information processing and elucidation of within-species variation. Single nucleotide polymorphisms (SNPs) are the most common source of genome variation in the human population. Nonsynonymous SNPs that occur in coding gene regions and result in amino acid substitutions are of particular interest. It is thought that such SNPs are responsible for phenotypic variation, quantitative traits, and the etiology of common diseases. PolyPhen is a computational tool for the prediction of putatively functional nonsynonymous SNPs by combining information of various types. The application areas of PolyPhen and similar methods include the genetics of complex diseases and congenital defects, the identification of functional mutations in model organisms, and evolutionary genetics.

4.
Single-nucleotide polymorphisms (SNPs) are the most frequent form of genetic variation. Non-synonymous SNPs (nsSNPs) occurring in coding regions result in single amino acid substitutions that are associated with human hereditary diseases. Many approaches have been designed for distinguishing deleterious from neutral nsSNPs based on sequence-level information. Novel in this work, combinations of protein–protein interaction (PPI) network topological features were introduced for predicting disease-related nsSNPs. Based on a dataset compiled from Swiss-Prot, a random forest model was constructed with an average accuracy of 80.43% and an MCC of 0.60 in a rigorous tenfold cross-validation test. For an independent dataset, our model achieved an accuracy of 88.05% and an MCC of 0.67. Compared with previous studies, our approach showed superior prediction ability. Results showed that the incorporated PPI network topological features outperform conventional features. Our further analysis indicated that disease-related proteins are topologically different from other proteins. This study suggested that nsSNPs may share some topological information of proteins and that changes in topological attributes could provide clues to the functional shift caused by nsSNPs.
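The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal sketch; the function name and the counts are hypothetical and are not taken from the study.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC denominator: product of the four marginal totals, square-rooted.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, chosen only to illustrate the calculation.
acc, mcc = accuracy_and_mcc(tp=80, tn=80, fp=20, fn=20)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the deleterious and neutral classes are unbalanced.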

5.
Prediction of protein stability upon amino acid substitutions is an important problem in molecular biology and it will be helpful for designing stable mutants. In this work, we have analyzed the stability of protein mutants using three different data sets of 1791, 1396, and 2204 mutants, respectively, for thermal stability (ΔTm), free energy change due to thermal denaturation (ΔΔG), and free energy change due to denaturant denaturation (ΔΔG(H2O)), obtained from the ProTherm database. We have classified the mutants into 380 possible substitutions and assigned the stability of each mutant using the information obtained from similar types of mutations. We observed that this assignment could distinguish the stabilizing and destabilizing mutants with an accuracy of 70-80% for the different measures of stability. Further, we have classified the mutants based on secondary structure and solvent accessibility (ASA) and observed that the classification significantly improved the accuracy of prediction. The classification of mutants based on helix, strand, and coil distinguished the stabilizing/destabilizing mutants at an average accuracy of 82% with a correlation of 0.56; information about the location of residues at the interior, partially buried, and surface regions of a protein correctly identified the stabilizing/destabilizing residues at an average accuracy of 81% with a correlation of 0.59. The nine subclassifications based on three secondary structures and solvent accessibilities improved the accuracy of assigning stabilizing/destabilizing mutants to 84-89% for the three data sets. Further, the present method is able to predict the free energy change (ΔΔG) upon mutation within a deviation of 0.64 kcal/mol. We suggest that this method could be used for predicting the stability of protein mutants.
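The figure of 380 possible substitutions follows from 20 wild-type residues times 19 possible replacements. The sketch below enumerates them and assigns a stability class from the mean ΔΔG of previously observed mutations of the same type. This is an illustration of the general idea rather than the authors' procedure: the records are made up, and the sign convention (positive ΔΔG taken as stabilizing) is an assumption.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every ordered (wild type, mutant) pair with distinct residues: 20 * 19 = 380.
SUBSTITUTIONS = [(wt, mt) for wt in AMINO_ACIDS for mt in AMINO_ACIDS if wt != mt]


def mean_ddg_by_substitution(records):
    """records: iterable of (wild_type, mutant, ddG) tuples."""
    grouped = defaultdict(list)
    for wt, mt, ddg in records:
        grouped[(wt, mt)].append(ddg)
    return {pair: sum(vals) / len(vals) for pair, vals in grouped.items()}


def classify(table, wt, mt):
    """Return a class label, or None if this substitution type was never seen."""
    mean = table.get((wt, mt))
    if mean is None:
        return None
    # Assumed convention: positive ddG means the mutant is more stable.
    return "stabilizing" if mean > 0 else "destabilizing"


# Hypothetical training records, for illustration only.
table = mean_ddg_by_substitution([("A", "G", -1.2), ("A", "G", -0.8), ("V", "I", 0.4)])
print(len(SUBSTITUTIONS), classify(table, "A", "G"), classify(table, "V", "I"))
```

Splitting the same table further by secondary structure and solvent accessibility, as the abstract describes, would amount to adding those categories to the dictionary key.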

6.
MOTIVATION: Contemporary, high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs), a subset of which occur in the coding region of genes and result in a change in the encoded amino acid sequence (non-synonymous coding SNPs or 'nsSNPs'). It is hypothesized that a subset of these nsSNPs may underlie common human disease. Testing all these polymorphisms for disease association would be time consuming and expensive. Thus, computational methods have been developed to both prioritize candidate nsSNPs and make sense of their likely molecular physiologic impact. RESULTS: We have developed a method to prioritize nsSNPs and have applied it to the human protein kinase gene family. The results of our analyses provide high quality predictions and outperform available whole genome prediction methods (74% versus 83% prediction accuracy). Our analyses and methods consider both DNA sequence conservation, which most traditional methods are based on, as well as unique structural and functional features of kinases. We provide a ranked list of common kinase nsSNPs that have a higher probability of impacting human disease based on our analyses.

7.
Both effective population size and life history may influence the efficacy of purifying selection, but it remains unclear if the environment affects the accumulation of weakly deleterious nonsynonymous polymorphisms. We hypothesize that the reduced energetic cost of osmoregulation in brackish water habitat may cause relaxation of selective constraints at mitochondrial oxidative phosphorylation (OXPHOS) genes. To test this hypothesis, we analyzed 57 complete mitochondrial genomes of Pungitius pungitius collected from brackish and freshwater habitats. Based on inter- and intraspecific comparisons, we estimated that 84% and 68% of the nonsynonymous polymorphisms in the freshwater and brackish water populations, respectively, are weakly or moderately deleterious. Using in silico prediction tools (MutPred, SNAP2), we subsequently identified nonsynonymous polymorphisms with potentially harmful effect. Both prediction methods indicated that the functional effects of the fixed nonsynonymous substitutions between nine- and three-spined stickleback were weaker than for polymorphisms within species, indicating that harmful nonsynonymous polymorphisms within populations rarely become fixed between species. No significant differences in mean estimated functional effects were identified between freshwater and brackish water nine-spined stickleback to support the hypothesis that reduced osmoregulatory energy demand in the brackish water environment reduces the strength of purifying selection at OXPHOS genes. Instead, elevated frequency of nonsynonymous polymorphisms in the freshwater environment (Pn/Ps = 0.549 vs. 0.283; Fisher's exact test p = .032) suggested that purifying selection is less efficient in small freshwater populations. This study shows the utility of in silico functional prediction tools in population genetic and evolutionary research in a nonmammalian vertebrate and demonstrates that mitochondrial energy production genes represent a promising system to characterize the demographic, life history and potential habitat-dependent effects of segregating amino acid variants.
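The Pn/Ps comparison reported above rests on a 2×2 table of nonsynonymous and synonymous polymorphism counts in the two habitats, tested with Fisher's exact test. The abstract gives only the ratios and the p-value, so the counts below are hypothetical, and the function is a plain-Python sketch of the standard two-sided test rather than the authors' analysis.

```python
from math import comb


def pn_ps(nonsyn, syn):
    """Ratio of nonsynonymous to synonymous polymorphism counts."""
    return nonsyn / syn


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: (nonsynonymous, synonymous) per habitat.
fresh_n, fresh_s = 30, 60
brack_n, brack_s = 15, 60
print(pn_ps(fresh_n, fresh_s), pn_ps(brack_n, brack_s))
print(fisher_exact_two_sided(fresh_n, fresh_s, brack_n, brack_s))
```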

8.
We developed a modified allele-specific PCR procedure for assaying single nucleotide polymorphisms (SNPs) and used the procedure (called SNAP for single-nucleotide amplified polymorphisms) to generate 62 Arabidopsis mapping markers. SNAP primers contain a single base pair mismatch within three nucleotides from the 3' end of one allele (the specific allele) and in addition have a 3' mismatch with the nonspecific allele. A computer program called SNAPER was used to facilitate the design of primers that generate at least a 1,000-fold difference in the quantity of the amplification products from the specific and nonspecific SNP alleles. Because SNAP markers can be readily assayed by electrophoresis on standard agarose gels and because a public database of over 25,000 SNPs is available between the Arabidopsis Columbia and Landsberg erecta ecotypes, the SNAP method greatly facilitates the map-based cloning of Arabidopsis genes defined by a mutant phenotype.

9.
Aberrant or modified splicing patterns of genes are causative for many human diseases. Therefore, the identification of genetic variations that cause changes in the splicing pattern of a gene is important. Elsewhere, we described the widespread occurrence of alternative splicing at NAGNAG acceptors. Here, we report a genomewide screen for single-nucleotide polymorphisms (SNPs) that affect such tandem acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrate that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by the comparison with the chimpanzee genome, exhibit the same biases toward intron phase 1 and single-amino acid insertion/deletions that were already observed for all human NAGNAG acceptors. Since 28% of the NAGNAG SNPs occur in known disease genes, they represent preferable candidates for a more-detailed functional analysis, especially since the splice relevance for some of the coding SNPs is overlooked. Against the background of a general lack of methods for identifying splice-relevant SNPs, the presented approach is highly effective in the prediction of polymorphisms that are causal for variations in alternative splicing.

10.
Bioinformatics tools have gained popularity in biology but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs in the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina using chromosome and chromosomal positions of SNPs from NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, 201 out of 415 SNPs were reported as missing in the Cardio-Metabochip by the SNAP output. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present in the Cardio-Metabochip array (false negative rate of 36.6%). With the more recent 1000 Genomes Project release, we found a false-negative rate of 17.6% by comparing the outputs of SNAP and the Illumina product file. We did not find any ‘false positive’ SNPs (SNPs specified as available in the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). Cohen’s kappa coefficient, which measures the agreement between the two methods beyond that expected by chance, indicated that the validity of SNAP was fair to moderate depending on the reference used (the HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically assessing the validity of bioinformatics tools in an independent manner. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.
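The HapMap 3 figures in this abstract are enough to rebuild the 2×2 agreement table between SNAP and the Illumina file: 214 SNPs reported available by both (415 − 201, with no false positives), 152 reported missing by SNAP but present in the file, and 49 missing in both. The sketch below computes the reported false negative rate and Cohen's kappa from that table. The function name is hypothetical, and the table is a reconstruction from the abstract rather than data published by the authors.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters (A, B) on a yes/no outcome."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    # Agreement expected if the two raters were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


total_snps = 415
snap_missing = 201
actually_present = 152  # reported missing by SNAP, present in the Illumina file

# Rate as reported in the abstract: false negatives over all 415 SNPs.
false_negative_rate = actually_present / total_snps

# Rater A = SNAP, rater B = Illumina file; "yes" = SNP available on the chip.
kappa = cohen_kappa(
    both_yes=total_snps - snap_missing,
    a_yes_b_no=0,
    a_no_b_yes=actually_present,
    both_no=snap_missing - actually_present,
)
print(f"false negative rate = {false_negative_rate:.3f}, kappa = {kappa:.3f}")
```

Under this reconstruction kappa comes out near 0.25, which falls in the range conventionally labelled "fair" and is consistent with the abstract's description.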

11.
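Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms caused by the variation of a single nucleotide at the genome level, that is, the variation of a single base in the DNA sequence; they are the most common type of variation in the human genome. The main aim of SNP research is to understand the genetics of human phenotypic variation, especially with respect to human hereditary diseases. Non-synonymous single nucleotide polymorphisms (nsSNPs) are one class of SNPs: single-nucleotide mutations located in coding regions that change the corresponding amino acid sequence after translation. Because nsSNPs may affect protein function, they are considered a major cause of human hereditary disease. Distinguishing disease-related nsSNPs from neutral nsSNPs is therefore important. Drawing on domestic and international research on the prediction of disease-related nsSNPs, this paper analyzes the feature attributes involved in prediction, summarizes the feature selection methods used to optimize these features, and outlines the various classifiers used in the prediction process.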

12.
Elucidating the relationship between polymorphic sequences and risk of common disease is a challenge. For example, although it is clear that variation in DNA repair genes is associated with familial cancer, aging and neurological disease, progress toward identifying polymorphisms associated with elevated risk of sporadic disease has been slow. This is partly due to the complexity of the genetic variation, the existence of large numbers of mostly low frequency variants and the contribution of many genes to variation in susceptibility. There has been limited development of methods to find associations between genotypes having many polymorphisms and pathway function or health outcome. We have explored several statistical methods for identifying polymorphisms associated with variation in DNA repair phenotypes. The model system used was 80 cell lines that had been resequenced to identify variation; 191 single nucleotide substitution polymorphisms (SNPs) are included, of which 172 are in 31 base excision repair pathway genes and 19 in 5 anti-oxidation genes, together with DNA repair phenotypes based on single strand breaks measured by the alkaline Comet assay. Univariate analyses were of limited value in identifying SNPs associated with phenotype variation. Of the multivariable model selection methods tested, the easiest that provided reduced error of prediction of phenotype was simple counting of the variant alleles predicted to encode proteins with reduced activity, which led to a genotype including 52 SNPs; the best and most parsimonious model was achieved using a two-step analysis without regard to potential functional relevance: first SNPs were ranked by importance determined by random forests regression (RFR), followed by cross-validation in a second round of RFR modeling that included ever more SNPs in declining order of importance. With this approach six SNPs were found to minimize prediction error. The results should encourage research into utilization of multivariate analytical methods for epidemiological studies of the association of genetic variation in complex genotypes with risk of common diseases.

13.
SUMMARY: The interpretation of genome-wide association results is confounded by linkage disequilibrium between nearby alleles. We have developed a flexible bioinformatics query tool for single-nucleotide polymorphisms (SNPs) to identify and to annotate nearby SNPs in linkage disequilibrium (proxies) based on HapMap. By offering functionality to generate graphical plots for these data, the SNAP server will facilitate interpretation and comparison of genome-wide association study results, and the design of fine-mapping experiments (by delineating genomic regions harboring associated variants and their proxies). AVAILABILITY: SNAP server is available at http://www.broad.mit.edu/mpg/snap/.
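Proxy SNPs are usually identified by a pairwise linkage disequilibrium statistic. The abstract does not say which statistic or threshold the server uses, so the sketch below assumes the common r² measure computed from phased haplotypes; the function name, the toy haplotypes, and the 0/1 allele coding are all illustrative.

```python
def ld_r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes (0/1 allele codes)."""
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else 0.0    # 0.0 if either SNP is monomorphic


# Toy haplotypes: the first pair is perfectly correlated, the second independent.
print(ld_r_squared([0, 1, 0, 1], [0, 1, 0, 1]))
print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

A proxy search would then keep the nearby SNPs whose r² with the query SNP exceeds a chosen cutoff.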

14.
Prediction of the biological effect of missense substitutions has become important because they are often observed in known or candidate disease susceptibility genes. In this paper, we carried out a 3-step analysis of 1514 missense substitutions in the DNA-binding domain (DBD) of TP53, the most frequently mutated gene in human cancers. First, we calculated two types of conservation scores based on a TP53 multiple sequence alignment (MSA) for each substitution: (i) Grantham Variation (GV), which measures the degree of biochemical variation among amino acids found at a given position in the MSA; (ii) Grantham Deviation (GD), which reflects the 'biochemical distance' of the mutant amino acid from the observed amino acid at a particular position (given by GV). Second, we used a method that combines GV and GD scores, Align-GVGD, to predict the transactivation activity of each missense substitution. We compared our predictions against experimentally measured transactivation activity (yeast assays) to evaluate their accuracy. Finally, the prediction results were compared with those obtained by the program Sorting Intolerant from Tolerant (SIFT) and Dayhoff's classification. Our predictions yielded high prediction accuracy for mutants showing a loss of transactivation (approximately 88% specificity) with lower prediction accuracy for mutants with transactivation similar to that of the wild-type (67.9 to 71.2% sensitivity). Align-GVGD results were comparable to SIFT (88.3 to 90.6% and 67.4 to 70.3% specificity and sensitivity, respectively) and outperformed Dayhoff's classification (80 and 40.9% specificity and sensitivity, respectively). These results further demonstrate the utility of the Align-GVGD method, which was previously applied to BRCA1. Align-GVGD is available online at http://agvgd.iarc.fr.

15.
16.

Background

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection has been studied in several crop species, but no reports exist in soybean. The objectives of this study were (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations.

Results

Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller.

Conclusions

Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
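The marker filters described in the Results of this abstract (minor-allele frequency above 0.05 and at most 5% missing values, or up to 80% missing in the relaxed set) can be sketched as follows. The genotype coding (0/1/2 copies of the alternate allele, `None` for a missing call), the function names, and the toy data are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele counts; missing calls (None) are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missing_fraction(genotypes):
    """Share of lines with no genotype call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_snps(matrix, min_maf=0.05, max_missing=0.05):
    """Keep SNPs with MAF > min_maf and missing fraction <= max_missing."""
    return {
        snp: calls
        for snp, calls in matrix.items()
        if minor_allele_frequency(calls) > min_maf
        and missing_fraction(calls) <= max_missing
    }


# Toy genotype matrix for 20 lines (hypothetical data).
matrix = {
    "snp_a": [0] * 10 + [2] * 10,            # common variant, fully called
    "snp_b": [0] * 20,                       # monomorphic, MAF = 0
    "snp_c": [0] * 9 + [2] * 9 + [None] * 2, # 10% missing calls
}
strict = filter_snps(matrix)
relaxed = filter_snps(matrix, max_missing=0.80)
print(sorted(strict), sorted(relaxed))
```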

17.
Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, 'human' being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. AVAILABILITY: http://www.rostlab.org/services/snpdbe.

18.
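Objective: Male pattern baldness (MPB), also known as androgenetic alopecia (AGA), is a common type of hair loss in men; about 80% of the phenotypic variance can be explained by genetic factors. Current research on the genetic inference of MPB is based mainly on European populations, with few studies in East Asian populations. This study validated MPB-associated loci from European populations in a Chinese population and built a genetic inference model. Methods: We examined the association of 486 single nucleotide polymorphism (SNP) loci linked to MPB in Europeans in 312 Han Chinese men, and screened the associated loci using stepwise regression and Lasso regression, respectively. Prediction models were built with logistic regression and evaluated by ten-fold cross-validation. We then compared the accuracy of four commonly used classifiers (logistic regression, k-nearest neighbors, random forest, and support vector machine) in predicting MPB. Results: 174 SNP loci were significantly associated with MPB in Han Chinese men (P<0.05). The two screening methods yielded sets of 22 SNPs and 25 SNPs, respectively, from which a 22-SNP and a 25-SNP logistic regression prediction model were built. Measured by AUC (area under the ROC curve), the two models predicted MPB with accuracies of 0.85 and 0.84, which fell to 0.81 and 0.77, respectively, after ten-fold cross-validation. With age added as a predictor, the AUC of both models reached a maximum of 0.89. In this study, the logistic regression prediction model showed a clear advantage over the other classifier models. Conclusion: Overall, although the accuracy of the prediction models has not yet reached the level expected for clinical use, SNPs still have great potential for the genetic prediction of MPB and can inform early diagnosis, clinical intervention, and forensic applications.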
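The AUC used to compare the models above equals the probability that a randomly chosen affected individual receives a higher predicted score than a randomly chosen unaffected one. The sketch below computes it directly from that definition; the scores and labels are invented for illustration and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true labels (1 = affected).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)
```

The pairwise form is quadratic in the number of samples, which is fine for a cohort of a few hundred people; a rank-based formula gives the same value more efficiently on large data sets.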

19.
MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. RESULTS: We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP CONTACT: rachelk@salilab.org SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf.

20.
Genetic variation analysis holds much promise as a basis for disease-gene association. However, due to the tremendous number of candidate single nucleotide polymorphisms (SNPs), there is a clear need to expedite genotyping by selecting and considering only a subset of all SNPs. This process is known as tagging SNP selection. Several methods for tagging SNP selection have been proposed, and have shown promising results. However, most of them rely on strong assumptions such as prior block-partitioning, bi-allelic SNPs, or a fixed number or location of tagging SNPs. We introduce BNTagger, a new method for tagging SNP selection, based on conditional independence among SNPs. Using the formalism of Bayesian networks (BNs), our system aims to select a subset of independent and highly predictive SNPs. Similar to previous prediction-based methods, we aim to maximize the prediction accuracy of tagging SNPs, but unlike them, we fix neither the number nor the location of predictive tagging SNPs, nor do we require SNPs to be bi-allelic. In addition, for newly-genotyped samples, BNTagger directly uses genotype data as input, while producing as output haplotype data of all SNPs. Using three public data sets, we compare the prediction performance of our method to that of three state-of-the-art tagging SNP selection methods. The results demonstrate that our method consistently improves upon previous methods in terms of prediction accuracy. Moreover, our method retains its good performance even when a very small number of tagging SNPs are used.
