Similar Documents
1.
MOTIVATION: There has been great expectation that knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs), which lead to an amino acid change in the protein product, are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs among a large number of neutral nsSNPs, it is important to develop computational tools that predict the phenotypic effects of nsSNPs. RESULTS: We prepared a training set based on the variant phenotypic annotations of the Swiss-Prot database and focused our analysis on nsSNPs with homologous 3D structures. Structural environment parameters derived from the homologous 3D structure, as well as evolutionary information derived from the multiple sequence alignment, were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable with that of SIFT, while for nsSNPs with insufficient evolutionary information (fewer than 10 homologous sequences), our method significantly outperforms SIFT. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available. AVAILABILITY: The code and curated dataset are available at http://compbio.utmem.edu/snp/dataset/

2.
3.
Bao L  Cui Y 《FEBS letters》2006,580(5):1231-1234
In this work, we studied the correlations between selective constraint, structural environments and functional impacts of non-synonymous single nucleotide polymorphisms (nsSNPs). We found that the relation between solvent accessibility and functional impacts of nsSNPs is not as simple as generally thought. Finer structural classifications need to be taken into account to reveal the complex relations between the characteristics of a structure environment and its influence on the functional impacts of nsSNPs. We introduced two parameters for each structural environment, consensus residue percentage and residue distribution distance, to characterize the selective constraint imposed by the environment. Both parameters significantly correlate with the functional bias of nsSNPs across the structural environments. This result shows that selective constraint underlies the bias of a structural environment towards a certain type of nsSNPs (disease-associated or benign).

4.
MOTIVATION: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variant in humans. SNPs causing amino acid substitutions are of particular interest as candidates for loci affecting susceptibility to complex diseases, such as diabetes and hypertension. To efficiently screen SNPs for disease association, it is important to distinguish neutral variants from deleterious ones. RESULTS: We describe the use of Pfam protein motif models and the HMMER program to predict whether amino acid changes in conserved domains are likely to affect protein function. We find that the magnitude of the change in the HMMER E-value caused by an amino acid substitution is a good predictor of whether it is deleterious. We provide internet-accessible display tools for a genome-wide collection of SNPs, including 7391 distinct non-synonymous coding-region SNPs in 2683 genes. AVAILABILITY: http://lpgws.nci.nih.gov/cgi-bin/GeneViewer.cgi
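The E-value criterion above can be sketched as a comparison of the domain-match E-values for the wild-type and substituted sequences. This is a minimal sketch under several assumptions. It takes the two E-values as already obtained, for example by scoring both sequences against the same Pfam model with HMMER. The log10 difference and the cutoff of one order of magnitude are hypothetical illustration choices, not the published scoring.

```python
import math


def evalue_log_change(evalue_wildtype: float, evalue_variant: float) -> float:
    """Return log10(E_variant) - log10(E_wildtype) for one domain hit.

    Positive values mean the substituted sequence matches the domain model
    less well (a larger E-value is a less significant hit).
    """
    if evalue_wildtype <= 0 or evalue_variant <= 0:
        raise ValueError("E-values must be positive")
    return math.log10(evalue_variant) - math.log10(evalue_wildtype)


def is_predicted_deleterious(evalue_wildtype: float, evalue_variant: float,
                             cutoff: float = 1.0) -> bool:
    """Flag a substitution whose E-value changes by more than `cutoff` orders
    of magnitude (the magnitude of the change, in either direction).

    The default cutoff is a hypothetical placeholder, not a published value.
    """
    return abs(evalue_log_change(evalue_wildtype, evalue_variant)) > cutoff


# Hypothetical E-values for one conserved-domain hit before/after a substitution.
print(round(evalue_log_change(1e-30, 1e-27), 6))  # 3.0
print(is_predicted_deleterious(1e-30, 1e-27))     # True
print(is_predicted_deleterious(1e-30, 3e-30))     # False
```

A log scale is used because E-values for real domain hits span many orders of magnitude. A raw difference would be dominated by the weakest hit.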

5.
As the number of non-synonymous single nucleotide polymorphisms (nsSNPs) identified through whole-exome/whole-genome sequencing programs increases, researchers and clinicians are becoming increasingly reliant upon computational prediction algorithms designed to prioritize potential functional variants for further study. A large proportion of existing prediction algorithms are ‘disease agnostic’ but are nevertheless quite capable of predicting when a mutation is likely to be deleterious. However, most clinical and research applications of these algorithms relate to specific diseases and would therefore benefit from an approach that distinguishes functional variants specifically related to that disease from those that are not. In a whole-exome/whole-genome sequencing context, such an approach could substantially reduce the number of false-positive candidate mutations. Here, we test this postulate by incorporating a disease-specific weighting scheme into the Functional Analysis through Hidden Markov Models (FATHMM) algorithm. Compared with traditional prediction algorithms, the disease-specific approach to functional prediction produced an overall reduction in the number of false positives across 17 distinct disease concepts/categories. Our results illustrate the potential benefits of making disease-specific predictions when prioritizing candidate variants in relation to specific diseases. A web-based implementation of our algorithm is available at http://fathmm.biocompute.org.uk.

6.

Background  

As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish disease-causing SAPs based on both structural and sequence features of the mutation site. One limitation of these methods is that they cannot be applied when protein structures are unavailable. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from the protein sequence.

7.

Background  

Human genetic variation primarily results from single nucleotide polymorphisms (SNPs), which occur approximately once every 1000 bases in the overall human population. Non-synonymous SNPs (nsSNPs), which lead to amino acid changes in the protein product, may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is identifying the nsSNPs that underlie disease-related phenotypes in humans. The development of computational tools that can identify such nsSNPs would therefore enhance our understanding of genetic diseases and help predict disease.

8.
9.
10.
Certain genetic variations in the human population are associated with heritable diseases, and single nucleotide polymorphisms (SNPs) represent the most common form of such differences in DNA sequence. In particular, substantial interest exists in determining whether a non-synonymous SNP (nsSNP), leading to a single residue replacement in the translated protein product, is neutral or disease-related. The nature of protein structure-function relationships suggests that nsSNP effects, either benign or leading to aberrant protein function possibly associated with disease, are dependent on relative structural changes introduced upon mutation. In this study, we characterize a representative sampling of 1790 documented neutral and disease-related human nsSNPs mapped to 243 diverse human protein structures, by quantifying environmental perturbations in the associated proteins with the use of a computational mutagenesis methodology that relies on a four-body, knowledge-based, statistical contact potential. These structural change data are used as attributes to generate a vector representation for each nsSNP, in combination with additional features reflecting sequence and structure of the corresponding protein. A trained model based on the random forest supervised classification algorithm achieves 76% cross-validation accuracy. Our classifier performs at least as well as other methods that use significantly larger datasets of nsSNPs for model training, and the novelty of our attributes differentiates the model as an orthogonal approach that can be utilized in conjunction with other techniques. A dedicated server for obtaining predictions, as well as supporting datasets and documentation, is available at http://proteins.gmu.edu/automute.

11.
We have developed a formalism and a computational method for analyzing the potential functional consequences of non-synonymous single nucleotide polymorphisms. Our approach uses a structural model and phylogenetic information to derive a set of structure- and sequence-based features that serve as indicators of an amino acid polymorphism's effect on function. The feature values can be integrated into a probabilistic assessment of whether an amino acid polymorphism will affect the function or stability of a target protein. The method has been validated with data sets of unbiased mutations in the lac repressor and lysozyme. Applying our methodology to recent surveys of genetic variation in the coding regions of clinically important genes, we estimate that approximately 26-32% of natural non-synonymous single nucleotide polymorphisms have effects on function. This estimate suggests that a typical person will have about 6240-12,800 heterozygous loci that encode proteins with functional variation due to natural amino acid polymorphism.
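The two ranges quoted above can be related by simple arithmetic. This assumes the lower and upper bounds correspond: 26% with 6,240 loci, and 32% with 12,800 loci. Dividing the number of functionally variant loci by the fraction of nsSNPs that affect function then recovers the total number of heterozygous nsSNP loci per person that the estimate implicitly starts from, about 24,000 to 40,000. This back-calculation is an inference from the quoted figures, not a number stated in the abstract.

```latex
N_{\text{het}} = \frac{N_{\text{functional}}}{f}:
\qquad
\frac{6240}{0.26} = 24\,000,
\qquad
\frac{12\,800}{0.32} = 40\,000
```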

12.
Single nucleotide polymorphism (SNP) detection technologies are used to scan for new polymorphisms and to determine the allele(s) of a known polymorphism in target sequences. SNP detection technologies have evolved from labor-intensive, time-consuming, and expensive processes into some of the most highly automated, efficient, and relatively inexpensive methods. Driven by the Human Genome Project, these technologies are now maturing, and robust strategies are available for both SNP discovery and genotyping. The nearly completed human genome sequence provides the reference against which all other sequencing data can be compared. Global SNP discovery is therefore limited only by the amount of funding available for the activity. Local (targeted) SNP discovery relies mostly on direct DNA sequencing or on denaturing high-performance liquid chromatography (dHPLC). The number of SNP genotyping methods has exploded in recent years, and many robust methods are currently available. The demand for SNP genotyping is great, however, and no single method can meet the needs of all studies using SNPs. Despite the considerable gains over the last decade, new approaches must be developed to lower the cost and increase the speed of SNP detection.

13.
MOTIVATION: B cells responding to antigenic stimulation can fine-tune their binding properties through a process of affinity maturation composed of somatic hypermutation, affinity-selection and clonal expansion. The mutation rate of the B cell receptor DNA sequence, and the effect of these mutations on affinity and specificity, are of critical importance for understanding immune and autoimmune processes. Unbiased estimates of these properties are currently lacking due to the short time-scales involved and the small numbers of sequences available. RESULTS: We have developed a bioinformatic method based on a maximum likelihood analysis of phylogenetic lineage trees to estimate the parameters of a B cell clonal expansion model, which includes somatic hypermutation with the possibility of lethal mutations. Lineage trees are created from clonally related B cell receptor DNA sequences. Important links between tree shapes and underlying model parameters are identified using mutual information. Parameters are estimated using a likelihood function based on the joint distribution of several tree shapes, without requiring a priori knowledge of the number of generations in the clone (which is not available for rapidly dividing populations in vivo). A systematic validation on synthetic trees produced by a mutating birth-death process simulation shows that our estimates are precise and robust to several underlying assumptions. These methods are applied to experimental data from autoimmune mice to demonstrate the existence of hypermutating B cells in an unexpected location in the spleen.
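The kind of clonal expansion model described above, with hypermutation that can be lethal, can be sketched as a forward simulation. This is a minimal sketch that assumes synchronous, discrete generations in which every surviving cell divides once. The abstract does not describe the authors' simulator, so every parameter name here (`p_mutation`, `p_lethal`, `p_death`) is hypothetical. The sketch neither builds lineage trees nor estimates parameters; it only shows how lethal mutations thin an expanding clone.

```python
import random


def simulate_clone(generations: int, p_mutation: float, p_lethal: float,
                   p_death: float = 0.0, seed: int = 0):
    """Simulate a clone under a hypothetical discrete-generation branching model.

    Every live cell divides once per generation into two daughters. Each daughter
    independently dies with probability `p_death`; otherwise it acquires one
    mutation with probability `p_mutation`, and a new mutation is lethal with
    probability `p_lethal`.

    Returns (sizes, cells): live-cell counts per generation (starting with the
    founder) and the mutation count carried by each surviving cell.
    """
    rng = random.Random(seed)
    cells = [0]          # founder cell carries no mutations
    sizes = [1]
    for _ in range(generations):
        next_cells = []
        for mutations in cells:
            for _daughter in range(2):
                if rng.random() < p_death:
                    continue                  # non-mutational death
                count = mutations
                if rng.random() < p_mutation:
                    if rng.random() < p_lethal:
                        continue              # lethal mutation removes the daughter
                    count += 1                # non-lethal hypermutation is inherited
                next_cells.append(count)
        cells = next_cells
        sizes.append(len(cells))
        if not cells:
            break                             # clone went extinct
    return sizes, cells


sizes, cells = simulate_clone(generations=6, p_mutation=0.3, p_lethal=0.2,
                              p_death=0.1, seed=1)
print(sizes)
print(max(cells, default=0), "mutations in the most mutated surviving cell")
```

Representing each cell only by its mutation count keeps the sketch short. Producing lineage trees like those in the abstract would additionally require recording each daughter's parent.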

14.
Human genetic diseases are associated with many factors. One of these is non-synonymous single nucleotide variants (nsSNVs), which replace one amino acid with another and can thereby change protein function, leading to disease. Many computational techniques have been released to predict the impact of amino acid alterations on protein function and to classify mutations as pathogenic or neutral. In this article, we assessed the performance of eight techniques (FATHMM, SIFT, Provean, iFish, Mutation Assessor, PANTHER, SNAP2, and PON-P2) using the VaribenchSelectedPure dataset of 2144 pathogenic variants and 3777 neutral variants extracted from the freely available standard database "Varibench". On the whole dataset, the first five techniques achieve 45.60–83.75% specificity, 52.64–94.13% sensitivity, 51.00–88.90% AUC, and 49.76–88.24% ACC, while on a random sample of the dataset all eight techniques achieve 36.54–77.88% specificity, 50.00–75.00% sensitivity, 51.00–76.40% AUC, and 25.00–77.78% ACC. We also created a meta-classifier (CSTJ48) that combines FATHMM, iFish, and Mutation Assessor; it achieves 96.33% specificity, 86.07% sensitivity, 91.20% AUC, and 91.89% ACC. Comparing these results, FATHMM gives the highest performance of the eight individual techniques, achieving 83.75% and 77.88% specificity, 94.13% and 75.00% sensitivity, 88.90% and 76.40% AUC, and 88.24% and 77.78% ACC on the whole and random-sample datasets, respectively. The meta-classifier (CSTJ48) also outperforms all eight individual tools compared here.
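The four measures reported above can be computed from a tool's calls as sketched below: sensitivity, specificity, and accuracy (ACC) from binary calls, and AUC from continuous scores. The abstract does not say how CSTJ48 combines FATHMM, iFish, and Mutation Assessor. The `majority_vote` function is therefore a deliberately simpler stand-in for combining three tools, not CSTJ48 itself. All labels, calls, and scores in the example are hypothetical toy data, not VariBench results.

```python
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (TP, FN, TN, FP), treating 1 as pathogenic (positive), 0 as neutral."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists differ in length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Sensitivity, specificity and accuracy (ACC) from binary calls."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),              # pathogenic variants caught
        "specificity": tn / (tn + fp),              # neutral variants correctly passed
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random pathogenic variant scores higher
    than a random neutral one (ties count 1/2); O(n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def majority_vote(*tool_calls: Sequence[int]) -> list[int]:
    """Combine binary calls from an odd number of tools by simple majority."""
    if len(tool_calls) % 2 == 0:
        raise ValueError("use an odd number of tools to avoid ties")
    return [1 if 2 * sum(calls) > len(tool_calls) else 0 for calls in zip(*tool_calls)]


# Toy labels and calls (hypothetical, not from VariBench).
y_true = [1, 1, 1, 0, 0, 0]
tool_a = [1, 1, 0, 0, 0, 1]
tool_b = [1, 0, 1, 0, 1, 0]
tool_c = [1, 1, 1, 0, 0, 0]
print(binary_metrics(y_true, tool_a))
print(majority_vote(tool_a, tool_b, tool_c))                  # [1, 1, 1, 0, 0, 0]
print(round(auc(y_true, [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]), 4))  # 0.8889
```

In the toy example each tool makes at most one error, but on different variants, so the vote corrects all of them. Combining complementary tools helps in exactly this way, although a learned meta-classifier can also weight tools unequally.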

15.
A single nucleotide polymorphism (SNP) is a single-base difference in DNA sequence between individuals, and SNPs provide abundant information about genetic variation. Large-scale discovery of high-frequency SNPs is being undertaken using various methods. However, publicly available SNP data sometimes need to be verified. If only a particular gene locus is of concern, locus-specific polymerase chain reaction amplification may be useful. The problem with this method is that the secondary peak has to be measured. We analyzed trace data from conventional sequencing equipment and found a rule that can discern SNPs from noise. The rule is applied to multiply aligned sequences with traces, and the peak heights of the traces are compared between samples. We have developed software that integrates this function to identify SNPs automatically. The software works accurately for high-quality sequences and can also detect SNPs in low-quality sequences. Further, it can determine allele frequencies, display this information as a bar graph, and assign the corresponding nucleotide combinations. It is also designed so that users can easily verify and edit sequences on screen. It is very useful for identifying de novo SNPs in a DNA fragment of interest.
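The abstract does not state its rule for separating SNPs from noise. The sketch below therefore shows only a generic version of the idea, under stated assumptions. At one aligned position, each sample is called heterozygous when its secondary peak reaches a fixed fraction of the primary peak. Allele frequencies are then tallied across the calls. The `TracePeaks` input format and the 0.3 cutoff are hypothetical, not taken from the software described.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TracePeaks:
    """Primary and secondary peak at one aligned position of one sample's
    chromatogram (hypothetical input format)."""
    primary_base: str
    primary_height: float
    secondary_base: str
    secondary_height: float


def call_genotype(peaks: TracePeaks, min_ratio: float = 0.3) -> str:
    """Call heterozygous when secondary/primary peak height >= min_ratio.

    `min_ratio` is a hypothetical noise cutoff, not the published rule.
    """
    if peaks.primary_height <= 0:
        raise ValueError("primary peak height must be positive")
    ratio = peaks.secondary_height / peaks.primary_height
    if ratio >= min_ratio:
        return "".join(sorted(peaks.primary_base + peaks.secondary_base))
    return peaks.primary_base * 2


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies across diploid genotype calls such as 'CT'."""
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


# Same aligned position in three samples (hypothetical peak heights).
samples = [TracePeaks("C", 1000, "T", 50),   # secondary peak looks like noise
           TracePeaks("C", 900, "T", 600),   # strong secondary peak
           TracePeaks("T", 800, "C", 40)]
genotypes = [call_genotype(p) for p in samples]
print(genotypes)                       # ['CC', 'CT', 'TT']
print(allele_frequencies(genotypes))   # {'C': 0.5, 'T': 0.5}
```

A ratio is used rather than an absolute height because overall signal strength varies between traces. A ratio is less sensitive to that variation, which matters most for low-quality sequences.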

16.
17.
Comparative genomics, analyzing variation among individual genomes, is an area of intense investigation. DNA sequencing is usually employed to look for polymorphisms and mutations. Pyrosequencing, a real-time DNA sequencing method, is emerging as a popular platform for comparative genomics. Here we review the use of this technology for mutation scanning, polymorphism discovery and chemical haplotyping. We describe the methodology and accuracy of this technique and discuss how to reduce the cost for large-scale analysis.

18.
19.
An Overview of Single Nucleotide Polymorphisms   Total citations: 4, self-citations: 0, citations by others: 4
Liu Mugen  Zhao Shouyuan 《Chinese Bulletin of Life Sciences (生命科学)》2000,12(6):277-281
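A single nucleotide polymorphism (SNP) is a type of DNA sequence variation distributed throughout the genome, occurring on average about once every 1000 bases in the human genome. SNPs are biallelic polymorphisms: a variant whose frequency in the population exceeds 1% or 2% is regarded as a polymorphism, while one below 1% or 2% is regarded as a mutation. Because they are highly informative, densely distributed, and amenable to automated analysis, SNPs have broad application prospects in cloning genes for inherited diseases and in drug design and development. This article reviews the concept, characteristics, and application prospects of single nucleotide polymorphisms, as well as some issues in their research and application.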
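The frequency threshold described above (1% or 2%) can be applied to a biallelic site by first computing its minor allele frequency from genotype counts. This is a minimal sketch that assumes diploid genotype counts are available for the site. The function names and example counts are hypothetical. A site exactly at the threshold is classed as a mutation here, because only frequencies that exceed it are called polymorphisms.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency of a biallelic SNP from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes")
    ref_freq = (2 * n_hom_ref + n_het) / total_alleles
    return min(ref_freq, 1.0 - ref_freq)


def classify_variant(maf: float, threshold: float = 0.01) -> str:
    """'polymorphism' if the minor allele frequency exceeds `threshold`
    (1% by default; 2% is the other cutoff mentioned), else 'mutation'."""
    return "polymorphism" if maf > threshold else "mutation"


common = minor_allele_frequency(900, 95, 5)   # about 0.0525
rare = minor_allele_frequency(995, 5, 0)      # about 0.0025
print(classify_variant(common), classify_variant(rare))  # polymorphism mutation
```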

20.
E Chiapparino  D Lee  P Donini 《Génome》2004,47(2):414-420
Single nucleotide polymorphisms (SNPs) are the most abundant form of DNA polymorphism. In plants, these polymorphisms can be used as simple genetic markers for many breeding applications, for population studies, and for germplasm fingerprinting. The great increase in the number of DNA sequences available in databases has made it possible to identify SNPs by "database mining", and the single most important factor preventing their widespread use appears to be the cost of genotyping. Many genotyping platforms rely on sophisticated, automated equipment coupled to costly chemistry and detection systems. Here we report a simple and economical method for barley SNP genotyping that involves a single PCR. Using the tetra-primer ARMS-PCR procedure, we were able to unambiguously assay five SNPs in a set of 132 varieties of cultivated barley. The results show the reliability of this technique and its potential for use in low- to moderate-throughput situations; the association with agronomically important traits is discussed.
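In tetra-primer ARMS-PCR, a single reaction contains two outer primers and two allele-specific inner primers. The outer primers produce a control product. Each inner primer, paired with one outer primer, produces an allele-specific product of a different length. The genotype is then read from which bands appear on the gel. A minimal sketch of that read-out follows. The product sizes, the 10 bp matching tolerance, and the allele labels A/G are hypothetical, not the barley assays from the paper.

```python
def band_present(observed_bp: list[int], expected_bp: int, tolerance_bp: int = 10) -> bool:
    """True if any observed gel band lies within `tolerance_bp` of the expected size."""
    return any(abs(size - expected_bp) <= tolerance_bp for size in observed_bp)


def call_tetra_arms(observed_bp: list[int], control_bp: int, allele1_bp: int,
                    allele2_bp: int, allele1: str = "A", allele2: str = "G") -> str:
    """Genotype one sample from tetra-primer ARMS-PCR band sizes.

    The outer-primer control band must be present for the reaction to count;
    each allele-specific inner primer yields a product of a distinct size.
    """
    if not band_present(observed_bp, control_bp):
        return "failed"            # no control product: reaction did not work
    has1 = band_present(observed_bp, allele1_bp)
    has2 = band_present(observed_bp, allele2_bp)
    if has1 and has2:
        return allele1 + allele2   # heterozygote
    if has1:
        return allele1 * 2
    if has2:
        return allele2 * 2
    return "failed"                # control only: no allele-specific product


# Hypothetical product sizes (bp) for one SNP assay.
sizes = dict(control_bp=400, allele1_bp=250, allele2_bp=180)
print(call_tetra_arms([402, 251, 178], **sizes))  # AG
print(call_tetra_arms([398, 249], **sizes))       # AA
print(call_tetra_arms([251, 178], **sizes))       # failed
```

Requiring the control band before calling a genotype is what makes the calls unambiguous. Without it, a failed reaction and a missing allele-specific product would look the same.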
