1.
Background
Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are discovered. Bioinformatics analysis is essential for prioritizing AAS that may cause or contribute to human disease, because each genome carries thousands of rare or private AAS and only a very small number of them are related to an underlying disease. Existing algorithms in this field still have a high false prediction rate, and new development is needed to take full advantage of the vast amount of genomic data.
Results
Here we report a novel algorithm that features two innovative changes: (1) making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analyses. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing of this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based application tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), were made freely available to the public (http://paed.hku.hk/efin/).
Conclusions
Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each group independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-455) contains supplementary material, which is available to authorized users.
2.
3.
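Research status and application prospects of SNPs in the human genome (cited by: 2; self-citations: 0; citations by others: 2)
Genomic DNA is the material basis of the physiological and pathological traits of an organism. About 90% of human DNA sequence variation takes the form of single nucleotide polymorphisms (SNPs), a common type of genetic variation that is widespread in the human genome and is considered a determining factor in human disease susceptibility and drug response. This article reviews the classification and characteristics of SNPs, the current state of research on SNPs in the human genome, practical applications of SNPs, and research prospects for SNPs in genetic mapping, medicine, genetic susceptibility, and personalized medicine, and discusses problems in current SNP research.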
4.
Functional differences between amino acids have long been of interest in understanding protein evolution. Several indices exist for comparing residues on the basis of their physicochemical properties and frequencies of occurrence in conserved protein alignments. Here we present a residue dissimilarity index based on coding single nucleotide polymorphisms (SNPs) in the human genome. The index represents an average, organism-wide set of differences between residues and provides important insight into evolutionary restraints on residue substitutions in the human genome. Unlike previous models, it is not restricted to highly conserved protein structures, nor confounded by evolutionary differences between species. Our results confirm earlier observations regarding residue mutabilities but also suggest that in addition to the established key properties, such as size and polarity, charge conservation may be an important and currently underestimated factor in protein evolution. We also estimate that less than 51% of amino acid substitutions occurring in the human genome are evolutionarily neutral.
5.
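The coagulation factor C homology (COCH) gene, located on human chromosome 14q12-q13, was the first human deafness gene found to be associated with vestibular dysfunction. To date, mutations at 16 sites in COCH have been found to cause the autosomal dominant non-syndromic hearing loss DFNA9, including 13 non-synonymous single nucleotide polymorphism (nsSNP) sites. Because the genotype-phenotype relationships of the other nsSNPs in this gene remain unclear, this study used bioinformatics methods to screen all SNPs in COCH in a stepwise manner and, combined with information on known pathogenic nsSNPs and verification against the protein's three-dimensional structure, predicted for the first time 8 high-risk pathogenic nsSNPs (I176T, R180Q, G265E, V269L, I368N, I372T, R416C, and Y424D) in the vWFA (von Willebrand factor type A) domain of cochlin, the protein encoded by COCH. In addition, three-dimensional structure modeling was performed for 6 known pathogenic nsSNPs in the LCCL (Limulus factor C, cochlin, and late gestation lung protein Lgl1) domain (P51S, G87W, I109N, I109T, W117R, and F121S), and all mutants showed changes in loop or strand structures. This study of genotype-phenotype correlation in COCH provides a theoretical basis for hereditary deafness screening and offers guidance for functional studies of cochlin.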
6.
So far, there is no genome-wide estimation of the mutational spectrum in humans. In this study, we systematically examined the directionality of the point mutations and maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C-->T (G-->A) changes was the highest among all mutation types and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, when the GC content increased, the frequency of changes from G or C increased. In exons, the frequency of G:C-->A:T was the highest among the genomic categories and was contributed mainly by the frequent mutations at the CpG sites. In contrast, mutations at the CpG sites, or CpG-->TpG/CpA mutations, occurred less frequently in the CpG islands relative to intergenic regions with similar GC content. Our results suggest that the GC content is overall not in equilibrium in the human genome, with a trend toward shifting the human genome to be AT rich and shifting the GC content of a region to approach the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum in the recent human genome and categorized genomic regions.
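The bookkeeping behind a mutational spectrum like the one summarized above can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the function names and the convention of folding strand-complementary changes onto a pyrimidine ancestral base (so that G>A is counted with C>T) are assumptions made for the example.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_substitution(ancestral: str, derived: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    a, d = ancestral.upper(), derived.upper()
    if a not in COMPLEMENT or d not in COMPLEMENT or a == d:
        raise ValueError(f"not a single-base substitution: {ancestral}>{derived}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {a, d} <= PURINES or {a, d} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mutation_spectrum(changes):
    """Count directional changes, pooling strand-complementary types.

    `changes` is an iterable of (ancestral, derived) base pairs.
    Hypothetical convention: changes with a purine ancestral base are
    complemented, leaving six classes (C>A, C>G, C>T, T>A, T>C, T>G).
    """
    spectrum = Counter()
    for ancestral, derived in changes:
        a, d = ancestral.upper(), derived.upper()
        classify_substitution(a, d)  # raises on invalid input
        if a in PURINES:
            a, d = COMPLEMENT[a], COMPLEMENT[d]
        spectrum[f"{a}>{d}"] += 1
    return spectrum
```

With real data, the (ancestral, derived) pairs would come from polarizing each SNP against an outgroup sequence, as the abstract describes doing with chimpanzee.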
7.
We analyzed n-mers (n=3-8) in the local environment of 8,249,446 human SNPs and compared their distribution with that in the genome reference sequences. The results revealed that the short sequences, which contained at least one CpG dinucleotide, occurred more frequently in the local SNP sequences than in the genome sequences. To exclude the hypermutability effect of the methylated CpG dinucleotides on the sequence context of SNPs, we examined the distribution patterns for each of the six categories of substitution. We observed a similar pattern (i.e., CpG-containing n-mers vs. non-CpG-containing n-mers) in SNP categories A/G, C/T and C/G but the opposite pattern in category A/T. We next identified 34,928 putative CpG islands in the human genome and located 133,591 SNPs within these islands. In the CpG islands, CpG SNPs were 3.92-fold less prevalent relative to the presence of CpG dinucleotides. Conversely, in the human genome, the frequency of CpG dinucleotides at the polymorphic sites was 6.09 times that in the genome reference sequences. These results support the previous views of mutational suppression at the CpG sites in the CpG islands and hypermutability of the methylated CpG dinucleotides that are prevalent in the non-CpG island sequences in the human genome. Our study represents a comprehensive investigation of the sequence context of SNPs in the human genome and in human CpG islands.
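The comparison of CpG-containing and non-CpG-containing n-mers can be illustrated with a small counting routine. This is a minimal sketch under stated assumptions: the function names are hypothetical, overlapping n-mers are counted on one strand only, and the abstract does not say how the authors handled strands or window sizes.

```python
from collections import Counter


def nmer_counts(sequence: str, n: int) -> Counter:
    """Count all overlapping n-mers in a sequence (single strand)."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cpg_fraction(sequence: str, n: int) -> float:
    """Fraction of n-mers in the sequence that contain at least one CpG."""
    counts = nmer_counts(sequence, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # sequence shorter than n
    with_cpg = sum(c for nmer, c in counts.items() if "CG" in nmer)
    return with_cpg / total
```

Running `cpg_fraction` on SNP-flanking sequences and on reference sequence, then comparing the two fractions for each n, gives the kind of over- or under-representation the abstract reports.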
8.
Nonsynonymous single nucleotide polymorphisms (nsSNPs) in coding regions can lead to amino acid changes that might alter the
protein’s function and account for susceptibility to disease and altered drug/xenobiotic response. Many nsSNPs have been found
in genes encoding human phase II metabolizing enzymes; however, there is little known about the relationship between the genotype
and phenotype of nsSNPs in these enzymes. We have identified 923 validated nsSNPs in 104 human phase II enzyme genes from
the Ensembl genome database and the NCBI SNP database. Using PolyPhen, Panther, and SNAP algorithms, 44%–59% of nsSNPs in
phase II enzyme genes were predicted to have functional impacts on protein function. Predictions largely agree with the available
experimental annotations. 68% of deleterious nsSNPs were correctly predicted as damaging. This study also identified many
amino acids that are likely to be functionally critical, but have not yet been studied experimentally. There was significant
concordance between the predicted results of Panther and PolyPhen, and between SNAP non-neutral predictions and PolyPhen scores.
Evolutionarily non-neutral (destabilizing) amino acid substitutions are thought to be the pathogenetic basis for the alteration
of phase II enzyme activity and to be associated with disease susceptibility and drug/xenobiotic toxicity. Furthermore, the
molecular evolutionary patterns of phase II enzymes were characterized with regard to the predicted deleterious nsSNPs.
9.
10.
In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at http://www.ncbi.nlm.nih.gov/SNP. Submitted SNPs can also be downloaded via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/
11.
A genome-wide view of sequence mutability in mice is still limited, although biologists usually assume the same scenario for mice as for humans. In this study, we examined the sequence context in the local environment of 482,528 mouse single nucleotide polymorphisms (SNPs). We found that CpG-containing short sequences, in general, had more representation in the local sequences of SNPs compared to the genome sequences. The extent of this overrepresentation was stronger in mice than in humans, which is inconsistent with previous observations of the weaker neighboring-nucleotide biases on mouse SNPs. To exclude the CpG effect, we compared the distribution patterns of short sequences among the six categories of SNPs. The results revealed an even stronger pattern in the CpG-containing group for C/G substitutions than for A/G or C/T substitutions. We next performed the first genome-wide sequence context analysis of SNPs in the mouse CpG islands. SNPs occurring at CpG sites were 3.14-fold less prevalent than expected, suggesting the suppression of methylation-dependent deamination in the CpG islands. The extent of this suppression was less in mice than in humans. Finally, compared with humans, the observations of a greater deficit of CpG dinucleotides, a stronger overrepresentation of CpG-containing n-mers surrounding the polymorphic sites, and a higher SNP/genome ratio of CpG dinucleotides in the mouse genome support the "loss of CpG islands" model in the mouse lineage.
12.
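miRNA-related single nucleotide polymorphisms (mirSNPs) is a collective term for functional SNPs that can cause loss or disruption of miRNA-mediated gene regulation. mirSNPs located in miRNA target-gene binding sites, in miRNA genes, or in miRNA-processing genes can all affect the regulation of target genes by miRNAs. mirSNPs in miRNA genes and miRNA-processing genes act mainly by impeding miRNA biogenesis, whereas mirSNPs in target-gene binding sites act mainly by changing free energy or abolishing a functional conformation, which impairs binding between the miRNA and its target sequence and results in loss of the original regulatory function. Most mirSNPs lie in intergenic and intronic regions of the human genome and are closely associated with many complex diseases, including tumors. mirSNPs are of great research value both for studying the pathogenesis of complex diseases and for identifying molecular markers for diagnosis, treatment, and prognosis.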
13.
Single nucleotide polymorphism (SNP) detection technologies are used to scan for new polymorphisms and to determine the allele(s) of a known polymorphism in target sequences. SNP detection technologies have evolved from labor intensive, time consuming, and expensive processes to some of the most highly automated, efficient, and relatively inexpensive methods. Driven by the Human Genome Project, these technologies are now maturing and robust strategies are found in both SNP discovery and genotyping areas. The nearly completed human genome sequence provides the reference against which all other sequencing data can be compared. Global SNP discovery is therefore only limited by the amount of funding available for the activity. Local, target, SNP discovery relies mostly on direct DNA sequencing or on denaturing high performance liquid chromatography (dHPLC). The number of SNP genotyping methods has exploded in recent years and many robust methods are currently available. The demand for SNP genotyping is great, however, and no one method is able to meet the needs of all studies using SNPs. Despite the considerable gains over the last decade, new approaches must be developed to lower the cost and increase the speed of SNP detection.
14.
Worth CL, Bickerton GR, Schreyer A, Forman JR, Cheng TM, Lee S, Gong S, Burke DF, Blundell TL. Journal of Bioinformatics and Computational Biology, 2007, 5(6): 1297-1318
The prediction of the effects of nonsynonymous single nucleotide polymorphisms (nsSNPs) on function depends critically on exploiting all information available on the three-dimensional structures of proteins. We describe software and databases for the analysis of nsSNPs that allow a user to move from SNP to sequence to structure to function. In both structure prediction and the analysis of the effects of nsSNPs, we exploit information about protein evolution, in particular, that derived from investigations on the relation of sequence to structure gained from the study of amino acid substitutions in divergent evolution. The techniques developed in our laboratory have allowed fast and automated sequence-structure homology recognition to identify templates and to perform comparative modeling; as well as simple, robust, and generally applicable algorithms to assess the likely impact of amino acid substitutions on structure and interactions. We describe our strategy for approaching the relationship between SNPs and disease, and the results of benchmarking our approach -- human proteins of known structure and recognized mutation.
15.
SNP2NMD: a database of human single nucleotide polymorphisms causing nonsense-mediated mRNA decay (cited by: 1; self-citations: 0; citations by others: 1)
Elucidating the effects of genetic polymorphisms on genes and gene networks is an important step in disease association studies. We developed the SNP2NMD database for human SNPs (single nucleotide polymorphisms) that result in PTCs (premature termination codons) and trigger nonsense-mediated mRNA decay (NMD). The SNP2NMD Web interfaces provide extensive genetic information on and graphical views of the queried SNP, gene, and disease terms. Availability: SNP2NMD is available from http://variome.net, or directly from http://bioportal.kobic.re.kr/SNP2NMD. Supplementary information: http://bioportal.kobic.re.kr/SNP2NMD/Wiki.jsp?page=Statistics.
16.
Conifers are characterized by a large genome size and a rapid decay of linkage disequilibrium, most often within gene limits. Genome scans based on noncoding markers are less likely to detect molecular adaptation linked to genes in these species. In this study, we assessed the effectiveness of a genome-wide single nucleotide polymorphism (SNP) scan focused on expressed genes in detecting local adaptation in a conifer species. Samples were collected from six natural populations of white spruce (Picea glauca) moderately differentiated for several quantitative characters. A total of 534 SNPs representing 345 expressed genes were analysed. Genes potentially under natural selection were identified by estimating the differentiation in SNP frequencies among populations (F_ST) and identifying outliers, and by estimating local differentiation using a Bayesian approach. Both average expected heterozygosity and population differentiation estimates (H_E = 0.270 and F_ST = 0.006) were comparable to those obtained with other genetic markers. Of all genes, 5.5% were identified as outliers with F_ST at the 95% confidence level, while 14% were identified as candidates for local adaptation with the Bayesian method. There was some overlap between the two gene sets. More than half of the candidate genes for local adaptation were specific to the warmest population, about 20% to the most arid population, and 15% to the coldest and most humid higher altitude population. These adaptive trends were consistent with the genes' putative functions and the divergence in quantitative traits noted among the populations. The results suggest that an approach separating the locus and population effects is useful to identify genes potentially under selection. These candidates are worth exploring in more detail at the physiological and ecological levels.
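For readers unfamiliar with the two summary statistics quoted above, the following sketch shows one textbook way to compute expected heterozygosity and a heterozygosity-based F_ST for a single biallelic SNP. It is an assumption-laden illustration, not the estimator used in the study: populations are weighted equally, no sample-size correction is applied, and the function names are hypothetical.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(freqs) -> float:
    """F_ST = (H_T - H_S) / H_T from per-population allele frequencies.

    H_S is the mean within-population expected heterozygosity and H_T is
    the expected heterozygosity at the mean allele frequency. Populations
    are weighted equally (an assumption of this sketch).
    """
    freqs = list(freqs)
    if not freqs:
        raise ValueError("need at least one population")
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall; differentiation is undefined
    return (h_t - h_s) / h_t
```

An outlier scan of the kind described would compute such a value per SNP and flag loci whose F_ST falls outside the range expected under neutrality.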
17.
In genetic association studies, such as genome-wide association studies (GWAS), the number of single nucleotide polymorphisms (SNPs) can be as large as hundreds of thousands. Due to linkage disequilibrium, many SNPs are highly correlated; assuming they are independent is not valid. The commonly used multiple comparison methods, such as Bonferroni correction, are not appropriate and are too conservative when applied to GWAS. To overcome these limitations, many approaches have been proposed to estimate the so-called effective number of independent tests to account for the correlations among SNPs. However, many current effective number estimation methods are based on eigenvalues of the correlation matrix. When the dimension of the matrix is large, the numeric results may be unreliable or even unobtainable. To circumvent this obstacle and provide better estimates, we propose a new effective number estimation approach which is not based on the eigenvalues. We compare the new method with others through simulated and real data. The comparison results show that the proposed method has very good performance.
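To make the idea of an effective number of tests concrete, the sketch below implements one widely cited eigenvalue-based estimator, M_eff = 1 + (M - 1)(1 - Var(λ)/M), commonly attributed to Nyholt. It illustrates the class of eigenvalue-based methods the abstract contrasts itself with; it is not the method proposed there, which the abstract does not specify. To stay dependency-free, the example assumes an equicorrelated SNP block, whose eigenvalues have a closed form; the function names are hypothetical.

```python
from statistics import variance


def equicorrelated_eigenvalues(m: int, r: float) -> list:
    """Eigenvalues of an m x m correlation matrix with every off-diagonal entry r.

    One eigenvalue equals 1 + (m - 1) r and the other m - 1 equal 1 - r.
    Valid for -1/(m - 1) <= r <= 1.
    """
    return [1 + (m - 1) * r] + [1 - r] * (m - 1)


def effective_number_of_tests(eigenvalues) -> float:
    """Eigenvalue-based estimate: 1 + (M - 1) * (1 - Var(eigenvalues) / M)."""
    m = len(eigenvalues)
    if m < 2:
        return float(m)
    # statistics.variance is the sample variance (denominator M - 1).
    return 1 + (m - 1) * (1 - variance(eigenvalues) / m)


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold alpha / n_tests."""
    return alpha / n_tests
```

For five SNPs with pairwise correlation 0.5, the estimate is 4 effective tests, so the per-test threshold at alpha = 0.05 relaxes from 0.01 to 0.0125; with no correlation it stays at 5 tests, and with perfect correlation it collapses to 1.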
18.
The human genome is structured at multiple levels: it is organized into a series of replication time zones, and meanwhile it is composed of isochores. Accumulating evidence suggests a match between these two genome features. Based on newly developed software GC-Profile, we obtained a complete coverage of the human genome by 3198 isochores with boundaries at single nucleotide resolution. Interestingly, the experimentally confirmed replication timing sites in the regions of 1p36.1, 6p21.32, 17q11.2 and 22q12.1 nearly all coincide with the determined isochore boundaries. The precise boundaries of the 3198 isochores are available via the website: http://tubic.tju.edu.cn/isomap/.
19.
Robert H. S. Kraus, Pim van Hooft, Hendrik‐Jan Megens, Arseny Tsvey, Sergei Y. Fokin, Ronald C. Ydenberg, Herbert H. T. Prins. Molecular Ecology, 2013, 22(1): 41-55
Knowledge about population structure and connectivity of waterfowl species, especially mallards (Anas platyrhynchos), is a priority because of recent outbreaks of avian influenza. Ringing studies that trace large‐scale movement patterns have to date been unable to detect clearly delineated mallard populations. We employed 363 single nucleotide polymorphism markers in combination with population genetics and phylogeographical approaches to conduct a population genomic test of panmixia in 801 mallards from 45 locations worldwide. Basic population genetic and phylogenetic methods suggest no or very little population structure on continental scales. Nor could individual‐based structuring algorithms discern geographical structuring. Model‐based coalescent analyses for testing models of population structure pointed to strong genetic connectivity among the world's mallard population. These diverse approaches all support the conclusion that there is a lack of clear population structure, suggesting that the world's mallards, perhaps with minor exceptions, form a single large, mainly interbreeding population.