Similar literature
1.
We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. Two major categories for analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests disease association of a set of SNP markers simultaneously. We examined various test statistics that can be utilized in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from Collaborative Study on the Genetics of Alcoholism (COGA).
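
As an illustration of family-wise error rate control in the univariate approach, the sketch below applies two standard procedures, Bonferroni and Holm, to per-SNP p-values. The abstract does not say which procedures the authors reviewed, so the choice of these two, the function names, and the p-values are assumptions made for illustration only.

```python
def bonferroni_adjust(p_values):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical single-marker p-values, one per SNP.
snp_p_values = [0.001, 0.04, 0.03, 0.2]
bonferroni = bonferroni_adjust(snp_p_values)
holm = holm_adjust(snp_p_values)
# SNP indices declared significant at a family-wise level of 0.05.
significant = [i for i, p in enumerate(holm) if p <= 0.05]
```

Holm is never less powerful than Bonferroni while giving the same family-wise guarantee, which is why it is often preferred when many markers are tested one at a time.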

2.
The development of methods to assess the impact of amino acid mutations on human health has become an important goal in biomedical research, due to the growing number of nonsynonymous SNPs identified. Within this context, computational methods constitute a valuable tool, because they can easily process large amounts of mutations and give useful, almost cost-free, information on their pathological character. In this paper we present a computational approach to the prediction of disease-associated amino acid mutations, using only sequence-based information (amino acid properties, evolutionary information, secondary structure and accessibility predictions, and database annotations) and neural networks, as a model building tool. Mutations are predicted to be either pathological or neutral. Our results show that the method has a good overall success rate, 83%, that can reach 95% when trained for specific proteins. The methodology is fast and flexible enough to provide good estimates of the pathological character of large sets of nonsynonymous SNPs, but can also be easily adapted to give more precise predictions for proteins of special biomedical interest.

3.
See D  Kanazin V  Talbert H  Blake T 《BioTechniques》2000,28(4):710-4, 716
Single-nucleotide polymorphisms (SNPs) represent the most prevalent class of genetic markers available for linkage disequilibrium or cladistic analyses. PCR primers may be labeled with fluorescent dyes and used to rapidly and accurately differentiate among alleles that are defined by a single-nucleotide difference. Here, we describe the primer-mediated detection of SNPs based on primer mismatch during allele-specific amplification of preamplified target sequences. Primers are labeled with different fluors at their 5' nucleotides, with their 3' termini at the transition mutation that defines allelic variation at the target locus. Each primer perfectly matches one of the two available alleles for each locus. Electrophoretic detection permits characterization of the product both by size and fluor. This report demonstrates some of the capabilities of this assay, including heterozygote determination and multiplexed analysis.

4.
5.
We present the results of extensive simulations that emulate the development and distribution of linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and a gene locus that is phenotypically stratified into two classes (disease phenotype and wild-type phenotype). Our approach, based on coalescence theory, allows an explicit modeling of the demographic history of the population without conditioning on the age of the mutation, and serves as an efficient tool to carry out simulations. More specifically, we compare the influence that a constant population size or an exponentially growing population has on the amount of LD. These results indicate that attempts to locate single disease genes are most likely successful in small and constant populations. On the other hand, if we consider an exponentially growing population that started to expand from an initially constant population of reasonable size, then our simulations indicate a lower success rate. The power to detect association is enhanced if haplotypes constructed from several SNPs are used as markers. The versatility of the coalescence approach also allows the analysis of other relevant factors that influence the chances that a disease gene will be located. We show that several alleles leading to the same disease have no substantial influence on the amount of LD, as long as the differences between the disease-causing alleles are confined to the same region of the gene locus and as long as each allele occurs in an appreciable frequency. Our simulations indicate that mapping of less-frequent diseases is more likely to be successful. Moreover, we show that successful attempts to map complex diseases depend crucially on the phenotype-genotype correlations of all alleles at the disease locus. An analysis of lipoprotein lipase data indicates that our simulations capture the major features of LD occurring in biological data.

6.
Genetic linkage analysis in families with multiple cases of inflammatory bowel disease (IBD) has mapped a gene which confers susceptibility to IBD to the pericentromeric region of chromosome 16 (IBD1). The linked region includes the interleukin (IL)-4 receptor gene (IL4R). Since IL-4 regulation and expression are abnormal in IBD, the IL4R gene is thus both a positional and functional candidate for IBD1. We screened the gene for single-nucleotide polymorphisms (SNPs) by fluorescent chemical cleavage analysis, and tested a subset of known and novel SNPs for allelic association with IBD in 355 families, which included 435 cases of Crohn's disease and 329 cases of ulcerative colitis. No association was observed between a haplotype of four SNPs (val50ile, gln576arg, A3044G, G3289A) and either the Crohn's disease or ulcerative colitis phenotypes using the transmission disequilibrium test. There was also no evidence for association when the four markers were analyzed individually. The results indicate that these variants are not significant genetic determinants of IBD, and that the IL4R gene is unlikely to be IBD1. Linkage disequilibrium analyses showed that the val50ile and gln576arg variants are in complete equilibrium with each other, although they are separated by only about 21 kilobases of genomic DNA. This suggests that a very dense SNP map may be required to exclude or detect disease associations with some candidate genes. Received: 23 June 1999 / Revised: 18 August 1999
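
The transmission disequilibrium test mentioned above can be sketched for a single biallelic marker as follows. This is the textbook McNemar-type form of the statistic, not necessarily the exact variant the authors ran; the function name and the transmission counts are hypothetical and are not taken from the study.

```python
import math


def tdt(transmitted, untransmitted):
    """Textbook TDT for one allele at a biallelic marker.

    transmitted:   times heterozygous parents passed the allele to an
                   affected child
    untransmitted: times heterozygous parents passed the other allele
    Returns (chi-square statistic with 1 df, p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 degree of freedom:
    # P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2_stat, p_val = tdt(60, 40)
```

Only heterozygous parents are informative, so a marker with few heterozygotes yields little power regardless of the number of families.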

7.
8.
Structural location of disease-associated single-nucleotide polymorphisms   Total citations: 7, self-citations: 0, citations by others: 7
Non-synonymous single-nucleotide polymorphism (nsSNP) of genes introduces amino acid changes to proteins, and plays an important role in providing genetic functional diversity. To understand the structural characteristics of disease-associated SNPs, we have mapped a set of nsSNPs derived from the Online Mendelian Inheritance in Man (OMIM) database to the structural surfaces of encoded proteins. These nsSNPs are disease-associated or have distinctive phenotypes. As a control dataset, we mapped a set of nsSNPs derived from the SNP database dbSNP to the structural surfaces of those encoded proteins. Using the alpha shape method from computational geometry, we examine the geometric locations of the structural sites of these nsSNPs. We classify each nsSNP site into one of three categories of geometric locations: those in a pocket or a void (type P); those on a convex region or a shallow depressed region (type S); and those that are buried completely in the interior (type I). We find that the majority (88%) of disease-associated nsSNPs are located in voids or pockets, and they are infrequently observed in the interior of proteins (3.2% in the data set). We find that nsSNPs mapped from dbSNP are less likely to be located in pockets or voids (68%). We further introduce a novel application of hidden Markov models (HMM) for analyzing sequence homology of SNPs on various geometric sites. For SNPs on surface pocket or void, we find that there is no strong tendency for them to occur on conserved residues. For SNPs buried in the interior, we find that disease-associated mutations are more likely to be conserved. The approach of classifying nsSNPs with alpha shape and HMM developed in this study can be integrated with additional methods to improve the accuracy of predictions of whether a given nsSNP is likely to be disease-associated.

9.
PURPOSE OF REVIEW: The identification of regulatory polymorphisms has become a key problem in human genetics. In the past few years there has been a conceptual change in the way in which regulatory single-nucleotide polymorphisms are studied. We review the new approaches and discuss how gene expression studies can contribute to a better knowledge of the genetics of common diseases. RECENT FINDINGS: New techniques for the association of single-nucleotide polymorphisms with changes in gene expression have been recently developed. This, together with a more comprehensive use of the old in-vitro methods, has produced a great amount of genetic information. When added to current databases, it will help to design better tools for the detection of regulatory single-nucleotide polymorphisms. SUMMARY: The identification of functional regulatory single-nucleotide polymorphisms cannot be done by the simple inspection of DNA sequence. In-vivo techniques, based on primer-extension, and the more recently developed 'haploChIP' allow the association of gene variants to changes in gene expression. Gene expression analysis by conventional in-vitro techniques is the only way to identify the functional consequences of regulatory single-nucleotide polymorphisms. The amount of information produced in the last few years will help to refine the tools for the future analysis of regulatory gene variants.

10.
Anderson EC  Garza JC 《Genetics》2006,172(4):2567-2582
Likelihood-based parentage inference depends on the distribution of a likelihood-ratio statistic, which, in most cases of interest, cannot be exactly determined, but only approximated by Monte Carlo simulation. We provide importance-sampling algorithms for efficiently approximating very small tail probabilities in the distribution of the likelihood-ratio statistic. These importance-sampling methods allow the estimation of small false-positive rates and hence permit likelihood-based inference of parentage in large studies involving a great number of potential parents and many potential offspring. We investigate the performance of these importance-sampling algorithms in the context of parentage inference using single-nucleotide polymorphism (SNP) data and find that they may accelerate the computation of tail probabilities >1 millionfold. We subsequently use the importance-sampling algorithms to calculate the power available with SNPs for large-scale parentage studies, paying particular attention to the effect of genotyping errors and the occurrence of related individuals among the members of the putative mother-father-offspring trios. These simulations show that 60-100 SNPs may allow accurate pedigree reconstruction, even in situations involving thousands of potential mothers, fathers, and offspring. In addition, we compare the power of exclusion-based parentage inference to that of the likelihood-based method. Likelihood-based inference is much more powerful under many conditions; exclusion-based inference would require 40% more SNP loci to achieve the same accuracy as the likelihood-based approach in one common scenario. Our results demonstrate that SNPs are a powerful tool for parentage inference in large managed and/or natural populations.
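
To show why importance sampling helps with very small tail probabilities, the sketch below estimates the upper tail of a standard normal distribution by sampling from a shifted proposal and reweighting. It is a generic toy example, not the authors' parentage algorithm; the normal model, the function names, and the threshold of 5 are all assumptions chosen so that the exact answer is available for comparison.

```python
import math
import random


def normal_tail_exact(threshold):
    """Exact P(Z > threshold) for a standard normal Z."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def normal_tail_importance(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold) by sampling from N(threshold, 1).

    Each draw x that lands in the tail is weighted by the density
    ratio phi(x) / phi(x - threshold) = exp(-threshold*x + threshold^2/2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


exact = normal_tail_exact(5.0)                   # roughly 2.9e-7
estimate = normal_tail_importance(5.0, 50_000)   # close to exact
relative_error = abs(estimate - exact) / exact
```

Plain Monte Carlo with the same 50,000 draws would almost always observe zero tail events here, since the event occurs about three times in ten million draws; the shifted proposal puts roughly half of the draws in the tail.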

11.
Torkamani A  Schork NJ 《Genomics》2007,90(1):49-58
The human kinase gene family is composed of 518 genes that are involved in a diverse spectrum of physiological functions. They are also implicated in a number of diseases and encompass 10% of current drug targets. Contemporary, high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs) in kinases, a subset of which occur in the coding region of genes (cSNPs) and result in a change in the encoded amino acid sequence (nonsynonymous coding SNP; nscSNPs). What fraction of this naturally occurring variation underlies human disease is largely unknown (uDC), and much of it is assumed not to be disease causing (DC). We pursued a comprehensive computational analysis of the distribution of 1463 nscSNPs and 999 DC nscSNPs within the kinase gene family and have found that DCs are overrepresented in the kinase catalytic domain and in receptor structures. In addition, the frequencies with which specific amino acid changes occur differ between the DCs and the uDCs, implying different biological characteristics for the two sets of human polymorphisms. Our results provide insights into the sequence and structural phenomena associated with naturally occurring kinase nscSNPs that contribute to human diseases.

12.
The G-protein-coupled receptor (GPCR) superfamily is one of the largest classes of proteins in mammalian genomes. GPCRs mediate diverse physiological functions and are the targets of >50% of all clinical drugs. The sequencing of the human genome and large-scale polymorphism discovery efforts have established an abundant source of single nucleotide polymorphisms (SNPs), particularly those that result in a change in the encoded amino acids (cSNPs), many of which are in GPCRs. Although the majority of these cSNPs are assumed not to be disease-causing (nDCs), experimental data on their functional impact are lacking. Here, we have computationally analyzed the distribution of 454 cSNPs within the GPCR gene family and have found that disease-causing cSNPs (DCs) are overrepresented, whereas nDCs are underrepresented or neutral in transmembrane and extracellular loop domains, respectively. This finding reflects the relative importance of these domains to GPCR function and implies different biological characteristics for the two sets of human polymorphisms.

13.
14.
The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called "GenoSpectrum") as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.
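
For orientation, the classical EM algorithm for haplotype frequencies at two biallelic SNPs is sketched below. It uses hard genotype calls, whereas the GS-EM algorithm described above accepts probabilistic genotypes, so this is a simplified relative and not the authors' method. The allele labels (A/a, B/b), the function name, and the genotype counts are hypothetical.

```python
def em_haplotype_freqs(geno_counts, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes AB, Ab, aB, ab by EM.

    geno_counts[i][j] is the number of individuals carrying i copies of
    allele A at SNP 1 and j copies of allele B at SNP 2 (i, j in 0..2).
    Only double heterozygotes (i == j == 1) have ambiguous phase.
    """
    n_ind = sum(sum(row) for row in geno_counts)
    base = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue  # ambiguous phase, handled in the E-step
            first = ["A"] * i + ["a"] * (2 - i)
            second = ["B"] * j + ["b"] * (2 - j)
            # At least one locus is homozygous, so this pairing is unique.
            for k in range(2):
                base[first[k] + second[k]] += geno_counts[i][j]

    n_double_het = geno_counts[1][1]
    freqs = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts divided by 2N chromosomes.
        counts = dict(base)
        counts["AB"] += n_double_het * w
        counts["ab"] += n_double_het * w
        counts["Ab"] += n_double_het * (1 - w)
        counts["aB"] += n_double_het * (1 - w)
        new_freqs = {h: c / (2 * n_ind) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in freqs)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Hypothetical sample: 5 ab/ab, 10 double heterozygotes, 5 AB/AB.
example_counts = [[5, 0, 0], [0, 10, 0], [0, 0, 5]]
hap_freqs = em_haplotype_freqs(example_counts)
```

In this example the unambiguous individuals carry only AB and ab haplotypes, so the EM iterations resolve the double heterozygotes as AB/ab and the Ab and aB frequencies shrink toward zero.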

15.

Background

The mitochondrial (mt) displacement loop (D-loop) is known to accumulate structural alterations and mutations. The aim of this study was to investigate the prevalence of single nucleotide polymorphisms (SNPs) within the D-loop among chronic dialysis patients and healthy controls.

Methodology and Principal Findings

We enrolled 193 chronic dialysis patients and 704 healthy controls. SNPs were identified by large scale D-loop sequencing and bioinformatic analysis. Chronic dialysis patients had lower body mass index, blood thiols, and cholesterol levels than controls. A total of 77 SNPs matching positions in the Revised Cambridge Reference Sequence (CRS) were found in the study population. Chronic dialysis patients had a significantly higher incidence of 9 SNPs compared to controls. These include SNP5 (16108Y), SNP17 (16172Y), SNP21 (16223Y), SNP34 (16274R), SNP35 (16278Y), SNP55 (16463R), SNP56 (16519Y), SNP64 (185R), and SNP65 (189R) in the D-loop of the CRS. Among these SNPs with genotypes, SNP55-G, SNP56-C, and SNP64-A were 4.78, 1.47, and 5.15 times more frequent in dialysis patients compared to controls (P<0.05), respectively. When adjusting for the covariates of demographics and comorbidities, SNP64-A was 5.13 times more frequent in dialysis patients compared to controls (P<0.01). Furthermore, SNP64-A was found to be 35.80, 3.48, 4.69, 5.55, and 4.67 times higher in female patients and in patients without diabetes, coronary artery disease, smoking, and hypertension in an independently significant manner (P<0.05), respectively. In patients older than 50 years or with hypertension, SNP34-A and SNP17-C were found to be 7.97 and 3.71 times more frequent (P<0.05) compared to patients younger than 50 years or those without hypertension, respectively.

Conclusions and Significance

The results of large-scale sequencing suggest that specific SNPs in the mtDNA D-loop are significantly associated with chronic dialysis. These SNPs can be considered as potential predictors for chronic dialysis.

16.
Single-nucleotide polymorphisms (SNPs), believed to determine human differences, are widely used to predict risk of diseases. Typically, clinical samples are limited and/or the sampling cost is high. Thus, it is essential to determine an adequate sample size needed to build a classifier based on SNPs. Such a classifier would facilitate correct classifications, while keeping the sample size to a minimum, thereby making the studies cost-effective. For coded SNP data from 2 classes, an optimal classifier and an approximation to its probability of correct classification (PCC) are derived. A linear classifier is constructed and an approximation to its PCC is also derived. These approximations are validated through a variety of Monte Carlo simulations. A sample size determination algorithm based on the criterion, which ensures that the difference between the 2 approximate PCCs is below a threshold, is given and its effectiveness is illustrated via simulations. For the HapMap data on Chinese and Japanese populations, a linear classifier is built using 51 independent SNPs, and the required total sample sizes are determined using our algorithm, as the threshold varies. For example, when the threshold value is 0.05, our algorithm determines a total sample size of 166 (83 for Chinese and 83 for Japanese) that satisfies the criterion.

17.
Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) in detecting population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low or high frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results.
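
Selecting SNPs by minor allele frequency, as described above, can be sketched as follows. The genotype coding (0, 1, or 2 copies of one allele, with None for a missing call), the function names, and the toy data are assumptions; only the 40-50% MAF window comes from the abstract.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele.

    Missing calls are given as None and ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq, 1.0 - freq)


def select_by_maf(snp_genotypes, low=0.40, high=0.50):
    """Return names of SNPs whose MAF lies in [low, high]."""
    return [
        name
        for name, genotypes in snp_genotypes.items()
        if low <= minor_allele_frequency(genotypes) <= high
    ]


# Hypothetical genotypes for three SNPs across four individuals.
toy_data = {
    "snp_a": [0, 1, 2, 1],     # allele frequency 0.5, MAF 0.5
    "snp_b": [0, 0, 0, 1],     # allele frequency 0.125, MAF 0.125
    "snp_c": [2, None, 1, 0],  # allele frequency 0.5 among called, MAF 0.5
}
high_maf_snps = select_by_maf(toy_data)
```

Filtering on MAF alone does not enforce the even spacing the abstract also requires; that would need marker positions, which are not modeled here.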

18.
Our goal was to compare methods for tagging single-nucleotide polymorphisms (tagSNPs) with respect to the power to detect disease association under differing haplotype-disease association models. We were also interested in the effect that SNP selection samples, consisting of either cases, controls, or a mixture, would have on power. We investigated five previously described algorithms for choosing tagSNPs: two that picked SNPs based on haplotype structure (Chapman-haplotypic and Stram), two that picked SNPs based on pair-wise allelic association (Chapman-allelic and Cousin), and one control method that chose equally spaced SNPs (Zhai). In two disease-associated regions from the Genetic Analysis Workshop 14 simulated data, we tested the association between tagSNP genotype and disease over the tagSNP sets chosen by each method for each sampling scheme. This was repeated for 100 replicates to estimate power. The two allelic methods chose essentially all SNPs in the region and had nearly optimal power. The two haplotypic methods chose about half as many SNPs. The haplotypic methods had poor performance compared to the allelic methods in both regions. We expected an improvement in power when the selection sample contained cases; however, there was only moderate variation in power between the sampling approaches for each method. Finally, when compared to the haplotypic methods, the reference method performed as well or worse in the region with ancestral disease haplotype structure.

19.
20.
We identified 37 single-nucleotide polymorphisms (SNPs) in sheep and screened 16 individuals from 8 different sheep breeds selected throughout Europe. Population genetic measures based on the genotyping of about 30 sheep from the same 8 breeds are reported. To date, there are no sheep SNPs documented in the National Center for Biotechnology Information dbSNP database. Therefore, the markers presented here contribute significantly to those currently available.
