Similar articles
20 similar articles found.
1.
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.

2.
The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest (AUC-RF), a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study, we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.

3.
GWAS have greatly facilitated the discovery of risk SNPs associated with complex diseases. Traditional methods analyze SNPs individually and are limited by low power and reproducibility, since correction for multiple comparisons is necessary. Several methods have been proposed based on grouping SNPs into SNP sets using biological knowledge and/or genomic features. In this article, we compare the linear kernel machine based test (LKM) and the principal components analysis based approach (PCA) using simulated datasets under scenarios of 0 to 3 causal SNPs, as well as simple and complex linkage disequilibrium (LD) structures of the simulated regions. Our simulation study demonstrates that both LKM and PCA can control the type I error at the significance level of 0.05. If the causal SNP is in strong LD with the genotyped SNPs, both the PCA with a small number of principal components (PCs) and the LKM with a linear or identical-by-state kernel function are valid tests. However, if the LD structure is complex, such as several LD blocks in the SNP set, or when the causal SNP is not in the LD block in which most of the genotyped SNPs reside, more PCs should be included to capture the information of the causal SNP. Simulation studies also demonstrate the ability of LKM and PCA to combine information from multiple causal SNPs and to provide increased power over individual SNP analysis. We also apply LKM and PCA to analyze two SNP sets extracted from an actual GWAS dataset on non-small cell lung cancer.
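
The two kernel functions named in the abstract above can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions here are assumptions: the linear kernel is taken as the dot product of allele-count vectors, and the identical-by-state (IBS) kernel as the commonly used fraction of alleles two individuals share. The function names and toy genotypes are hypothetical, and the test statistic itself is not shown.

```python
def linear_kernel(g1, g2):
    """Dot product of two allele-count vectors (0, 1, or 2 per SNP)."""
    return sum(a * b for a, b in zip(g1, g2))


def ibs_kernel(g1, g2):
    """Fraction of alleles shared identical by state across a SNP set.

    Assumed definition: sum over SNPs of (2 - |a - b|), divided by 2 * (number of SNPs).
    Returns a value in [0, 1]; 1 means identical genotypes at every SNP.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def kernel_matrix(genotypes, kernel):
    """Pairwise kernel matrix for a list of per-individual genotype vectors."""
    return [[kernel(gi, gj) for gj in genotypes] for gi in genotypes]


# Hypothetical SNP set: three individuals, four SNPs.
snp_set = [
    [0, 1, 2, 0],  # individual A
    [0, 1, 2, 0],  # individual B (same genotypes as A)
    [2, 1, 0, 2],  # individual C
]
K_ibs = kernel_matrix(snp_set, ibs_kernel)
K_lin = kernel_matrix(snp_set, linear_kernel)
```

A kernel-machine test would then compare this similarity matrix against phenotype similarity; that step depends on details the abstract does not state.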

4.
Genetic mutations may interact to increase the risk of human complex diseases. Mapping of multiple interacting disease loci in the human genome has recently shown promise in detecting genes with small main effects. The power of interaction association mapping, however, can be greatly influenced by the set of single nucleotide polymorphisms (SNPs) genotyped in a case-control study. Previous imputation methods focus only on imputation of individual SNPs without considering the joint distribution of their possible interactions. We present a new method that simultaneously detects multilocus interaction associations and imputes missing SNPs from a full Bayesian model. Our method treats both the case-control sample and the reference data as random observations. The output of our method is the posterior probabilities of SNPs for their marginal and interacting associations with the disease. Using simulations, we show that the method produces accurate and robust imputation with little overfitting. We further show that, with the type I error rate maintained at a common level, SNP imputation can consistently and sometimes substantially improve the power of detecting disease interaction associations. We use a data set of inflammatory bowel disease to demonstrate the application of our method.

5.
6.
Statistical methods to test for effects of single nucleotide polymorphisms (SNPs) on exon inclusion exist but often rely on testing of associations between multiple exon–SNP pairs, sometimes followed by summarization of results at the gene level. Such approaches require heavy multiple testing corrections and detect mostly events with large effect sizes. We propose here a test to find spliceQTL (splicing quantitative trait loci) effects that takes all exons and all SNPs into account simultaneously. For any chosen gene, this score-based test looks for an association between the set of exon expressions and the set of SNPs, via a random-effects model framework. It is efficient to compute and can be used if the number of SNPs is larger than the number of samples. In addition, the test is powerful in detecting effects that are relatively small for individual exon–SNP pairs but are observed for many pairs. Furthermore, test results are more often replicated across datasets than pairwise testing results. This makes our test more robust to exon–SNP pair-specific effects, which do not extend to multiple pairs within the same gene. We conclude that the test we propose here offers more power and better replicability in the search for spliceQTL effects.

7.

Background

Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of any individual gene or SNP is modest, a method that considers the joint effects of multiple SNPs can be more powerful than testing individual SNPs. Multi-SNP analysis aims to test association based on a SNP set, usually defined using biological knowledge such as genes or pathways, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set.

Results

We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10⁻⁶).

Conclusions

Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1620-3) contains supplementary material, which is available to authorized users.

8.
In statistical modelling, the effects of single-nucleotide polymorphisms (SNPs) are often regarded as time-independent. However, for traits recorded repeatedly, it is of interest to investigate the behaviour of gene effects over time. In the analysis, simulated data from the 13th QTL-MAS Workshop (Wageningen, The Netherlands, April 2009) were used and the major goal was the modelling of genetic effects as time-dependent. For this purpose, a mixed model was fitted which describes each effect using third-order Legendre orthogonal polynomials, in order to account for the correlation between consecutive measurements. In this model, SNPs are modelled as fixed effects, while the environment is modelled as random effects. The maximum likelihood estimates of model parameters are obtained by the expectation–maximisation (EM) algorithm and the significance of the additive SNP effects is based on the likelihood ratio test, with p-values corrected for multiple testing. For each significant SNP, the percentage of the total variance contributed by this SNP is calculated. Moreover, by using a model which simultaneously incorporates effects of all of the SNPs, the prediction of future yields is conducted. As a result, 179 of the total of 453 SNPs, covering 16 out of 18 true quantitative trait loci (QTL), were selected. The correlation between predicted and true breeding values was 0.73 for the data set with all SNPs and 0.84 for the data set with selected SNPs. In conclusion, we showed that a longitudinal approach allows for estimating changes of the variance contributed by each SNP over time and demonstrated that, for prediction, the pre-selection of SNPs plays an important role.
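
To make the time-dependent effect in the abstract above concrete, here is a minimal sketch of a third-order Legendre basis. It assumes, as is usual for Legendre polynomials, that measurement times are rescaled to [-1, 1]; the coefficient values, time range, and function names are hypothetical and not taken from the study. Fitting the mixed model with EM is not shown.

```python
def standardize_time(t, t_min, t_max):
    """Rescale a measurement time to [-1, 1], the domain of Legendre polynomials."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_basis(x):
    """Legendre polynomials P0..P3 evaluated at x in [-1, 1]."""
    return [
        1.0,
        x,
        0.5 * (3 * x**2 - 1),
        0.5 * (5 * x**3 - 3 * x),
    ]


def snp_effect_at(t, coeffs, t_min, t_max):
    """Effect of one SNP at time t, as a weighted sum of the four basis functions."""
    x = standardize_time(t, t_min, t_max)
    return sum(c * p for c, p in zip(coeffs, legendre_basis(x)))


# Hypothetical regression coefficients for one SNP over measurement times 0..10.
coeffs = [0.8, 0.3, -0.1, 0.05]
trajectory = [snp_effect_at(t, coeffs, 0, 10) for t in range(0, 11)]
```

With four coefficients per SNP, the effect can rise, fall, or change curvature over time, which a single time-independent coefficient cannot express.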

9.
Schizophrenia is a complex psychiatric disorder characterized by positive symptoms, negative symptoms, and cognitive impairment. MAGI2, a relatively large gene (~1.5 Mbp) that maps to chromosome 7q21, is involved in recruitment of neurotransmitter receptors such as AMPA- and NMDA-type glutamate receptors. A genetic association study designed to evaluate the association between MAGI2 and cognitive performance or schizophrenia has not been conducted. In this case-control study, we examined the relationship between single nucleotide polymorphism (SNP) variations in MAGI2 and risk for schizophrenia in a large Japanese sample and explored the potential relationships between variations in MAGI2 and aspects of human cognitive function related to glutamate activity. Based on the results of the first schizophrenia genome-wide association study in a Japanese population (JGWAS), we selected four independent SNPs and performed an association study using a large independent Japanese sample set (1,624 cases, 1,621 controls). The Wisconsin Card Sorting Test (WCST) was used to evaluate executive function in 114 cases and 91 controls. We found suggestive evidence for genetic association between common SNPs within the MAGI2 locus and schizophrenia in the Japanese population. Furthermore, regarding the association between MAGI2 and cognitive performance, we observed that the genotype effect of rs2190665 on WCST score was significant (p = 0.034), and that of rs4729938 trended toward significance (p = 0.08). In conclusion, although we could not detect strong genetic evidence for association between common variants in MAGI2 and increased schizophrenia risk in a Japanese population, these SNPs may increase the risk of cognitive impairment in schizophrenic patients.

10.
Most individually non-significant single nucleotide polymorphisms (SNPs) go unreported in hypertension association studies. Their possible SNP–SNP interactions are usually ignored, which contributes to missing heritability. In the present study, we propose a particle swarm optimization (PSO) algorithm to analyze SNP–SNP interactions associated with hypertension. A genotype dataset of eight SNPs in renin-angiotensin system genes for 130 non-hypertension and 313 hypertension subjects was included. Without considering SNP–SNP interaction, most individual SNPs showed no significant difference between the hypertension and non-hypertension groups. For SNP–SNP interaction, PSO can select SNP combinations involving different numbers of SNPs, namely the best SNP barcodes, that show the maximum frequency difference between the non-hypertension and hypertension groups. After computation, the best PSO-generated SNP barcodes were predominant in the non-hypertension group in terms of the frequency differences between the two groups. The OR values of the best SNP barcodes involving 2–8 SNPs ranged from 0.705 to 0.334, suggesting that these SNP barcodes were protective against hypertension. In conclusion, this study demonstrated that non-significant SNPs may produce a joint effect in association studies. Our proposed PSO algorithm is effective in identifying the best protective SNP barcodes against hypertension.
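
As a rough illustration of how an odds ratio for a SNP barcode could be computed, the sketch below counts carriers of a genotype combination in cases and controls. The abstract does not define the barcode encoding or report the underlying counts, so the dictionary representation, the function names, and all genotype data are hypothetical. The PSO search over candidate barcodes is not shown.

```python
def matches_barcode(genotypes, barcode):
    """True if a subject carries the required genotype at every SNP in the barcode.

    genotypes: list of genotype codes, one per SNP.
    barcode: dict mapping SNP index -> required genotype code (assumed encoding).
    """
    return all(genotypes[snp] == code for snp, code in barcode.items())


def barcode_odds_ratio(cases, controls, barcode):
    """Odds ratio of carrying the barcode in cases versus controls.

    Raises ZeroDivisionError if every case or no control carries the barcode.
    """
    a = sum(matches_barcode(s, barcode) for s in cases)      # cases with barcode
    b = len(cases) - a                                       # cases without
    c = sum(matches_barcode(s, barcode) for s in controls)   # controls with barcode
    d = len(controls) - c                                    # controls without
    return (a * d) / (b * c)


# Hypothetical two-SNP data: each subject is [genotype at SNP 0, genotype at SNP 1].
cases = [[0, 1], [1, 1], [0, 0], [2, 1]]
controls = [[0, 1], [0, 1], [1, 0], [2, 2]]
barcode = {0: 0, 1: 1}

or_value = barcode_odds_ratio(cases, controls, barcode)
```

An odds ratio below 1, as in this toy data, means the barcode is less common among cases, which is the sense in which the abstract calls a barcode protective.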

11.
Multiple-SNP analysis, in which the effects of multiple SNPs are simultaneously estimated and tested in a multiple linear regression, has been studied by many researchers. Multiple-SNP association analysis usually has higher power and a lower false-positive rate for detecting causative SNP(s) than single marker analysis (SMA). Several methods have been proposed to simultaneously estimate and test multiple SNP effects. In this research, a fast method called MEML (Mixed model based Expectation-Maximization Lasso algorithm) was developed for simultaneous estimation of multiple SNP effects. An improved Lasso prior was assigned to SNP effects, which were estimated by searching for the maximum joint posterior mode. The residual polygenic effect, which is treated as missing data in our EM algorithm, was included in the model to absorb many tiny SNP effects. A series of simulation experiments was conducted to validate the proposed method, and the results showed that, compared with SMMA, the new method can dramatically decrease the false-positive rate. The new method was also applied to the 50k SNP-panel dataset for a genome-wide association study of milk production traits in Chinese Holstein cattle. In total, 39 significant SNPs and 25 nearby genes were found. The number of significant SNPs is markedly smaller than the 105 significant SNPs found by SMMA. Among the 39 significant SNPs, 8 were also found by SMMA, and several well-known QTLs or genes were confirmed again; furthermore, we also identified some positional candidate genes with potential functions affecting milk production traits. These novel findings should be valuable for further investigation.

12.
We investigated RGS4 as a susceptibility gene for schizophrenia in Chinese Han (184 trios and 138 sibling pairs, a total of 322 families) and Scottish (580 cases and 620 controls) populations using both a family trio and a case-control design. Both samples had statistical power greater than 70% to detect a heterozygote genotype relative risk of >1.2 for frequent RGS4 risk alleles. We genotyped four single nucleotide polymorphisms (SNPs) which have previously been associated with schizophrenia, either individually or as part of haplotypes. Allele frequencies and linkage disequilibrium between the SNPs were similar in the two populations. In the Chinese sample, no individual SNPs or any of their haplotypes were associated with schizophrenia. In the Scottish population, one SNP (SNP7) was significantly over-represented in the cases compared with the controls (0.44 vs. 0.38; A allele; χ² = 7.08, P = 0.011 after correction for correlation between markers by permutation testing). One two-marker haplotype, composed of alleles T and A of SNP4 and SNP7, respectively, showed individual significance after correction by permutation testing (χ² = 6.8; P = 0.04). None of the full four-marker haplotypes showed association, including the G-G-G-G haplotype previously associated with schizophrenia in more than one sample and the A-T-A-A haplotype. Thus, our data do not directly replicate previous associations of RGS4, but association with SNP7 in the Scottish population provides some support for a role in schizophrenia susceptibility. We cannot conclusively exclude RGS4, as associated haplotypes are likely to be surrogates for unknown causative alleles, whose relationship with overlying haplotypes may differ between the population groups. Differences in the association seen across the two populations could result from methodological factors such as diagnostic differences but most likely result from ethnic differences in haplotype structures within RGS4.

13.
Testing for genetic effects on mean values of a quantitative trait has been a very successful strategy. However, most studies to date have not explored genetic effects on the variance of quantitative traits as a relevant consequence of genetic variation. In this report, we demonstrate that, under plausible scenarios of genetic interaction, the variance of a quantitative trait is expected to differ among the three possible genotypes of a biallelic SNP. Leveraging this observation with Levene's test of equality of variance, we propose a novel method to prioritize SNPs for subsequent gene–gene and gene–environment testing. This method has the advantageous characteristic that the interacting covariate need not be known or measured for a SNP to be prioritized. Using simulations, we show that this method has increased power over exhaustive search under certain conditions. We further investigate the utility of variance per genotype by examining data from the Women's Genome Health Study. Using this dataset, we identify new interactions between the LEPR SNP rs12753193 and body mass index in the prediction of C-reactive protein levels, between the ICAM1 SNP rs1799969 and smoking in the prediction of soluble ICAM-1 levels, and between the PNPLA3 SNP rs738409 and body mass index in the prediction of soluble ICAM-1 levels. These results demonstrate the utility of our approach and provide novel genetic insight into the relationship among obesity, smoking, and inflammation.
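
The variance comparison described in the abstract above can be sketched as a Levene W statistic computed across genotype groups. This is an illustrative version only: it uses deviations from the group mean (the original Levene form; whether the study used the mean or the median is not stated), the trait values are hypothetical, and the p-value step, which compares W to an F distribution with (k - 1, N - k) degrees of freedom, is omitted.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W statistic for equality of variance across groups.

    groups: list of lists of trait values, one list per genotype (e.g. 0, 1, 2 copies).
    Larger W indicates stronger evidence that the group variances differ.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_bar_group = [mean(zi) for zi in z]
    z_bar_all = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar_all) ** 2 for zi, zb in zip(z, z_bar_group))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_group) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical trait values grouped by genotype at one SNP.
trait_by_genotype = [
    [1.0, 2.0, 3.0],  # genotype with narrower spread
    [0.0, 2.0, 4.0],  # genotype with wider spread
]
w = levene_w(trait_by_genotype)
```

Ranking SNPs by this statistic needs only the trait and the genotype, which matches the abstract's point that the interacting covariate need not be measured.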

14.
15.
Background

DNA prediction of eye color represents one application of externally visible characteristics (EVC), which has attracted growing interest in the field of forensic DNA phenotyping. This is mainly due to its ability to narrow the pool of suspects without the need to compare DNA material retrieved from the crime scene to a reference DNA. Several methods and multiplex genetic panels have been proposed, with variable prediction accuracy between different populations. However, such panels have not previously been tested in the Saudi population, nor in any population of Middle Eastern or North African origin.

Method

A panel of eleven single nucleotide polymorphisms (SNPs) was tested for association with three eye colors (brown, hazel, and intermediate) in 80 volunteer Saudi individuals. SNP and haplotype association tests with eye color were performed to identify the most significant SNPs for the three eye colors. Also, multinomial logistic regression was used to construct the prediction model using a training set of 60 subjects and a validation set of 20 subjects. The goodness of fit of the model in correctly predicting each eye color relative to the others was assessed.

Results

Eye color was significantly associated with rs12913832, rs7170852, and rs916977, which are located within HERC2. SNP rs12913832 was the most significant SNP (p-value = 1.78 × 10⁻¹⁵) and accounted for the association in this region, as the other SNPs were not significant after adjusting for rs12913832. A prediction model containing five SNPs showed high prediction accuracy, with areas under the receiver operating characteristic curve (AUC) of 0.95 and 0.83 for brown and intermediate eye colors, respectively. However, the model's performance was much lower for predicting hazel eye color, with an AUC of 0.75.

Discussion

Despite the small sample size of our study, we report very significant SNP associations with eye color. Our model to predict eye color based on DNA material showed high accuracy for brown and intermediate eye colors. The eye color prediction model underperformed for hazel eye color, suggesting that a larger sample size, as well as a more comprehensive set of SNPs, could improve the model's prediction accuracy.
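
The AUC values reported above can be read as the probability that a randomly chosen subject with a given eye color receives a higher predicted score than a randomly chosen subject without it. A minimal sketch of that pairwise definition follows; the one-versus-rest setup per eye color is an assumption about how the per-color AUCs were obtained, and the scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: predicted scores for subjects who have the target eye color.
    scores_neg: predicted scores for subjects who do not.
    Ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of "brown" in a small validation set.
brown_scores = [0.9, 0.4]
non_brown_scores = [0.5, 0.1]
brown_auc = auc(brown_scores, non_brown_scores)
```

With a validation set of only 20 subjects, each per-color AUC rests on few pairs, so small changes in predicted scores can shift the value noticeably.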

16.
Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads and these were processed using our pipeline. A total of 10,120 high quality SNPs were obtained and the distribution of these SNPs mirrored closely the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in the throughput and a significant decrease in the per sample cost. The approach to obtain high quality SNPs developed here will be helpful for marker assisted genomics as well as assessment of available genetic resources for effective utilisation in a wide number of species.

17.
Currently, single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) of >5% are preferentially used in case-control association studies of common human diseases. Recent technological developments enable inexpensive and accurate genotyping of a large number of SNPs in thousands of cases and controls, which can provide adequate statistical power to analyze SNPs with MAF <5%. Our purpose was to determine whether evaluating rare SNPs in case-control association studies could help identify causal SNPs for common diseases. We suggest that slightly deleterious SNPs (sdSNPs) subjected to weak purifying selection are major players in genetic control of susceptibility to common diseases. We compared the distribution of MAFs of synonymous SNPs with that of nonsynonymous SNPs (1) predicted to be benign, (2) predicted to be possibly damaging, and (3) predicted to be probably damaging by PolyPhen. Our sources of data were the International HapMap Project, ENCODE, and the SeattleSNPs project. We found that the MAF distribution of possibly and probably damaging SNPs was shifted toward rare SNPs compared with the MAF distribution of benign and synonymous SNPs that are not likely to be functional. We also found an inverse relationship between MAF and the proportion of nsSNPs predicted to be protein disturbing. On the basis of this relationship, we estimated the joint probability that a SNP is functional and would be detected as significant in a case-control study. Our analysis suggests that including rare SNPs in genotyping platforms will advance identification of causal SNPs in case-control association studies, particularly as sample sizes increase.

18.
A new method for SNP analysis based on the detection of pyrophosphate (PPi) is demonstrated, which is capable of detecting small allele frequency differences between two DNA pools for genetic association studies other than SNP typing. The method is based on specific primer extension reactions coupled with PPi detection. As the specificity of the primer-directed extension is not enough for quantitative SNP analysis, artificial mismatched bases are introduced into the 3′-terminal regions of the specific primers as a way of improving the switching characteristics of the primer extension reactions. The best position in the primer for such artificial mismatched bases is the third position from the primer 3′-terminus. Contamination with endogenous PPi, which produces a large background signal level in SNP analysis, was removed using PPase to degrade the PPi during the sample preparation process. It is possible to accurately and quantitatively analyze SNPs using a set of primers that correspond to the wild-type and mutant DNA segments. The termini of these primers are at the mutation positions. Various types of SNPs were successfully analyzed. It was possible to very accurately determine SNPs with frequencies as low as 0.02. The method is highly reproducible, and the allele frequency difference can be determined. It is accurate enough to detect meaningful genetic differences among pooled DNA samples. The method is sensitive enough to detect 14 amol ssM13 DNA. The proposed method seems very promising in terms of realizing a cost-effective, large-scale human genetic testing system.

19.
To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P < .000001; n = 233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

20.
Han F, Pan W. Biometrics, 2012, 68(1): 307-315
Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.
