Similar articles (20 results)
1.
Searching the entire human genome for association is a novel and promising approach to unravelling the genetic basis of complex genetic diseases. In these genome-wide association studies (GWAs), several hundred thousand single nucleotide polymorphisms (SNPs) are analyzed at the same time, posing substantial biostatistical and computational challenges. In this paper, we discuss a number of biostatistical aspects of GWAs in detail. We specifically consider quality control issues and show that signal intensity plots are a sine qua non in today's GWAs. Approaches to detect and adjust for population stratification are briefly examined. We discuss different strategies aimed at tackling the problem of multiple testing, including adjustment of p-values, the false positive report probability and the false discovery rate. Another aspect of GWAs requiring special attention is the search for gene-gene and gene-environment interactions. We finally describe multistage approaches to GWAs.

2.
DNA suffers from a wide range of damage, both from extracellular agents and via endogenous mechanisms. Damage to DNA can lead to cancer and other diseases. Therefore, it is plausible that sequence variants in DNA repair genes are involved in cancer development. A recent systematic review and meta-analysis, based on the "Venice criteria", showed that out of 241 associations investigated, only three had a strong grade of cumulative evidence. These associations were: two SNPs, rs1799793 and rs13181, in the ERCC2 gene and lung cancer (recessive model), and rs1805794 in the NBN gene and bladder cancer (dominant model). An update of this meta-analysis has been performed in the present paper, and we found partially inconsistent results. Inconsistencies in the literature are thus far not easy to explain. In addition, none of the cancer genome-wide association studies (GWAs) published so far showed associations for any of the common DNA repair gene variants that were statistically significant enough to place DNA repair genes among the top 10-20 hits identified in GWAs. Though this suggests that it is unlikely that DNA repair gene polymorphisms per se play a major role, a clarification of the discrepancies in the literature is needed. Also, gene/environment and gene/lifestyle interactions for the carcinogenic mechanisms involving DNA repair should be investigated more systematically and with less classification error. Finally, the combined effect of multiple SNPs in several genes in one or more relevant DNA repair pathways could have a greater impact on pathological phenotypes than SNPs in single genes, but this has been investigated only occasionally.

3.

Objective

Genome wide association studies (GWAs) of breast cancer mortality have identified few potential associations. The concordance between these studies is unclear. In this study, we used a meta-analysis of two prognostic GWAs and a replication cohort to identify the strongest associations and to evaluate the loci suggested in previous studies. We attempt to identify those SNPs which could impact overall survival irrespective of the age of onset.

Methods

To facilitate the meta-analysis and to refine the association signals, SNPs were imputed using data from the 1000 Genomes Project. Cox proportional hazards models were used to estimate hazard ratios (HR) in 536 patients from the POSH cohort (Prospective study of Outcomes in Sporadic versus Hereditary breast cancer) and 805 patients from the HEBCS cohort (Helsinki Breast Cancer Study). These hazard ratios were combined using a Mantel-Haenszel fixed effects meta-analysis, and a p-value threshold of 5×10⁻⁸ was used to determine significance. Replication was performed in 1523 additional patients from the POSH study.
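As a rough illustration of how per-cohort hazard ratios can be pooled and checked against the 5×10⁻⁸ threshold, the sketch below uses generic inverse-variance fixed-effect pooling on the log-HR scale. This is a swapped-in technique, not the Mantel-Haenszel procedure named above, and the per-cohort estimates, the helper names `se_from_ci` and `fixed_effect_pool`, and the normal approximation for the p-value are all assumptions made for the example.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Recover the standard error of log(HR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_pool(hrs, ses):
    """Inverse-variance fixed-effect pooling on the log-HR scale."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_log = sum(w * math.log(hr) for w, hr in zip(weights, hrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    z_score = pooled_log / pooled_se
    # Two-sided p-value under a normal approximation.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return math.exp(pooled_log), pooled_se, p_value


# Hypothetical per-cohort estimates; these numbers are NOT taken from the study.
cohort_hrs = [1.40, 1.55]
cohort_cis = [(1.10, 1.78), (1.20, 2.00)]

ses = [se_from_ci(lo, hi) for lo, hi in cohort_cis]
pooled_hr, pooled_se, p_value = fixed_effect_pool(cohort_hrs, ses)
genome_wide_significant = p_value < 5e-8
```

With only two moderately sized cohorts, the pooled estimate is more precise than either input but can still fall well short of genome-wide significance, which is the situation the abstract describes.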

Results

Although no SNPs achieved genome wide significance, three SNPs had significant association in the replication cohort and combined p-values less than 5.6×10⁻⁶. These SNPs are: rs421379, which is 556 kb upstream of ARRDC3 (HR = 1.49, 95% confidence interval (CI) = 1.27–1.75, P = 1.1×10⁻⁶); rs12358475, which is between ECHDC3 and PROSER2 (HR = 0.75, CI = 0.67–0.85, P = 1.8×10⁻⁶); and rs1728400, which is between LINC00917 and FOXF1.

Conclusions

In a genome wide meta-analysis of two independent cohorts from the UK and Finland, we identified potential associations at three distinct loci. Phenotypic heterogeneity and relatively small sample sizes may explain the lack of genome wide significant findings. However, the replication at three SNPs in the validation cohort shows promise for future studies in larger cohorts. We did not find strong evidence for concordance between the few associations highlighted by previous GWAs of breast cancer survival and this study.

4.
Recent genome-wide association studies (GWAs) have identified several new genetic risk factors for asthma; however, their influence on disease behavior and treatment response is still unclear. The aim of our study was to analyze the association of the most significant single nucleotide polymorphisms (SNPs) recently reported by GWAs with different phenotypes of childhood asthma, and to analyze the correlation between these SNPs and clinical parameters. We genotyped 288 children with asthma and 276 healthy controls. We provide here the first replication of bivariate associations of CA10 (p = 0.001) and SGK493 (p = 0.011) with asthma. In addition, we identified new correlations of SNPs in CA10, SGK493, and CTNNA3 with asthma behavior and glucocorticoid treatment response. In asthma patients, carrying the G allele of SNP rs967676 in CA10 was associated with more pronounced airway obstruction, higher bronchial hyper-reactivity, and increased inflammation. Higher bronchial hyper-reactivity was also associated with the C allele of SNP rs1440095 in SGK493, but only in nonatopic asthmatics. In addition, we found that patients who carried at least one T allele of SNP rs1786929 in CTNNA3 (p = 0.022) and atopic patients who carried at least one G allele of SNP rs967676 in CA10 (p = 0.034) had a greater increase in pulmonary function after glucocorticoid therapy. Our results suggest genetic heterogeneity between atopic and nonatopic asthma. We provide further evidence that treatment response in childhood asthma is genetically predisposed, and we report two novel SNPs in the genes CA10 and CTNNA3 as potential pharmacogenetic biomarkers that could be used in personalized treatment of childhood asthma.

5.

Background  

Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets.
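The entropy criterion can be made concrete with a small sketch. It assumes empirical haplotype frequencies in place of the probabilistic haplotype model the abstract calls for, and a greedy search in place of an exact optimization; the function names and the toy haplotypes are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(haplotypes, sites):
    """Shannon entropy (in bits) of the haplotypes restricted to `sites`."""
    counts = Counter(tuple(h[s] for s in sites) for h in haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def greedy_tags(haplotypes, k):
    """Greedily add the SNP that most increases the joint entropy, up to k tags."""
    n_sites = len(haplotypes[0])
    chosen = []
    for _ in range(k):
        remaining = [s for s in range(n_sites) if s not in chosen]
        best = max(remaining,
                   key=lambda s: joint_entropy(haplotypes, chosen + [s]))
        chosen.append(best)
    return chosen


# Toy haplotypes (hypothetical): SNPs 0 and 1 are in perfect LD,
# and SNP 3 is monomorphic, so neither SNP 1 nor SNP 3 adds information.
haplotypes = ["0010", "0000", "1110", "1100", "0010", "1100"]
tags = greedy_tags(haplotypes, 2)
```

In this toy panel, adding SNP 1 to SNP 0 leaves the joint entropy unchanged because the two are perfectly correlated, so the greedy step prefers SNP 2. This is the sense in which entropy maximization accounts for LD structure.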

6.
Motivated by the absolute risk predictions required in medical decision making and patient counseling, we propose an approach for the combined analysis of case-control and prospective studies of disease risk factors. The approach is hierarchical to account for parameter heterogeneity among studies and among sampling units of the same study. It is based on modeling the retrospective distribution of the covariates given the disease outcome, a strategy that greatly simplifies both the combination of prospective and retrospective studies and the computation of Bayesian predictions in the hierarchical case-control context. Retrospective modeling differentiates our approach from most current strategies for inference on risk factors, which are based on the assumption of a specific prospective model. To ensure modeling flexibility, we propose using a mixture model for the retrospective distributions of the covariates. This leads to a general nonlinear regression family for the implied prospective likelihood. After introducing and motivating our proposal, we present simple results that highlight its relationship with existing approaches, develop Markov chain Monte Carlo methods for inference and prediction, and present an illustration using ovarian cancer data.

7.
Robust assessment of genetic effects on quantitative traits or complex-disease risk requires synthesis of evidence from multiple studies. Frequently, studies have genotyped partially overlapping sets of SNPs within a gene or region of interest, hampering attempts to combine all the available data. By using the example of C-reactive protein (CRP) as a quantitative trait, we show how linkage disequilibrium in and around its gene facilitates use of Bayesian hierarchical models to integrate informative data from all available genetic association studies of this trait, irrespective of the SNP typed. A variable selection scheme, followed by contextualization of SNPs exhibiting independent associations within the haplotype structure of the gene, enhanced our ability to infer likely causal variants in this region with population-scale data. This strategy, based on data from a literature-based systematic review and substantial new genotyping, facilitated the most comprehensive evaluation to date of the role of variants governing CRP levels, providing important information on the minimal subset of SNPs necessary for comprehensive evaluation of the likely causal relevance of elevated CRP levels for coronary-heart-disease risk by Mendelian randomization. The same method could be applied to evidence synthesis of other quantitative traits, whenever the typed SNPs vary among studies, and to assist fine mapping of causal variants.

8.
9.
《Genomics》2021,113(3):867-873
The value of susceptibility variants derived from genome-wide association studies (GWAs) for improving the discriminatory accuracy of colorectal cancer (CRC) risk prediction in Chinese populations remains unclear. In the present validation study, we assessed 75 recently identified variants from GWAs. A risk predictive model combining 19 variants, selected using the least absolute shrinkage and selection operator (LASSO), offered certain clinical advantages. This model demonstrated an area under the receiver operating characteristic curve (AUC) of 0.61 during training analysis and yielded robust AUCs from 0.59 to 0.61 during validation analysis in three independent centers. Individuals in the highest quartile of risk score had more than twofold risk of CRC (ranging from 2.12 to 2.90) compared with those in the lowest quartile of risk score. This genetic model offered the possibility of partitioning risk within the average risk population, which might serve as a first step toward developing individualized CRC prevention strategies in China.
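To show what an AUC near 0.6 measures for a genetic risk score, the sketch below computes a weighted allele-count score and its AUC as the probability that a randomly chosen case outscores a randomly chosen control. The LASSO fit itself is not reproduced here; the weights, genotypes, and function names are hypothetical stand-ins for what a penalized regression would supply.

```python
def risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    return sum(w * g for w, g in zip(weights, genotype))


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-variant weights and genotypes; not values from the study.
weights = [0.2, 0.1, 0.3]
cases = [(2, 1, 1), (1, 0, 2), (0, 1, 0)]
controls = [(0, 0, 1), (1, 1, 0), (0, 2, 0)]

case_scores = [risk_score(g, weights) for g in cases]
control_scores = [risk_score(g, weights) for g in controls]
model_auc = auc(case_scores, control_scores)
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so values around 0.6 indicate modest discrimination, consistent with the abstract's framing of the model as a first step toward risk partitioning.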

10.
Summary. A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by applications to modeling of temperature curves in the menstrual cycle, this article proposes a flexible approach for incorporating prior information in semiparametric Bayesian analyses of hierarchical functional data. The proposed approach is based on specifying the distribution of functions as a mixture of a parametric hierarchical model and a nonparametric contamination. The parametric component is chosen based on prior knowledge, while the contamination is characterized as a functional Dirichlet process. In the motivating application, the contamination component allows unanticipated curve shapes in unhealthy menstrual cycles. Methods are developed for posterior computation, and the approach is applied to data from a European fecundability study.

11.

Objective

Candidate gene association studies and genome-wide association studies (GWAs) have identified a large number of single nucleotide polymorphism (SNP) loci affecting susceptibility to rheumatoid arthritis (RA). However, for the same locus, some studies have yielded inconsistent results. To assess all the available evidence for association, we performed a meta-analysis of previously published case-control studies investigating the association between SNPs and RA.

Methods

Two hundred and sixteen studies, involving 125 SNPs, were reviewed. For each SNP, three genetic models were considered: the allele, dominant and recessive effects models. For each model, the effect summary odds ratio (OR) and 95% CIs were calculated. Cochran’s Q-statistics were used to assess heterogeneity. If the heterogeneity was high, a random effects model was used for meta-analysis, otherwise a fixed effects model was used.
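The model-selection rule in this paragraph can be sketched as follows. The sketch assumes log odds ratios with standard errors as inputs, a heterogeneity cutoff of P < 0.01 (the threshold quoted in the Results), and the DerSimonian-Laird estimator for the random effects model; the abstract does not name the estimator, so that choice, the function names, and the example inputs are all assumptions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = math.exp(-half + a * math.log(half) - math.lgamma(a + 1))
    total, n = 0.0, 0
    while n < 1000:
        total += term
        n += 1
        term *= half / (a + n)
        if term < 1e-17 * total:
            break
    return max(0.0, 1.0 - total)


def pool_odds_ratios(log_ors, ses, alpha=0.01):
    """Fixed-effect pooling unless Cochran's Q signals high heterogeneity."""
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    p_het = chi2_sf(q, df)
    if p_het >= alpha:
        return {"model": "fixed", "or": math.exp(fixed), "q": q,
                "p_het": p_het, "tau2": 0.0}
    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    return {"model": "random", "or": math.exp(pooled), "q": q,
            "p_het": p_het, "tau2": tau2}


# Hypothetical inputs: one consistent set of studies, one heterogeneous set.
homogeneous = pool_odds_ratios([0.40, 0.45, 0.42], [0.1, 0.1, 0.1])
heterogeneous = pool_odds_ratios([0.1, 0.9, 0.5], [0.1, 0.1, 0.1])
```

Choosing the model from a heterogeneity test is the rule the abstract states; the random effects branch widens the weights with the between-study variance so that no single study dominates when results disagree.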

Results

The meta-analysis results showed that: (1) 30, 28 and 26 SNPs were significantly associated with RA (P<0.01) for the allele, dominant, and recessive models, respectively. (2) rs2476601 (PTPN22) showed the strongest association for all three models: OR = 1.605, 95% CI: 1.540–1.672, P<1.00E−15 for the T-allele; OR = 1.638, 95% CI: 1.565–1.714, P<1.00E−15 for the T/T+T/C genotype; and OR = 2.544, 95% CI: 2.173–2.978, P<1.00E−15 for the T/T genotype. (3) Only 23 (18.4%), 13 (10.4%) and 15 (12.0%) SNPs had high heterogeneity (P<0.01) for the three models, respectively. (4) For some of the SNPs, there was no publication bias according to funnel plots and Egger’s regression tests (P<0.01). For the other SNPs, the associations were tested in only a few studies, and may have been subject to publication bias. More studies on these loci are required.

Conclusion

Our meta-analysis provides a comprehensive evaluation of the RA association studies from the past two decades. The detailed meta-analysis results are available at: http://210.46.85.180/DRAP/index.php/Metaanalysis/index.

12.
Type 2 diabetes is a metabolic disease that profoundly affects energy homeostasis. The disease involves failure at several levels and subsystems and is characterized by insulin resistance in target cells and tissues (i.e. by impaired intracellular insulin signaling). We have previously used an iterative experimental-theoretical approach to unravel the early insulin signaling events in primary human adipocytes. That study, like most insulin signaling studies, is based on in vitro experimental examination of cells, and the in vivo relevance of such studies for human beings has not been systematically examined. Herein, we develop a hierarchical model of the adipose tissue, which links intracellular insulin control of glucose transport in human primary adipocytes with whole-body glucose homeostasis. An iterative approach between experiments and minimal modeling allowed us to conclude that it is not possible to scale up the experimentally determined glucose uptake by the isolated adipocytes to match the glucose uptake profile of the adipose tissue in vivo. However, a model that additionally includes insulin effects on blood flow in the adipose tissue and GLUT4 translocation due to cell handling can explain all data, but neither of these additions is sufficient independently. We also extend the minimal model to include hierarchical dynamic links to more detailed models (both to our own models and to those by others), which act as submodules that can be turned on or off. The resulting multilevel hierarchical model can merge detailed results on different subsystems into a coherent understanding of whole-body glucose homeostasis. This hierarchical modeling can potentially create bridges between other experimental model systems and the in vivo human situation and offers a framework for systematic evaluation of the physiological relevance of in vitro obtained molecular/cellular experimental data.

13.
While genome-era technologies focused on complete genome sequencing in various organisms, post-genome technologies aim at the understanding of the mechanisms of genetic information processing and elucidation of within-species variation. Single nucleotide polymorphisms (SNPs) are the most common source of genome variation in the human population. Nonsynonymous SNPs that occur in coding gene regions and result in amino acid substitutions are of particular interest. It is thought that such SNPs are responsible for phenotypic variation, quantitative traits, and the etiology of common diseases. PolyPhen is a computational tool for the prediction of putatively functional nonsynonymous SNPs by combining information of various types. The application areas of PolyPhen and similar methods include the genetics of complex diseases and congenital defects, the identification of functional mutations in model organisms, and evolutionary genetics.

14.

Background  

Whole genome association studies using highly dense single nucleotide polymorphisms (SNPs) are a set of methods to identify DNA markers associated with variation in a particular complex trait of interest. One of the main outcomes from these studies is a subset of statistically significant SNPs. Finding the potential biological functions of such SNPs can be an important step towards further use in human and agricultural populations (e.g., for identifying genes related to susceptibility to complex diseases or genes playing key roles in development or performance). The current challenge is that the information holding the clues to SNP functions is distributed across many different databases. Efficient bioinformatics tools are therefore needed to seamlessly integrate up-to-date functional information on SNPs. Many web services have arisen to meet the challenge, but most work only within the framework of human medical research. Although we acknowledge the importance of human research, we see a need for SNP annotation tools for other organisms.

15.
Elucidating the relationship between polymorphic sequences and risk of common disease is a challenge. For example, although it is clear that variation in DNA repair genes is associated with familial cancer, aging and neurological disease, progress toward identifying polymorphisms associated with elevated risk of sporadic disease has been slow. This is partly due to the complexity of the genetic variation, the existence of large numbers of mostly low frequency variants and the contribution of many genes to variation in susceptibility. There has been limited development of methods to find associations between genotypes having many polymorphisms and pathway function or health outcome. We have explored several statistical methods for identifying polymorphisms associated with variation in DNA repair phenotypes. The model system used was 80 cell lines that had been resequenced to identify variation; 191 single nucleotide substitution polymorphisms (SNPs) are included, of which 172 are in 31 base excision repair pathway genes and 19 in 5 anti-oxidation genes, together with DNA repair phenotypes based on single strand breaks measured by the alkaline Comet assay. Univariate analyses were of limited value in identifying SNPs associated with phenotype variation. Of the multivariable model selection methods tested, the easiest that provided reduced error of prediction of phenotype was simple counting of the variant alleles predicted to encode proteins with reduced activity, which led to a genotype including 52 SNPs. The best and most parsimonious model was achieved using a two-step analysis without regard to potential functional relevance: first, SNPs were ranked by importance determined by random forests regression (RFR); this was followed by cross-validation in a second round of RFR modeling that included ever more SNPs in declining order of importance. With this approach, six SNPs were found to minimize prediction error. The results should encourage research into utilization of multivariate analytical methods for epidemiological studies of the association of genetic variation in complex genotypes with risk of common diseases.

16.
《Genomics》2020,112(5):3238-3246
Knowledge of population structure and genetic diversity is a focal point for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Here we used the GBS approach for the genome-wide identification of SNPs in a collection of Cynoglossus semilaevis and for the assessment of the level of genetic diversity in C. semilaevis genotypes. GBS analysis generated a total of 55.12 Gb of high-quality sequence data, with an average of 0.63 Gb per sample. The total number of SNP markers was 563,109. In order to explore the genetic diversity of C. semilaevis and to select a minimal core set representing most of the total genetic variation with minimum redundancy, C. semilaevis sequences were analyzed using high quality SNPs. Based on hierarchical clustering, it was possible to divide the collection into 2 clusters. The marine fishing populations were clustered and clearly separated from the cultured populations, and the cultured population from Hebei was also distinct from the other two local populations. These analyses showed that genotypes were clustered based on species-related features. Significantly differentiated SNPs were also captured and validated by GBS and SNaPshot; with linkage disequilibrium and haplotype analysis, seven SNPs were confirmed to differ clearly between the two populations and may be used as characteristic evaluation sites for sea-captured and cultured Cynoglossus semilaevis populations. The SNP markers and information on population structure developed in this study will support genome-wide association mapping studies and marker-assisted selection programs. These differential SNPs could also be employed as characteristic evaluation sites for sea-captured and cultured Cynoglossus semilaevis populations in the future.

17.
18.
Dominici F 《Biometrics》2000,56(2):546-553
We propose a methodology for estimating the cell probabilities in a multiway contingency table by combining partial information from a number of studies when not all of the variables are recorded in all studies. We jointly model the full set of categorical variables recorded in at least one of the studies, and we treat the variables that are not reported as missing dimensions of the study-specific contingency table. For example, we might be interested in combining several cohort studies in which the incidence in the exposed and nonexposed groups is not reported for all risk factors in all studies while the overall number of cases and the cohort size are always available. To account for study-to-study variability, we adopt a Bayesian hierarchical model. At the first stage of the model, the observation stage, data are modeled by a multinomial distribution with fixed total number of observations. At the second stage, we use the logistic normal (LN) distribution to model variability in the study-specific cell probabilities. Using this model and data augmentation techniques, we reconstruct the contingency table for each study regardless of which dimensions are missing, and we estimate population parameters of interest. Our hierarchical procedure borrows strength from all the studies and accounts for correlations among the cell probabilities. The main difficulty in combining studies recording different variables is in maintaining a consistent interpretation of parameters across studies. The approach proposed here overcomes this difficulty and at the same time addresses the uncertainty arising from the missing dimensions. We apply our modeling strategy to analyze data on air pollution and mortality from 1987 to 1994 for six U.S. cities by combining six cross-classifications of low, medium, and high levels of mortality counts, particulate matter, ozone, and carbon monoxide, with the complication that four of the six cities do not report all the air pollution variables.
Our goals are to investigate the association between air pollution and mortality by reconstructing the tables with missing dimensions, to determine the most harmful pollutant combinations, and to make predictions about these key issues for a city other than the six sampled. We find that, for high levels of ozone and carbon monoxide, the number of cases with a high number of deaths increases as the level of particulate matter, PM10, increases, and that the most harmful combinations correspond to high levels of PM10, confirming prior findings that levels of PM10 higher than the NAAQS standard are harmful.
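The two-stage structure described in this abstract (logistic normal cell probabilities, then multinomial counts with a fixed total) can be simulated in a few lines. The sketch below is a forward simulation only, not the authors' inference procedure, and it simplifies the LN distribution to independent log-ratios, whereas the abstract stresses correlations among cells. The means, standard deviations, study sizes, and function names are hypothetical.

```python
import math
import random


def sample_cell_probs(mu, sigma, rng):
    """Draw cell probabilities from a logistic normal with independent log-ratios."""
    # Additive log-ratio scale: the last cell is the reference (log-ratio 0).
    logits = [rng.gauss(m, s) for m, s in zip(mu, sigma)] + [0.0]
    top = max(logits)
    expd = [math.exp(v - top) for v in logits]  # subtract max for stability
    total = sum(expd)
    return [e / total for e in expd]


def sample_counts(probs, n_obs, rng):
    """Multinomial counts with a fixed total number of observations."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n_obs):
        counts[cell] += 1
    return counts


rng = random.Random(0)
mu = [0.5, 0.0, -0.5]      # hypothetical population-level log-ratio means
sigma = [0.3, 0.3, 0.3]    # hypothetical study-to-study standard deviations

study_tables = []
for n_obs in (200, 150):   # hypothetical study sizes
    probs = sample_cell_probs(mu, sigma, rng)
    study_tables.append((n_obs, probs, sample_counts(probs, n_obs, rng)))
```

Each simulated study gets its own probability vector drawn around shared population parameters, which is the mechanism that lets a hierarchical model borrow strength across studies.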

19.
20.