Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
For Genetic Analysis Workshop 19, two extensive data sets were provided, including whole genome and whole exome sequence data, gene expression data, and longitudinal blood pressure outcomes, together with nongenetic covariates. These data sets gave researchers the chance to investigate different aspects of more complex relationships within the data, and the contributions in our working group focused on statistical methods for the joint analysis of multiple phenotypes, which is part of the research field of data integration. The analysis of data from different sources poses challenges to researchers but provides the opportunity to model the real-life situation more realistically. Our four contributions all used the provided real data to identify genetic predictors for blood pressure. In the contributions, novel multivariate rare variant tests, copula models, structural equation models, and a sparse matrix representation variable selection approach were applied. Each of these statistical models can be used to investigate specific hypothesized relationships, which are described together with their biological assumptions. The results showed that all methods are ready for application on a genome-wide scale and can be used or extended to include multiple omics data sets. The results provide potentially interesting genetic targets for future investigation and replication. Furthermore, all contributions demonstrated that the analysis of complex data sets could benefit from modeling correlated phenotypes jointly as well as from adding further bioinformatics information.

2.
Technological advances facilitating the acquisition of large arrays of biomarker data have led to new opportunities to understand and characterize disease progression over time. This creates an analytical challenge, however, due to the large numbers of potentially informative markers, the high degrees of correlation among them, and the time-dependent trajectories of association. We propose a mixed ridge estimator, which integrates ridge regression into the mixed effects modeling framework in order to account for both the correlation induced by repeatedly measuring an outcome on each individual over time and the potentially high degree of correlation among possible predictor variables. An expectation-maximization algorithm is described to account for unknown variance and covariance parameters. Model performance is demonstrated through a simulation study and an application of the mixed ridge approach to data arising from a study of cardiometabolic biomarker responses to evoked inflammation induced by experimental low-dose endotoxemia.
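As a rough illustration of the ridge penalty that the mixed ridge estimator builds on, the following sketch computes the closed-form ridge estimate for two centered predictors in plain Python. It is not the authors' estimator: it omits the random effects and the expectation-maximization step, and the function name `ridge_2d` and the toy data are assumptions made for this example.

```python
def ridge_2d(x1, x2, y, lam):
    """Closed-form ridge estimate for two centered predictors (no intercept).

    Hypothetical helper for illustration: solves (X'X + lam*I) beta = X'y.
    """
    # Entries of X'X and X'y
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    # Add the ridge penalty to the diagonal, then solve by Cramer's rule
    a11, a22 = s11 + lam, s22 + lam
    det = a11 * a22 - s12 * s12
    if det == 0:
        raise ValueError("singular system; use lam > 0 for collinear predictors")
    return ((a22 * t1 - s12 * t2) / det, (a11 * t2 - s12 * t1) / det)


# Toy data (assumed): orthogonal predictors, y depends on x1 only
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
y = [2, -2, 2, -2]
print(ridge_2d(x1, x2, y, lam=0.0))  # (2.0, 0.0): ordinary least squares
print(ridge_2d(x1, x2, y, lam=4.0))  # (1.0, 0.0): coefficient shrunk toward zero
```

With `lam = 0` the function returns the ordinary least-squares solution; a larger `lam` shrinks the coefficients toward zero, which is what stabilizes the estimates when predictors are highly correlated.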

3.
Dunson B  Baird DD 《Biometrics》2002,58(4):813-822
In the absence of longitudinal data, the current presence and severity of disease can be measured for a sample of individuals to investigate factors related to disease incidence and progression. In this article, Bayesian discrete-time stochastic models are developed for inference from cross-sectional data consisting of the age at first diagnosis, the current presence of disease, and one or more surrogates of disease severity. Semiparametric models are used for the age-specific hazards of onset and diagnosis, and a normal underlying variable approach is proposed for modeling of changes with latency time in disease severity. The model accommodates multiple surrogates of disease severity having different measurement scales and heterogeneity among individuals in disease progression. A Markov chain Monte Carlo algorithm is described for posterior computation, and the methods are applied to data from a study of uterine leiomyoma.

4.
Leeyoung Park  Ju H. Kim 《Genetics》2015,199(4):1007-1016
Causal models including genetic factors are important for understanding the presentation mechanisms of complex diseases. Familial aggregation and segregation analyses based on polygenic threshold models have been the primary approach to fitting genetic models to the family data of complex diseases. In the current study, an advanced approach to obtaining appropriate causal models for complex diseases, based on the sufficient component cause (SCC) model and involving combinations of traditional genetics principles, was proposed. The probabilities for the entire population, i.e., normal–normal, normal–disease, and disease–disease, were considered for each model for the appropriate handling of common complex diseases. The causal model in the current study included the genetic effects from single genes involving epistasis, complementary gene interactions, gene–environment interactions, and environmental effects. Bayesian inference using a Markov chain Monte Carlo (MCMC) algorithm was used to assess the proportions of each component for a given population lifetime incidence. This approach is flexible, allowing both common and rare variants within a gene and across multiple genes. An application to schizophrenia data confirmed the complexity of the causal factors. An analysis of diabetes data demonstrated that environmental factors and gene–environment interactions are the main causal factors for type II diabetes. The proposed method is effective and useful for identifying causal models, which can accelerate the development of efficient strategies for identifying causal factors of complex diseases.

5.
Increases in throughput and decreases in costs have facilitated large-scale metabolomics studies, that is, the simultaneous measurement of large numbers of biochemical components in biological samples. Initial large-scale studies focused on biomarker discovery for disease or disease progression and helped to understand biochemical pathways underlying disease. The first population-based studies that combined metabolomics and genome-wide association studies (mGWAS) have increased our understanding of the (genetic) regulation of biochemical conversions. Measurements of metabolites as intermediate phenotypes are a potentially very powerful approach to uncover how genetic variation affects disease susceptibility and progression. However, we still face many hurdles in the interpretation of mGWAS data. Due to the composite nature of many metabolites, single enzymes may affect the levels of multiple metabolites and, conversely, levels of single metabolites may be affected by multiple enzymes. Here, we will provide a global review of the current status of mGWAS. We will specifically discuss the application of prior biological knowledge present in databases to the interpretation of mGWAS results and discuss the potential of mathematical models. As the technology to detect metabolites and to measure genetic variation continuously improves, it is clear that comprehensive systems biology based approaches are required to further our insight into the association between genes, metabolites and disease. This article is part of a Special Issue entitled: From Genome to Function.

6.
We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state). Finding genetic biomarkers and searching for genetic–epidemiological factors can be formulated as a statistical problem of variable selection, in which, from a large set of candidates, a small number of trait-associated predictors are identified. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS). CFS is a complex disease in several respects, e.g., it is difficult to diagnose and difficult to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and measures of response to therapy. Generally, for this we use Bayesian hierarchical modeling and Markov chain Monte Carlo computation.

7.
Gianola D  van Kaam JB 《Genetics》2008,178(4):2289-2303
Reproducing kernel Hilbert spaces regression procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are discussed from a theoretical perspective. It is argued that a nonparametric treatment may be needed for capturing the multiple and complex interactions potentially arising in whole-genome models, i.e., those based on thousands of single-nucleotide polymorphism (SNP) markers. After a review of reproducing kernel Hilbert spaces regression, it is shown that the statistical specification admits a standard mixed-effects linear model representation, with smoothing parameters treated as variance components. Models for capturing different forms of interaction, e.g., chromosome-specific, are presented. Implementations can be carried out using software for likelihood-based or Bayesian inference.

8.
Zhao JX  Foulkes AS  George EI 《Biometrics》2005,61(2):591-599
Characterizing the process by which molecular and cellular level changes occur over time will have broad implications for clinical decision making and help further our knowledge of disease etiology across many complex diseases. However, this presents an analytic challenge due to the large number of potentially relevant biomarkers and the complex, uncharacterized relationships among them. We propose an exploratory Bayesian model selection procedure that searches for model simplicity through independence testing of multiple discrete biomarkers measured over time. Bayes factor calculations are used to identify and compare models that are best supported by the data. For large model spaces, i.e., a large number of multi-leveled biomarkers, we propose a Markov chain Monte Carlo (MCMC) stochastic search algorithm for finding promising models. We apply our procedure to explore the extent to which HIV-1 genetic changes occur independently over time.

9.
Yi N  Xu S  Allison DB 《Genetics》2003,165(2):867-883
Most complex traits of animals, plants, and humans are influenced by multiple genetic and environmental factors. Interactions among multiple genes play fundamental roles in the genetic control and evolution of complex traits. Statistical modeling of interaction effects in quantitative trait loci (QTL) analysis must accommodate a very large number of potential genetic effects, which presents a major challenge to determining the genetic model with respect to the number of QTL, their positions, and their genetic effects. In this study, we use the methodology of Bayesian model and variable selection to develop strategies for identifying multiple QTL with complex epistatic patterns in experimental designs with two segregating genotypes. Specifically, we develop a reversible jump Markov chain Monte Carlo algorithm to determine the number of QTL and to select main and epistatic effects. With the proposed method, we can jointly infer the genetic model of a complex trait and the associated genetic parameters, including the number, positions, and main and epistatic effects of the identified QTL. Our method can map a large number of QTL with any combination of main and epistatic effects. Utility and flexibility of the method are demonstrated using both simulated data and a real data set. Sensitivity of posterior inference to prior specifications of the number and genetic effects of QTL is investigated.

10.
Estimating genetic parameters in natural populations using the "animal model"   Total citations: 24, self-citations: 0, citations by others: 24
Estimating the genetic basis of quantitative traits can be tricky for wild populations in natural environments, as environmental variation frequently obscures the underlying evolutionary patterns. I review the recent application of restricted maximum-likelihood "animal models" to multigenerational data from natural populations, and show how the estimation of variance components and prediction of breeding values using these methods offer a powerful means of tackling the potentially confounding effects of environmental variation, as well as generating a wealth of new areas of investigation.

11.
12.
Multi-state models are a flexible tool for analyzing complex time-to-event problems with multiple endpoints. Compared to the Cox regression model with a single endpoint or a summarizing composite endpoint, they can provide a more detailed insight into the disease process. Furthermore, prognosis can be improved by including information from intermediate events occurring during the course of the disease. Different model variants, options and additional assumptions provide many possibilities, but at the same time complicate the implementation of multi-state techniques. So far, no guiding literature is available to specify a multi-state model systematically. The objective of this work was to set up a general specification procedure for an illness-death model that optimizes the model fit and predictive accuracy by stepwise reduction of the model. As an application example, we reanalyzed data from an observational study of 434 ovarian cancer patients with progression as intermediate and death as absorbing state. The technique is described in general terms and can be applied to other illness-death models without recovery. The clock-reset approach was used, implying that the time was reset to zero after progression. The non-homogeneous semi-Markov characteristic meant that the present time as well as the time between surgery and progression influenced survival after progression. Covariate effects on transitions were estimated and proportionality of transition baseline hazards was tested. The finally developed model optimized the accuracy of predictions for two simulated patients. This stepwise procedure yields parsimonious but targeted multi-state models with readily interpretable coefficients and optimized predictive ability, even for smaller data sets.

13.
A better understanding of disease progression is beneficial for early diagnosis and appropriate individual therapy. Many different approaches for statistical modelling of cumulative disease progression have been proposed in the literature, ranging from simple path models to complex restricted Bayesian networks. Important fields of application are diseases such as cancer and HIV. Tumour progression is measured by means of chromosome aberrations, whereas people infected with HIV develop drug resistances because of genetic changes in the virus. These two very different diseases have typical courses of disease progression, which can be modelled partly by consecutive and partly by independent steps. This paper gives an overview of the different progression models and points out their advantages and drawbacks. Different models are compared via simulations to analyse how they work if some of their assumptions are violated. In a simulation study, we evaluate how models perform in terms of fitting induced multivariate probability distributions and topological relationships. We often find that the true model class used for generating data is outperformed by either a less or a more complex model class. The more flexible conjunctive Bayesian networks can be used to fit oncogenetic trees, whereas mixtures of oncogenetic trees with three tree components can be well fitted by mixture models with only two tree components.

14.
The interplay between C-C chemokine receptor type 5 (CCR5) host genetic background, disease progression, and intrahost HIV-1 evolutionary dynamics remains unclear because differences in viral evolution between hosts limit the ability to draw conclusions across hosts stratified into clinically relevant populations. Similar inference problems are proliferating across many measurably evolving pathogens for which intrahost sequence samples are readily available. To this end, we propose novel hierarchical phylogenetic models (HPMs) that incorporate fixed effects to test for differences in dynamics across host populations in a formal statistical framework employing stochastic search variable selection and model averaging. To clarify the role of CCR5 host genetic background and disease progression on viral evolutionary patterns, we obtain gp120 envelope sequences from clonal HIV-1 variants isolated at multiple time points in the course of infection from populations of HIV-1-infected individuals who only harbored CCR5-using HIV-1 variants at all time points. Presence or absence of a CCR5 wt/Δ32 genotype and progressive or long-term nonprogressive course of infection stratify the clinical populations in a two-way design. As compared with the standard approach of analyzing sequences from each patient independently, the HPM provides more efficient estimation of evolutionary parameters such as nucleotide substitution rates and d(N)/d(S) rate ratios, as shown by significant shrinkage of the estimator variance. The fixed effects also correct for nonindependence of data between populations and result in even further shrinkage of individual patient estimates. Model selection suggests an association between nucleotide substitution rate and disease progression, but a role for CCR5 genotype remains elusive. Given the absence of clear d(N)/d(S) differences between patient groups, delayed onset of AIDS symptoms appears to be solely associated with lower viral replication rates rather than with differences in selection on amino acid fixation.

15.
Soluble factors with inhibitory activity against type 1 Human Immunodeficiency Virus
The pathogenesis of HIV-1 infection is a complex process that depends on multiple factors, including viral and host immune and genetic characteristics. This leads to a variable pattern of disease progression among those HIV-1-exposed individuals who become infected, while there are a number of individuals who remain healthy and HIV-1 seronegative despite being serially exposed to HIV-1. These variable outcomes of HIV-1 exposure suggest that there are mechanisms of natural resistance to HIV-1 infection. Although several genetic and adaptive immune mechanisms of resistance have been reported in some exposed seronegative and long-term non-progressor individuals, the mechanisms involved in controlling the establishment and progression of HIV-1 infection are not fully understood. Several soluble factors, such as defensins, chemokines, interferons and ribonucleases, among others, produced by cells of the immune system and epithelial tissues, have a broad anti-viral activity that might play a role as protective mechanisms during HIV-1 exposure. A better understanding of the mechanisms and role of these soluble factors during the natural resistance to HIV-1 infection may have important implications for the design of novel therapeutic strategies to combat the morbidity and mortality associated with the HIV-1 pandemic.

16.
Many genetic loci and SNPs associated with common complex human diseases and traits have now been identified. The total genetic variance explained by these loci for a trait or disease, however, has often been very small. Much of the "missing heritability" has been revealed to be hidden in the genome among the large number of variants with small effects. Several recent studies have reported the presence of multiple independent SNPs and genetic heterogeneity in trait-associated loci. It is therefore reasonable to speculate that such a phenomenon could be common among loci known to be associated with a complex trait or disease. To test this hypothesis, a total of 117 loci known to be associated with rheumatoid arthritis (RA), Crohn disease (CD), type 1 diabetes (T1D), or type 2 diabetes (T2D) were selected. The presence of multiple independent effects was assessed in the case-control samples genotyped by the Wellcome Trust Case Control Consortium study and imputed with SNP genotype information from the HapMap Project and the 1000 Genomes Project. Eleven loci with evidence of multiple independent effects were identified in the study, and the number was expected to increase with larger sample sizes and improved statistical power. The variance explained by the multiple effects in a locus was much higher than the variance explained by the single reported SNP effect. The results thus significantly improve our understanding of the allelic structure of these individual disease-associated loci, as well as our knowledge of the general genetic mechanisms of common complex traits and diseases.

17.
Anti-cancer drugs targeted to specific oncogenic pathways have shown promising therapeutic results in the past few years; however, drug resistance remains an important obstacle for these therapies. Resistance to these drugs can emerge due to a variety of reasons including genetic or epigenetic changes which alter the binding site of the drug target, cellular metabolism or export mechanisms. Obtaining a better understanding of the evolution of resistant populations during therapy may enable the design of more effective therapeutic regimens which prevent or delay progression of disease due to resistance. In this paper, we use stochastic mathematical models to study the evolutionary dynamics of resistance under time-varying dosing schedules and pharmacokinetic effects. The populations of sensitive and resistant cells are modeled as multi-type non-homogeneous birth-death processes in which the drug concentration affects the birth and death rates of both the sensitive and resistant cell populations in continuous time. This flexible model allows us to consider the effects of generalized treatment strategies as well as detailed pharmacokinetic phenomena such as drug elimination and accumulation over multiple doses. We develop estimates for the probability of developing resistance and moments of the size of the resistant cell population. With these estimates, we optimize treatment schedules over a subspace of tolerated schedules to minimize the risk of disease progression due to resistance as well as locate ideal schedules for controlling the population size of resistant clones in situations where resistance is inevitable. Our methodology can be used to describe dynamics of resistance arising due to a single (epi)genetic alteration in any tumor type.
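To make the idea of a two-type birth-death process under a dosing schedule concrete, here is a minimal Gillespie-style simulation sketch in Python. It is not the authors' model or their analytical estimates: the rate constants, the mutation probability `mu`, the function name `simulate`, and the assumption that the drug raises only the death rate of sensitive cells are all hypothetical. Rates are frozen between events, which only approximates a truly non-homogeneous process when the concentration changes slowly relative to the event times.

```python
import random


def simulate(n_sensitive, t_end, conc, mu=1e-3, seed=0):
    """Simulate sensitive (s) and resistant (r) cell counts up to time t_end.

    conc is a function t -> drug concentration (a hypothetical dosing schedule).
    """
    rng = random.Random(seed)
    s, r, t = n_sensitive, 0, 0.0
    while t < t_end and s + r > 0:
        c = conc(t)  # concentration at the current time, held fixed until the next event
        # Assumed rates: the drug adds c to the per-cell death rate of sensitive cells.
        rates = [
            (1.0 * s, "s_birth"),
            ((0.5 + c) * s, "s_death"),
            (1.0 * r, "r_birth"),
            (0.5 * r, "r_death"),
        ]
        total = sum(rate for rate, _ in rates)
        t += rng.expovariate(total)  # waiting time to the next event
        if t >= t_end:
            break
        # Choose which event fires, with probability proportional to its rate
        pick = rng.random() * total
        event = None
        for rate, name in rates:
            if rate <= 0:
                continue
            event = name  # last feasible event doubles as a rounding fallback
            pick -= rate
            if pick < 0:
                break
        if event == "s_birth":
            # A dividing sensitive cell yields a resistant daughter with probability mu
            if rng.random() < mu:
                r += 1
            else:
                s += 1
        elif event == "s_death":
            s -= 1
        elif event == "r_birth":
            r += 1
        else:
            r -= 1
    return s, r


# Hypothetical pulsed schedule: drug present during the first half of each time unit
pulsed = lambda t: 2.0 if (t % 1.0) < 0.5 else 0.0
print(simulate(50, 3.0, pulsed, seed=1))
```

Repeating the call over many seeds and counting the runs that end with `r > 0` gives a crude Monte Carlo estimate of the probability of resistance under a given schedule, which is one way to compare schedules when closed-form estimates are not at hand.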

18.
Yang X  Belin TR  Boscardin WJ 《Biometrics》2005,61(2):498-506
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy, which we call "simultaneously impute and select" (SIAS), is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process. The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

19.
Analysis of molecular data promises identification of biomarkers for improving prognostic models, thus potentially enabling better patient management. For identifying such biomarkers, risk prediction models can be employed that link high-dimensional molecular covariate data to a clinical endpoint. In low-dimensional settings, a multitude of statistical techniques already exists for building such models, e.g. allowing for variable selection or for quantifying the added value of a new biomarker. We provide an overview of techniques for regularized estimation that transfer this toward high-dimensional settings, with a focus on models for time-to-event endpoints. Techniques for incorporating specific covariate structure are discussed, as well as techniques for dealing with more complex endpoints. Employing gene expression data from patients with diffuse large B-cell lymphoma, some typical modeling issues from low-dimensional settings are illustrated in a high-dimensional application. First, the performance of classical stepwise regression is compared to stage-wise regression, as implemented by a component-wise likelihood-based boosting approach. A second issue arises when artificially transforming the response into a binary variable. The effects of the resulting loss of efficiency and potential bias in a high-dimensional setting are illustrated, and a link to competing risks models is provided. Finally, we discuss conditions for adequately quantifying the added value of high-dimensional gene expression measurements, both at the stage of model fitting and when performing evaluation.

20.
It is generally accepted that most plant populations are locally adapted. Yet, understanding how environmental forces give rise to adaptive genetic variation is a challenge in conservation genetics and crucial to the preservation of species under rapidly changing climatic conditions. Environmental variation, phylogeographic history, and population demographic processes all contribute to spatially structured genetic variation; however, few current models attempt to separate these confounding effects. To illustrate the benefits of using a spatially-explicit model for identifying potentially adaptive loci, we compared outlier locus detection methods with a recently-developed landscape genetic approach. We analyzed 157 loci from samples of the alpine herb Gentiana nivalis collected across the European Alps. Principal coordinates of neighbor matrices (PCNM), eigenvectors that quantify multi-scale spatial variation present in a data set, were incorporated into a landscape genetic approach relating AFLP frequencies with 23 environmental variables. Four major findings emerged. 1) Fifteen loci were significantly correlated with at least one predictor variable (adjusted R² > 0.5). 2) Models including PCNM variables identified eight more potentially adaptive loci than models run without spatial variables. 3) When compared to outlier detection methods, the landscape genetic approach detected four of the same loci plus 11 additional loci. 4) Temperature, precipitation, and solar radiation were the three major environmental factors driving potentially adaptive genetic variation in G. nivalis. Techniques presented in this paper offer an efficient method for identifying potentially adaptive genetic variation and associated environmental forces of selection, providing an important step forward for the conservation of non-model species under global change.
