Similar literature
 Found 20 similar articles (search time: 687 ms)
1.
2.
One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are also smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer. Results from analysis of a breast cancer microarray gene expression data set indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer-specific survival.
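The group-wise boosting idea in the abstract above can be illustrated with a minimal sketch. It assumes squared-error loss and a linear least-squares base learner per group, which is one simple choice and not necessarily the authors' procedure. The names `solve` and `group_boost`, the step size `nu`, and the toy data are all hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def group_boost(X, y, groups, steps=50, nu=0.1):
    """Group-wise L2 boosting sketch: at each step, fit every group to the
    current residual and take a small step along the best-fitting group."""
    n = len(y)
    beta = [0.0] * len(X[0])
    resid = list(y)
    for _ in range(steps):
        best = None
        for g in groups:
            # Least-squares fit of the residual on the columns in group g.
            xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in g] for a in g]
            xty = [sum(X[i][a] * resid[i] for i in range(n)) for a in g]
            coef = solve(xtx, xty)
            fit = [sum(c * X[i][a] for c, a in zip(coef, g)) for i in range(n)]
            rss = sum((r - f) ** 2 for r, f in zip(resid, fit))
            if best is None or rss < best[0]:
                best = (rss, g, coef, fit)
        _, g, coef, fit = best
        # Shrunken update of the selected group only (nu is an assumed step size).
        for c, a in zip(coef, g):
            beta[a] += nu * c
        resid = [r - nu * f for r, f in zip(resid, fit)]
    return beta


# Toy data (made up): the response depends only on the first group of columns.
X = [[1, 0, 1, 1], [0, 1, 1, -1], [1, 1, 0, 1], [-1, 1, 1, 0],
     [2, -1, 0, 1], [0, 2, -1, 1], [1, -1, 1, 0], [-1, 0, 2, 1]]
y = [2 * row[0] - row[1] for row in X]
beta = group_boost(X, y, groups=[[0, 1], [2, 3]])
```

In this toy setting the second group is never selected, so its coefficients stay at zero, while the first group's coefficients move gradually toward the generating values.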

3.
Hokeun Sun  Hongzhe Li 《Biometrics》2012,68(4):1197-1206
Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for the sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured based on its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified‐likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re‐estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso.

4.
Plant domestication has led to considerable phenotypic modifications from wild species to modern varieties. However, although changes in key traits have been well documented, less is known about the underlying molecular mechanisms, such as the reduction of molecular diversity or global gene co‐expression patterns. In this study, we used a combination of gene expression and population genetics in wild and crop tomato to decipher the footprints of domestication. We found a set of 1729 differentially expressed genes (DEG) between the two genetic groups, belonging to 17 clusters of co‐expressed DEG, suggesting that domestication affected not only individual genes but also regulatory networks. Five co‐expression clusters were enriched in functional terms involving carbohydrate metabolism or epigenetic regulation of gene expression. We detected differences in nucleotide diversity between the crop and wild groups specific to DEG. Our study provides an extensive profiling of the rewiring of gene co‐expression induced by the domestication syndrome in one of the main crop species.

5.
We introduce a new method for detection of recombination hotspots from population genetic data. This method is based on (a) defining an (approximate) penalized likelihood for how recombination rate varies with physical position and (b) maximizing this penalized likelihood over possible sets of recombination hotspots. Simulation results suggest that this is a more powerful method for detection of hotspots than are existing methods. We apply the method to data from 89 genes sequenced in African American and European American populations. We find many genes with multiple hotspots, and some hotspots show evidence of being population-specific. Our results suggest that hotspots are randomly positioned within genes and could be as frequent as one per 30 kb.

6.
Alzheimer's disease (AD) is the leading cause of dementia in the elderly. Because the pathological changes underlying this disease can begin decades prior to the onset of cognitive impairment, identifying the earliest events in the AD pathological cascade has critical implications for both the diagnosis and treatment of this disease. We previously reported that compared to autopsy-confirmed healthy control brain, expression of LR11 (or SorLA) is markedly reduced in AD brain as well as in a subset of people with mild cognitive impairment (MCI), a prodromal clinical stage of AD. Recent studies of the LR11 gene SORL1 have suggested that the association between SORL1 single nucleotide polymorphisms (SNPs) and AD risk may not be universal. Therefore, we sought to confirm our earlier findings in a population chosen solely based on clinical criteria, as in most genetic studies. Quantitative immunohistochemistry was used to measure LR11 expression in 43 cases from the Religious Orders Study that were chosen based on a final pre-mortem clinical diagnosis of MCI, mild/moderate AD or no cognitive impairment (NCI). LR11 expression was highly variable in all three diagnostic groups, with no significant group differences. Low LR11 cases were identified using the lowest tertile of LR11 expression observed across all cases as a threshold. Contrary to previous reports, low LR11 expression was found in only 29% of AD cases. A similar proportion of both the MCI and NCI cases also displayed low LR11 expression. AD-associated lesions were present in the majority of cases regardless of diagnostic group, although we found no association between LR11 levels and pathological variables. These findings suggest that the relationship between LR11 expression and the development of AD may be more complicated than originally believed.
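The lowest-tertile rule mentioned in the abstract above can be sketched as follows. The expression values are made up for illustration and are not data from the study. The function name `low_expression_cases` is hypothetical, and treating values at or below the cut point as "low" is an assumption.

```python
import statistics


def low_expression_cases(values):
    """Return (threshold, flags); flags mark values at or below the lowest-tertile cut point."""
    # With n=3, statistics.quantiles returns the two cut points that split the data into thirds.
    lower_cut, _upper_cut = statistics.quantiles(values, n=3)
    return lower_cut, [v <= lower_cut for v in values]


# Hypothetical expression levels pooled across all diagnostic groups.
levels = [0.8, 1.9, 2.4, 0.5, 3.1, 1.2, 2.8, 0.9, 2.0]
threshold, is_low = low_expression_cases(levels)
```

Because the threshold is computed across all cases, about one third of the pooled sample is flagged as low by construction. The quantity of interest is then how those flagged cases are distributed over the diagnostic groups.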

7.
Johnson BA  Long Q  Chung M 《Biometrics》2011,67(4):1379-1388
Summary: Dimension reduction, model and variable selection are ubiquitous concepts in modern statistical science and deriving new methods beyond the scope of current methodology is noteworthy. This article briefly reviews existing regularization methods for penalized least squares and likelihood for survival data and their extension to a certain class of penalized estimating function. We show that if one's goal is to estimate the entire regularized coefficient path using the observed survival data, then all current strategies fail for the Buckley–James estimating function. We propose a novel two‐stage method to estimate and restore the entire Dantzig‐regularized coefficient path for censored outcomes in a least‐squares framework. We apply our methods to a microarray study of lung adenocarcinoma with sample size n = 200 and p = 1036 gene predictors and find 10 genes that are consistently selected across different criteria and an additional 14 genes that merit further investigation. In simulation studies, we found that the proposed path restoration and variable selection technique has the potential to perform as well as existing methods that begin with a proper convex loss function at the outset.

8.
9.
10.
11.
The popularity of penalized regression in high‐dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high‐dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso‐penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood‐based model, such as logistic regression and Cox regression. Our approach is fast, flexible and can be used with a variety of penalty functions including lasso, elastic net, MCP, and MNet. We derive theoretical results under which the proposed method is valid, and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these assumptions are violated. Despite being conservative, we show that our method often offers more power to select causally important features than existing approaches. Finally, the practical utility of the method is demonstrated on gene expression datasets with binary and time‐to‐event outcomes.

12.
13.
MOTIVATION: An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to the censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models with good prediction accuracy and parsimony property. RESULTS: We propose to use the L(1) penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than the L(2) penalized regression and a few other dimension-reduction based methods. CONCLUSIONS: We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profile and survival times of previous patients.

14.
15.
16.
17.
Hong F  Li H 《Biometrics》2006,62(2):534-544
Time-course studies of gene expression are essential in biomedical research to understand biological phenomena that evolve in a temporal fashion. We introduce a functional hierarchical model for detecting temporally differentially expressed (TDE) genes between two experimental conditions for cross-sectional designs, where the gene expression profiles are treated as functional data and modeled by basis function expansions. A Monte Carlo EM algorithm was developed for estimating both the gene-specific parameters and the hyperparameters in the second level of modeling. We use a direct posterior probability approach to bound the rate of false discovery at a pre-specified level and evaluate the methods by simulations and application to microarray time-course gene expression data on Caenorhabditis elegans developmental processes. Simulation results suggested that the procedure performs better than the two-way ANOVA in identifying TDE genes, resulting in both higher sensitivity and specificity. Genes identified from the C. elegans developmental data set show clear patterns of changes between the two experimental conditions.
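The abstract above does not spell out how the false discovery rate is bounded from posterior probabilities. One common form of a direct posterior probability rule, given here as an assumption and not as the authors' exact procedure, ranks genes by their posterior probability of differential expression and keeps adding genes while the average posterior probability of the null among those selected stays at or below the target level. The function name, the `alpha` parameter and the probabilities are hypothetical.

```python
def select_by_posterior_fdr(post_prob_de, alpha=0.05):
    """Return indices of selected genes such that the estimated FDR, taken as the
    mean posterior null probability among the selected genes, is at most alpha."""
    # Rank genes from most to least likely to be differentially expressed.
    order = sorted(range(len(post_prob_de)), key=lambda i: post_prob_de[i], reverse=True)
    selected = []
    null_sum = 0.0
    for i in order:
        candidate_sum = null_sum + (1.0 - post_prob_de[i])
        # Estimated FDR if gene i were added to the list.
        if candidate_sum / (len(selected) + 1) > alpha:
            break  # the running average only grows from here on, so stop
        null_sum = candidate_sum
        selected.append(i)
    return selected


# Hypothetical posterior probabilities of differential expression for six genes.
probs = [0.99, 0.40, 0.97, 0.90, 0.10, 0.995]
chosen = select_by_posterior_fdr(probs, alpha=0.05)
```

Note that the gene with probability 0.90 is still selected at a 5% level, because the bound applies to the average over the whole list and not to each gene individually.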

18.
19.
Previously, we have determined the nonhost‐mediated recognition of the MfAvr4 and MfEcp2 effector proteins from the banana pathogen Mycosphaerella fijiensis in tomato, by the cognate Cf‐4 and Cf‐Ecp2 resistance proteins, respectively. These two resistance proteins could thus mediate resistance against M. fijiensis if genetically transformed into banana (Musa spp.). However, disease resistance controlled by single dominant genes can be overcome by mutated effector alleles, whose products are not recognized by the cognate resistance proteins. Here, we surveyed the allelic variation within the MfAvr4, MfEcp2, MfEcp2‐2 and MfEcp2‐3 effector genes of M. fijiensis in a global population of the pathogen, and assayed its impact on recognition by the tomato Cf‐4 and Cf‐Ecp2 resistance proteins, respectively. We identified a large number of polymorphisms that could reflect a co‐evolutionary arms race between host and pathogen. The analysis of nucleotide substitution patterns suggests that both positive selection and intragenic recombination have shaped the evolution of M. fijiensis effectors. Clear differences in allelic diversity were observed between strains originating from South‐East Asia relative to strains from other banana‐producing continents, consistent with the hypothesis that M. fijiensis originated in the Asian‐Pacific region. Furthermore, transient co‐expression of the MfAvr4 effector alleles and the tomato Cf‐4 resistance gene, as well as of MfEcp2, MfEcp2‐2 and MfEcp2‐3 and the putative Cf‐Ecp2 resistance gene, indicated that effector alleles able to overcome these resistance genes are already present in natural populations of the pathogen, thus questioning the durability of resistance that can be provided by these genes in the field.

20.
To explore the relevance of rat liver regeneration (LR) to acute hepatic failure (AHF), the Rat Genome 230 2.0 Array was used to detect their gene expression profiles in this study, and the reliability of the detection results was confirmed by real-time PCR. A total of 1012 genes were found to be significantly changed in AHF occurrence and 948 genes in LR. Hierarchical clustering analysis (which is performed to group genes based on the similarity of expression patterns) showed that the physiological activities of AHF and those of LR had no time correlation. K-means clustering analysis (which is used to check whether the difference in the relevant predictor variables between different groups is significant) demonstrated that the gene expression trend of the C1 group (genes related to categories of stimulus–response and cell apoptosis, etc.) in AHF and in LR was extremely similar, that those of the C2 group (categories of regulation of homeostasis and hormone stimulation, etc.) were contrary, and that those of the C3 (material and energy metabolism and oxidation reduction, etc.), C4 (cell cycle-related genes) and C5 (cell proliferation-related genes) groups were also similar, with the gene expression changes of LR more abundant. GO classification and functional clustering analysis (which was used to tally the numbers or composition of proteins or genes at the functional level) revealed that cellular processes including immune response, inflammatory reaction, cell migration and adhesion, etc. were increased both in AHF and in LR, whereas material and energy metabolism were decreased. Of these, stimulus response, inflammatory reaction and regulation of apoptosis, etc. were stronger in AHF occurrence than in LR, but ion homeostasis, hormonal response, regulation of cell division and proliferation, etc. were weaker in AHF occurrence.
The gene expression changes and physiological activities of AHF and those of LR show both similarities and differences.
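For readers unfamiliar with the clustering step mentioned above, the following is a minimal sketch of k-means as it is commonly defined (Lloyd's algorithm), applied to made-up time-course profiles. It is not the software or settings used in the study. The function name `kmeans`, the deterministic initialization and the toy data are assumptions made for illustration.

```python
def kmeans(profiles, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.
    Returns (labels, centroids)."""
    # Deterministic start: the first k profiles serve as initial centroids
    # (an assumption made here so that the result is reproducible).
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its assigned profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    return labels, centroids


# Hypothetical expression profiles over three time points: rising versus falling trends.
profiles = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [1.1, 2.1, 2.9],
    [2.9, 1.9, 1.2],
    [0.9, 2.0, 3.2],
]
labels, centroids = kmeans(profiles, k=2)
```

Each resulting cluster groups genes with a similar expression trend, which is the sense in which trends of groups such as C1 to C5 can then be compared between two conditions.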


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号