Similar Documents
1.
Cancer is a complex genetic disease resulting from defects in multiple genes. The development of microarray techniques makes it possible to survey the whole genome and detect genes that have influential impacts on cancer progression. Statistical analysis of cancer microarray data is challenging because of the high dimensionality and cluster nature of gene expressions. Here, clusters are composed of genes with coordinated pathological functions and/or correlated expressions. In this article, we consider cancer studies in which a censored survival endpoint is measured along with microarray gene expressions. We propose a hybrid clustering approach that uses both pathological pathway information retrieved from KEGG and statistical correlations of gene expressions to construct gene clusters. Cancer survival time is modeled as a linear function of gene expressions. We adopt the clustering threshold gradient directed regularization (CTGDR) method for simultaneous gene cluster selection, within-cluster gene selection, and predictive model building. The proposed approach combines hybrid gene clustering, a linear regression model for survival, and clustering regularized estimation with CTGDR. Analysis of two lymphoma studies shows that it can effectively identify gene clusters, and genes within the selected clusters, that have satisfactory predictive power for censored cancer survival outcomes.
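To illustrate the thresholded-gradient idea behind CTGDR, here is a minimal sketch of plain (non-clustered) threshold gradient directed regularization for a least-squares loss. It rests on several assumptions: the responses are uncensored, there is no cluster structure, and the names `tgdr`, `tau`, `step`, and `n_steps` are hypothetical. CTGDR additionally thresholds at the cluster level and must handle censoring, and neither is shown here.

```python
# Simplified threshold gradient directed regularization (TGDR), least squares.
# Assumptions: uncensored y, no gene clusters; names are hypothetical.

def tgdr(X, y, tau=0.8, step=0.01, n_steps=500):
    """Return coefficients after n_steps thresholded gradient updates.

    tau in [0, 1] controls sparsity: only coordinates whose gradient magnitude
    is at least tau times the largest gradient magnitude are updated.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_steps):
        # Residuals r_i = y_i - x_i' beta
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Negative gradient of the loss (1 / 2n) * sum_i r_i^2
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        g_max = max(abs(g) for g in grad)
        if g_max == 0.0:
            break  # stationary point reached
        for j in range(p):
            if abs(grad[j]) >= tau * g_max:
                beta[j] += step * grad[j]
    return beta


X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.2], [4.0, -0.2]]
y = [1.0, 2.0, 3.0, 4.0]  # depends on the first column only
print(tgdr(X, y, tau=1.0))  # second coefficient stays exactly 0.0
```

With `tau=1.0`, only the coordinate with the steepest gradient moves at each step, so the irrelevant second feature is never selected. Smaller `tau` values let more features enter.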

2.
We derive a multivariate survival model for age-of-onset data in a sibship from an additive genetic gamma frailty model constructed on the basis of inheritance vectors, and we investigate the properties of this model. Based on this model, we propose a retrospective likelihood approach for genetic linkage analysis using sibship data. The resulting test is allele-sharing based and does not require specification of genetic models or penetrance functions. The approach can incorporate both affected and unaffected sibs, environmental covariates, and age-of-onset or age-at-censoring information. It therefore provides a practical solution for mapping genes for complex diseases with variable age of onset. A small simulation study indicates that the proposed method performs better than the commonly used allele-sharing-based methods for linkage analysis, especially when the population disease rate is high. We applied the method to a type 1 diabetes sib-pair data set and a small breast cancer data set. Both the simulated and real data sets also indicate that the method is relatively robust to misspecification of the baseline hazard function.
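Allele-sharing tests are built on how many alleles two sibs share identical by descent (IBD), and this number can be read directly off the inheritance vectors. The sketch below assumes that each sib's entry is a `(paternal_bit, maternal_bit)` pair recording which parental allele was transmitted. It also assumes the parents are unrelated and non-inbred. The function names are hypothetical, and the frailty model and retrospective likelihood themselves are not reproduced.

```python
from itertools import product

def ibd_sharing(v_i, v_j):
    """Alleles shared IBD (0, 1 or 2) by two full sibs.

    Each argument is a (paternal_bit, maternal_bit) pair: the bit says which of
    the parent's two alleles was transmitted to that sib.
    """
    return int(v_i[0] == v_j[0]) + int(v_i[1] == v_j[1])

def pairwise_ibd(vectors):
    """IBD sharing for every sib pair in a sibship."""
    n = len(vectors)
    return {(i, j): ibd_sharing(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}

def null_ibd_distribution():
    """Without linkage, all 16 inheritance vectors of a sib pair are equally likely."""
    counts = [0, 0, 0]
    for a, b in product(product((0, 1), repeat=2), repeat=2):
        counts[ibd_sharing(a, b)] += 1
    total = sum(counts)
    return [c / total for c in counts]

print(pairwise_ibd([(0, 0), (0, 1), (1, 1)]))  # {(0, 1): 1, (0, 2): 0, (1, 2): 1}
print(null_ibd_distribution())  # [0.25, 0.5, 0.25]
```

Linkage shows up as excess sharing among affected sibs relative to the 1/4, 1/2, 1/4 null distribution.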

3.
Background

A modeling method was developed to estimate recurrence-free survival using cancer registry survival data. This study aims to validate the modeled recurrence-free survival against “gold-standard” estimates from data collected by the National Program of Cancer Registries (NPCR) Patient-Centered Outcomes Research (PCOR) project.

Methods

We compared 5-year metastatic recurrence-free survival from the modeling method with empirical estimates from the PCOR project. That project collected disease-free status, tumor progression, and recurrence for colorectal and female breast cancer cases diagnosed in 2011 in 5 U.S. state registries. To estimate empirical recurrence-free survival, we developed an algorithm that combined disease-free, recurrence, progression, and date information from the NPCR-PCOR data. We applied the modeling method to relative survival for patients diagnosed with female breast and colorectal cancer in 2000–2015 in the SEER-18 areas.

Results

When patients with stages I–III are grouped together, the modeled and NPCR-PCOR estimates of 5-year metastatic recurrence-free survival are very similar: 90.2% and 88.6%, respectively, for female breast cancer; 74.6% and 75.3% for colon cancer; and 68.8% and 68.5% for rectal cancer. In general, the 5-year recurrence-free NPCR-PCOR and modeled estimates remain similar when stratified by stage. The modeled estimates, however, are less accurate for recurrence-free survival in years 1–3 after diagnosis.

Conclusions

The agreement between the NPCR-PCOR and modeled estimates supports the validity of the modeled estimates. These provide robust population-based estimates of 5-year metastatic recurrence-free survival for female breast, colon, and rectal cancers. The modeling approach can, in principle, be extended to other cancer sites to provide provisional population-based estimates of 5-year recurrence-free survival.
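Once each patient's registry fields have been reduced to a follow-up time and an event indicator, an empirical 5-year recurrence-free survival can be read off a Kaplan–Meier curve. The sketch below assumes that reduction has already been done, with an event meaning metastatic recurrence. The study's actual algorithm for combining disease-free, progression, and date fields is not reproduced. The function names and data are hypothetical, and the relative-survival modeling method is a different technique.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps at event times.

    events[i] is 1 if subject i had the event at times[i], 0 if censored then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects whose time equals t (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # censored subjects at t stay in the risk set at t
    return curve

def survival_at(curve, t):
    """Step-function lookup: S(t) is the last step at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Hypothetical follow-up times in years
curve = kaplan_meier([1, 2, 3, 4, 6, 7], [1, 0, 1, 1, 0, 1])
print(survival_at(curve, 5))  # about 0.4167 (= 5/12)
```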

4.
We explore a hierarchical generalized latent factor model for discrete and bounded response variables, in particular binomial responses. Specifically, we develop a novel two-step estimation procedure and corresponding statistical inference that are computationally efficient and scale to high dimensions in both the number of subjects and the number of features per subject. We also establish the validity of the estimation procedure, in particular the asymptotic properties of the estimated effect sizes and the latent structure, as well as of the estimated number of latent factors. The results are corroborated by a simulation study. For illustration, the proposed methodology is applied to a dataset from a gene–environment association study.

5.

Background  

Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis, and gene profiling studies have been conducted to search for predictive genomic measurements. Genes have an inherent pathway structure, in which pathways are composed of multiple genes with coordinated functions. The goal of this study is to identify gene pathways with predictive power for breast cancer prognosis. Because this goal is fundamentally different from those of existing studies, we propose a new pathway analysis method.

6.
7.

Background

Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. Somatic mutations are well known to play an essential role in cancer development. We therefore propose that a prognostic prediction model integrating somatic mutations with gene expression can improve survival prediction for cancer patients and can also reveal the genetic mutations associated with survival.

Method

Differential expression analysis was used to identify breast cancer-related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to select survival-related genes. DAVID was used for enrichment analysis of the somatically mutated gene set. The performance of the survival predictors was assessed with the Cox regression model and the concordance index (C-index).
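The C-index used above measures how often a higher predicted risk goes with a shorter observed survival. Below is a minimal sketch of Harrell's C for right-censored data. It is a simplification: pairs with tied survival times are skipped, and ties in risk score count as one half. The study may well have used a library implementation with different tie handling, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and t_i < t_j.
    It is concordant when the higher risk score belongs to the shorter time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 1, 2]))  # 5 of 6 pairs
```

A value of 0.5 corresponds to random ranking, and 1.0 to perfect ranking. The reported values of 0.61 to 0.66 sit between the two.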

Results

We investigated the genome-wide gene expression profiles and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log-rank p < 0.0001; C-index = 0.636). Multiple breast cancer survival-related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B, and SDC1. Further optimization with the genetic algorithm (GA) yielded an optimal set of 88 genes with a higher C-index (log-rank p < 0.0001; C-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved similar performance (log-rank p < 0.0001; C-index = 0.614). Moreover, we identified 25 functional annotations, 15 gene ontology terms, and 14 pathways that were significantly enriched in the genes showing distinct mutation patterns between the survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggest that the Fanconi anemia pathway plays an important role in breast cancer prognosis.
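The log-rank p-values above compare the survival of the predicted risk groups. Below is a minimal two-group log-rank test sketch. Group labels are assumed to be 0/1, the function name is hypothetical, and the data are made up. The p-value uses the chi-square distribution with 1 degree of freedom, whose upper tail is `erfc(sqrt(x / 2))`.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test; groups are 0/1 labels. Returns (chi2, p)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0
    for t in event_times:
        # (group, had_event_at_t) for everyone still at risk at time t
        at_risk = [(g, bool(e) and ti == t)
                   for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if died and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p

# Hypothetical high-risk (1) and low-risk (0) groups, all events observed
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
print(chi2, p)  # chi2 about 5.05, p below 0.05
```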

Conclusions

Our study indicates that the expression levels of the gene signatures remain effective indicators for breast cancer survival prediction. Combining gene expression information with other types of features derived from somatic mutations can further improve survival prediction. The survival-associated pathways suggested by our study merit further investigation as a route to improving cancer patient survival.

8.
Background

Studies show that thousands of genes are associated with the prognosis of breast cancer. To make use of available genetic data, researchers have tried to predict outcomes from gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction, although the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) They were not tested to determine whether a complete set of clinical predictors could predict as well as the gene expression signatures.

Methodology/Findings

We address these shortcomings. The METABRIC data set comprises 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using both the clinical data and the gene expression data with their performance using only the clinical data. We obtain significantly better results when we use both clinical data and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data with the PAM50 subtype, our results are significant only for 5-year and 15-year prediction. We obtain significantly better results with the RSF model than with the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival.

Conclusions/Significance

Our results indicate that survival prediction using clinical data together with gene expression data improves on prediction using clinical data alone. We further conclude that survival prediction improves when the RSF model is used instead of the Cox model. These results are significant because, by combining more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.

9.
There is a growing body of data reporting an association between genetic alterations in chromosome 9p21 and the risk of developing cancer. In the current study, we investigated the association of a genetic variant in CDKN2A/B, rs1333049, with the risk of developing breast cancer. A total of 339 participants with and without breast cancer were enrolled in the study. Genotyping was performed by the TaqMan real-time polymerase chain reaction (RT-PCR) method, and gene expression analysis was performed by RT-PCR. In the total population, the frequency of minor-allele homozygotes was 10% and that of heterozygotes was 38%. Under the dominant genetic model, individuals with breast cancer had advanced TNM classification. Moreover, logistic regression revealed that individuals with CC/CG genotypes may have an increased risk of developing breast cancer compared with carriers of the GG genotype (OR = 2.8; 95% CI, 1.4–5.4; p = .001) after adjustment for the confounders age and body mass index. Furthermore, our analysis showed that the CDKN2A/B gene was downregulated in patients (p < .001). We showed a meaningful relationship between CDKN2A/B and the risk of breast cancer. This underlines the importance of large, multicenter studies for assessing the value of this marker for risk classification in the management of patients with breast cancer.
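Under a dominant model, CC and CG carriers are pooled and compared with GG holders in a 2×2 case-control table. The sketch below computes the crude odds ratio with a Woolf (log-scale) confidence interval. The reported OR of 2.8 was adjusted for age and BMI by logistic regression, which this unadjusted calculation does not reproduce. The counts and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Crude odds ratio with a Woolf confidence interval.

    a: cases carrying the risk genotype (e.g. CC/CG)   b: cases with GG
    c: controls carrying the risk genotype              d: controls with GG
    """
    if min(a, b, c, d) == 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

print(odds_ratio_ci(20, 10, 10, 20))  # OR = 4.0 with its 95% CI
```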

10.
11.
Many late-phase clinical trials recruit subjects at multiple study sites. This introduces a hierarchical structure into the data that can result in a loss of power compared with a more homogeneous single-center trial. Building on a recently proposed approach to sample size determination, we suggest a sample size recalculation procedure for multicenter trials with continuous endpoints. The procedure estimates nuisance parameters at an interim analysis from noncomparative data and recalculates the required sample size based on these estimates. In contrast to other sample size calculation methods for multicenter trials, our approach assumes a mixed effects model and does not rely on balanced data within centers. It is therefore advantageous, especially for sample size recalculation at interim. We illustrate the proposed methodology with a study evaluating a diabetes management system. Monte Carlo simulations are carried out to evaluate the operating characteristics of the sample size recalculation procedure using comparative as well as noncomparative data. These simulations assess their dependence on parameters such as between-center heterogeneity, residual variance of observations, treatment effect size, and number of centers. We compare two estimators for between-center heterogeneity, an unadjusted and a bias-adjusted estimator, both based on quadratic forms. With the proposed unadjusted estimator, the type 1 error probability and statistical power are close to their nominal levels for all parameter combinations considered in our simulation study, whereas the adjusted estimator exhibits some type 1 error rate inflation. Overall, the sample size recalculation procedure can be recommended to mitigate risks arising from misspecified nuisance parameters at the planning stage.
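The core of any such recalculation is to re-estimate a nuisance variance from interim data and plug it back into a sample size formula. The sketch below makes strong simplifying assumptions. It uses a single-center, two-arm normal-approximation formula with no between-center heterogeneity term, and a blinded one-sample variance computed while ignoring treatment labels. The paper's mixed-model procedure is more elaborate, and the function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def n_per_group(sigma2, delta, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for comparing two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma2 / delta ** 2)

def recalculate_n(interim_values, delta, n_planned, alpha=0.05, power=0.8):
    """Blinded recalculation from pooled (noncomparative) interim data."""
    # One-sample variance ignoring treatment labels; it slightly overestimates
    # sigma^2 when a true treatment effect exists.
    sigma2_hat = statistics.variance(interim_values)
    # One common convention: never go below the planned sample size
    return max(n_planned, n_per_group(sigma2_hat, delta, alpha, power))

print(n_per_group(1.0, 0.5))                          # 63
print(recalculate_n([0.0, 2.0, 0.0, 2.0], 0.5, 63))   # 84
```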

12.
Background

Population-based cancer registry (PBCR) data provide crucial information for evaluating the effectiveness of cancer services, and the population-based cancer survival estimated from them reflects prospects for cure. This study reports long-term trends in survival among patients diagnosed with cancer in the Barretos region (São Paulo State, Brazil).

Methods

In this population-based study, we estimated the one- and five-year age-standardized net survival of 13,246 patients diagnosed with 24 different cancer types in the Barretos region between 2000 and 2018. The results are presented by sex, time since diagnosis, disease stage, and period of diagnosis.

Results

One- and five-year age-standardized net survival differed markedly across cancer sites. Pancreatic cancer had the lowest 5-year net survival (5.5%, 95% CI: 2.9–9.4), followed by oesophageal cancer (5.6%, 95% CI: 3.0–9.4). Prostate cancer had the highest (92.1%, 95% CI: 87.8–94.9), followed by thyroid cancer (87.4%, 95% CI: 69.9–95.1) and female breast cancer (78.3%, 95% CI: 74.5–81.6). Survival differed substantially by sex and clinical stage. Between the first (2000–2005) and last (2012–2018) periods, cancer survival improved, especially for thyroid cancer, leukemia, and pharyngeal cancer, with differences of 34.4%, 29.0%, and 28.7%, respectively.

Conclusion

To our knowledge, this is the first study to evaluate long-term cancer survival in the Barretos region, and it shows an overall improvement over the last two decades. Survival varied by site, indicating the need for multiple cancer control actions to lower the future burden of cancer.
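Age standardization combines age-group survival estimates using fixed standard weights, so that populations with different age structures can be compared. The sketch below shows only this final weighting step. Estimating net survival within each age group, which requires background life tables, is not shown. The weights and survival values in the example are hypothetical and are not the standard weights used in the study.

```python
def age_standardize(age_specific_survival, standard_weights):
    """Weighted average of age-group survival estimates using standard weights."""
    if len(age_specific_survival) != len(standard_weights):
        raise ValueError("one weight per age group is required")
    total = sum(standard_weights)
    weighted = sum(s * w for s, w in zip(age_specific_survival, standard_weights))
    return weighted / total

# Hypothetical 5-year net survival in three age groups, hypothetical weights
print(age_standardize([0.90, 0.80, 0.50], [1, 2, 1]))  # 0.75
```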

13.
14.

Background

A tremendous amount of effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. Gene expression data have been shown to have a cluster structure, in which clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
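The two steps differ in the penalty they apply. The Lasso shrinks coefficients one at a time, while the group Lasso keeps or drops a whole cluster at once. The sketch below shows the standard proximal operators behind this behavior, as used in many proximal-gradient solvers. It is not necessarily the authors' solver, and cross-validated tuning is not shown. The function names are hypothetical.

```python
import math

def soft_threshold(z, lam):
    """Lasso proximal operator: shrink one coefficient toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def group_soft_threshold(z, lam):
    """Group-Lasso proximal operator for one cluster of coefficients.

    If the cluster's Euclidean norm is at most lam, the whole cluster is set
    to zero (cluster dropped). Otherwise all coefficients share one shrink factor.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]

print(soft_threshold(3.0, 1.0))                # 2.0
print(group_soft_threshold([3.0, 4.0], 1.0))   # shrunk by factor 0.8
print(group_soft_threshold([0.3, 0.4], 1.0))   # [0.0, 0.0]: cluster removed
```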

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

15.
Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM‐MI) and full conditional specification multiple imputation (FCS‐MI). JM‐MI is arguably preferable because it involves specification of an explicit imputation model, whereas FCS‐MI is pragmatically appealing because of its flexibility in handling different types of variables. JM‐MI was developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS‐MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that JM‐MI works very well in all but one extreme general location model setting, and sometimes outperforms FCS‐MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers for both single-level and multilevel multiple imputation.
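Whichever imputation strategy is used (JM‐MI or FCS‐MI), the analysis is repeated on each completed dataset and the results are usually pooled with Rubin's rules. The Python sketch below shows only that pooling step. It does not reproduce jomo, which is an R package, and the function name is hypothetical.

```python
from statistics import mean, variance

def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors (m >= 2)."""
    m = len(estimates)
    if m < 2:
        raise ValueError("at least two imputed datasets are required")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance
    t = w + (1 + 1 / m) * b          # total variance of q_bar
    return q_bar, t

print(rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # (2.0, about 1.833)
```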

16.
Breast cancer is one of the deadliest forms of cancer in women worldwide. Better prediction of breast cancer prognosis is essential for more personalized treatment. In this study, we aimed to infer patient‐specific subpathway activities to reveal a functional signature associated with the prognosis of patients with breast cancer. We integrated pathway structure with gene expression data to construct patient‐specific subpathway activity profiles using a greedy search algorithm. A four‐subpathway prognostic signature was developed in the training set from the activity profiles, using a random forest supervised classification algorithm and a prognostic score model. According to the signature, patients were classified into high‐risk and low‐risk groups with significantly different overall survival in the training set (median survival of 65 vs 106 months, P = 1.82e‐13) and in the test set (median survival of 75 vs 101 months, P = 4.17e‐5). The signature was then applied to five independent breast cancer data sets and showed similar prognostic value, confirming the accuracy and robustness of the subpathway signature. Stratified analysis suggested that the four‐subpathway signature had prognostic value within breast cancer subtypes. Our results suggest that the four‐subpathway signature may be a useful biomarker for breast cancer prognosis.
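The abstract does not say how the prognostic score is built or where the high/low cutoff lies. A common pattern is a linear combination of feature activities, split at the median score. The sketch below shows that pattern under these assumptions. The coefficients, activity values, and function name are all hypothetical.

```python
from statistics import median

def risk_groups(activity_profiles, coefficients):
    """Linear prognostic score per patient, split at the median score.

    activity_profiles: one list of subpathway activities per patient.
    coefficients: one (hypothetical) weight per subpathway.
    """
    scores = [sum(c * a for c, a in zip(coefficients, profile))
              for profile in activity_profiles]
    cutoff = median(scores)
    groups = ["high" if s > cutoff else "low" for s in scores]
    return groups, scores

groups, scores = risk_groups([[1, 0], [0, 1], [2, 2], [0, 0]], [1.0, -1.0])
print(groups)  # ['high', 'low', 'low', 'low']
```

The resulting groups would then be compared with a log-rank test on overall survival.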

17.
The caspase 8 (CASP8) gene plays a key role in the regulation of apoptotic cell death, and expression variation in this gene has been associated with the risk of breast cancer. The aim of this study was to investigate the association of the functional variants rs3834129 and rs3769821, and of their haplotypes, with molecular profile and with the risk of breast cancer in an Iranian population. A case-control study was conducted on 812 participants, including 293 breast cancer patients and 519 healthy controls. Genotyping was performed by polymerase chain reaction–based methods. Statistical analysis was performed using SPSS version 16. The association of the polymorphisms and haplotypes with the risk of breast cancer was estimated using odds ratios (OR) and chi-square (χ2) tests. Compared with the insertion allele (I) of rs3834129, carriers of the deletion allele (D) showed a lower risk of breast cancer (OR, 0.65; 95% confidence interval [CI], 0.49-0.87; P = 0.004). The multivariate logistic regression model identified the DD genotype as an independent factor for a decreased risk of breast cancer in our population (OR, 0.18; 95% CI, 0.06-0.58; P = 0.004). The C allele of rs3769821 was associated with a 43% increased risk of breast cancer (P = 0.005); however, after adjustment for confounding factors, no association between rs3769821 and breast cancer was observed. In addition, the D-T haplotype and diplotype showed protective effects (P < 0.05). Our results indicate that genetic variation in the promoter region of the CASP8 gene, especially rs3834129, may serve as a genetic risk factor for breast cancer in an Iranian population.
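An allele-level comparison, such as D versus I carriers of rs3834129, can be summarized as a 2×2 table of allele counts, with each participant contributing two alleles. The sketch below computes the Pearson chi-square statistic without continuity correction, its 1-df p-value, and the allelic odds ratio. The counts and the function name are hypothetical, and the adjusted logistic-regression results above are not reproduced.

```python
import math

def allelic_chi2(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Pearson chi-square (no continuity correction) on a 2x2 allele table.

    Returns (chi2, p, odds_ratio) for allele 1 versus allele 2 in cases vs controls.
    """
    a, b, c, d = case_a1, case_a2, ctrl_a1, ctrl_a2
    if min(a + b, c + d, a + c, b + d) == 0 or b * c == 0:
        raise ValueError("table margins and off-diagonal cells must be positive")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio

print(allelic_chi2(30, 10, 10, 30))  # chi2 = 20.0, small p, OR = 9.0
```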

18.
Interleukin-6 (IL-6) is a cytokine involved in various physiologic and pathophysiologic processes, including carcinogenesis. In 2003, a single nucleotide polymorphism (−174G/C) in the IL-6 gene promoter was linked to breast cancer prognosis in node-positive (N+) breast cancer patients. Since then, different studies have reached conflicting conclusions about its role as a prognostic and/or diagnostic marker. The primary aim of our study was to investigate the link between the −174G/C polymorphism and breast cancer risk on the one hand, and between the −174G/C polymorphism and prognosis on the other. Prognosis was examined in different groups of patients: sporadic N+ breast cancers (n = 138), sporadic N− breast cancers (n = 95), and familial breast cancers (n = 60). The variables of interest were disease-free survival and overall survival. The secondary aim of the study was to screen the IL-6 gene promoter by direct sequencing to identify new polymorphisms in our French Caucasian breast cancer population. No association, or trend toward association, between the −174G/C polymorphism of the IL-6 gene promoter and breast cancer diagnosis or prognosis was shown, even in meta-analyses. Furthermore, we identified four novel polymorphic sites in the IL-6 gene promoter region: −764G → A, −757C → T, −233T → A, and 15C → A.

19.
A novel feature screening method is proposed to examine the correlation between latent responses and potential predictors in ultrahigh-dimensional data analysis. First, a confirmatory factor analysis (CFA) model is used to characterize latent responses through multiple observed variables. The expectation-maximization algorithm is employed to estimate the parameters in the CFA model. Second, R-Vector (RV) correlation is used to measure the dependence between the multivariate latent responses and covariates of interest. Third, a feature screening procedure is proposed on the basis of an unbiased estimator of the RV coefficient. The sure screening property of the proposed screening procedure is established under certain mild conditions. Monte Carlo simulations are conducted to assess the finite-sample performance of the feature screening procedure. The proposed method is applied to an investigation of the relationship between psychological well-being and the human genome.
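The RV coefficient measures the dependence between two multivariate blocks measured on the same subjects. With column-centered matrices $X$ and $Y$, it is $\mathrm{tr}(XX^\top YY^\top)/\sqrt{\mathrm{tr}((XX^\top)^2)\,\mathrm{tr}((YY^\top)^2)}$. The sketch below computes this classic sample RV coefficient. It is not the unbiased estimator on which the screening procedure is based, and the plain version is known to be biased upward in high dimensions. The function names are hypothetical.

```python
import math

def center(M):
    """Column-center a matrix given as a list of rows."""
    n, cols = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(cols)]
    return [[row[j] - means[j] for j in range(cols)] for row in M]

def gram(M):
    """Subject-by-subject cross-product matrix M M'."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in M] for r1 in M]

def trace_product(A, B):
    """tr(AB) for symmetric A and B, i.e. sum_ij A_ij * B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def rv_coefficient(X, Y):
    """Classic sample RV coefficient between blocks X and Y (same rows)."""
    sx, sy = gram(center(X)), gram(center(Y))
    return trace_product(sx, sy) / math.sqrt(trace_product(sx, sx) * trace_product(sy, sy))

# With one column each, RV equals the squared Pearson correlation (0.8 ** 2)
print(rv_coefficient([[1], [2], [3], [4]], [[1], [3], [2], [4]]))  # about 0.64
```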

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417