Similar literature (20 results found)
1.
This paper is concerned with using multivariate binary observations to estimate the probabilities of unobserved classes with scientific meanings. We focus on the setting where additional information about sample similarities is available and represented by a rooted weighted tree. Every leaf in the given tree contains multiple samples. Shorter distances over the tree between the leaves indicate a priori higher similarity in class probability vectors. We propose a novel data integrative extension to classical latent class models with tree-structured shrinkage. The proposed approach enables (1) borrowing of information across leaves, (2) estimating data-driven leaf groups with distinct vectors of class probabilities, and (3) individual-level probabilistic class assignment given the observed multivariate binary measurements. We derive and implement a scalable posterior inference algorithm in a variational Bayes framework. Extensive simulations show more accurate estimation of class probabilities than alternatives that suboptimally use the additional sample similarity information. A zoonotic infectious disease application is used to illustrate the proposed approach. The paper concludes with a brief discussion of model limitations and extensions.

2.
Basket trials simultaneously evaluate the effect of one or more drugs on a defined biomarker, genetic alteration, or molecular target in a variety of disease subtypes, often called strata. A conventional approach for analyzing such trials is an independent analysis of each of the strata. This analysis is inefficient as it lacks the power to detect the effect of drugs in each stratum. To address these issues, various designs for basket trials have been proposed, centering on designs using Bayesian hierarchical models. In this article, we propose a novel Bayesian basket trial design that incorporates predictive sample size determination, early termination for inefficacy and efficacy, and the borrowing of information across strata. The borrowing of information is based on the similarity between the posterior distributions of the response probability. In general, Bayesian hierarchical models have many distributional assumptions along with multiple parameters. By contrast, our method has prior distributions for response probability and two parameters for similarity of distributions. The proposed design is easier to implement and less computationally demanding than other Bayesian basket designs. Through a simulation with various scenarios, our proposed design is compared with other designs including one that does not borrow information and one that uses a Bayesian hierarchical model.  相似文献   

3.
Bayesian hierarchical models have been applied in clinical trials to allow for information sharing across subgroups. Traditional Bayesian hierarchical models do not have subgroup classifications; thus, information is shared across all subgroups. When the difference between subgroups is large, it suggests that the subgroups belong to different clusters. In that case, placing all subgroups in one pool and borrowing information across all subgroups can result in substantial bias for the subgroups with strong borrowing, or a lack of efficiency gain with weak borrowing. To resolve this difficulty, we propose a hierarchical Bayesian classification and information sharing (BaCIS) model for the design of multigroup phase II clinical trials with binary outcomes. We introduce subgroup classification into the hierarchical model. Subgroups are classified into two clusters on the basis of their outcomes mimicking the hypothesis testing framework. Subsequently, information sharing takes place within subgroups in the same cluster, rather than across all subgroups. This method can be applied to the design and analysis of multigroup clinical trials with binary outcomes. Compared to the traditional hierarchical models, better operating characteristics are obtained with the BaCIS model under various scenarios.  相似文献   

4.
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Mixtures across the model space of a strict combined-data approach and a priori independent parameters are popular methods to integrate these methods. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.

5.
We consider modeling jointly microarray RNA expression and DNA copy number data. We propose Bayesian mixture models that define latent Gaussian probit scores for the DNA and RNA, and integrate between the two platforms via a regression of the RNA probit scores on the DNA probit scores. Such a regression conveniently allows us to include additional sample specific covariates such as biological conditions and clinical outcomes. The two developed methods are aimed respectively to make inference on differential behaviour of genes in patients showing different subtypes of breast cancer and to predict the pathological complete response (pCR) of patients borrowing strength across the genomic platforms. Posterior inference is carried out via MCMC simulations. We demonstrate the proposed methodology using a published data set consisting of 121 breast cancer patients.  相似文献   

6.
Ubiquitination modification is closely related to cancer and participates in the regulation of the tumor microenvironment. However, the role of ubiquitination modification in the immune response and prognosis of lung adenocarcinoma has not been elucidated. This study aims to establish a disease classification associated with ubiquitination and, for the first time, reveal the landscape of intratumor microbes in patients with lung adenocarcinoma. A total of 1314 patients with lung adenocarcinoma in the GEO and TCGA databases were included in our study. We constructed a ubiquitination scoring model using WGCNA and ubiquitination subtypes using unsupervised clustering, analyzed the clinical, immune, and intratumor microbial characteristics, and screened out the relevant gene signatures, which were verified by RT-qPCR in human cancer cells. The results showed that the high ubiquitination subtype had poor prognosis, a low degree of immune infiltration, a high tumor stemness index, and a poor response to immunotherapy. The subtypes with lower ubiquitination scores had better prognosis, a higher tumor microenvironment score, and a better response to immunotherapy. The C2 subtype has a high level of immune infiltration, lower intratumor microbial diversity and abundance, and good prognosis. The C3 subtype has a low level of immune infiltration, higher intratumor microbial diversity and abundance, and poor prognosis. The C1 subtype has characteristics between those of C2 and C3. In summary, this paper constructs a scoring system and several subtypes based on ubiquitination genes and analyzes their characteristics, which can help provide new methods for clinical treatment.

7.
Combination of several anticancer treatments has typically been presumed to have enhanced drug activity. Motivated by a real clinical trial, this paper considers phase I–II dose finding designs for dual-agent combinations, where one main objective is to characterize both the toxicity and efficacy profiles. We propose a two-stage Bayesian adaptive design that accommodates a change of patient population between stages. In stage I, we estimate a maximum tolerated dose combination using the escalation with overdose control (EWOC) principle. This is followed by a stage II, conducted in a new yet relevant patient population, to find the most efficacious dose combination. We implement a robust Bayesian hierarchical random-effects model to allow sharing of information on the efficacy across stages, assuming that the related parameters are either exchangeable or nonexchangeable. Under the assumption of exchangeability, a random-effects distribution is specified for the main effects parameters to capture uncertainty about the between-stage differences. Including the nonexchangeability assumption further allows the stage-specific efficacy parameters to have their own priors. The proposed methodology is assessed with an extensive simulation study. Our results suggest a general improvement of the operating characteristics for the efficacy assessment, under a conservative assumption about the exchangeability of the parameters a priori.

8.
ABSTRACT: BACKGROUND: Breast carcinoma is known as a heterogeneous disease because gene expression analyses identify several subtypes and the molecular profiles are prognostic and predictive for patients. Our aim, in this study, is to estimate the prevalence of breast cancer subtypes and to determine the relationship between clinico-pathological characteristics, overall survival (OS) and disease free survival (DFS) for patients from the north-east of Morocco. METHODS: We reviewed 366 cases of breast cancer diagnosed between January 2007 and June 2010 at the Department of pathology. Age, tumor size, metastatic profile, node involvement profile, OS and DFS were analyzed on 181 patients. OS and DFS were estimated by Kaplan-Meier analysis, and the log-rank test was used to assess outcome differences among subgroups. RESULTS: The average age was 45 years; patients were diagnosed late (57% stage III, 17.5% stage IV) with a high average tumor size. The luminal A subtype was the most prevalent (53.6%) and was associated with favorable clinico-pathological characteristics, followed by luminal B (16.4%), Her2-overexpressing (12.6%), basal-like (12.6%) and unclassified subtype (4.9%). Survival analysis showed a significant difference between subtypes. The triple negative tumors were associated with poor prognosis (49% OS, 39% DFS), whereas the luminal A tumors were associated with a better prognosis (88% OS, 59% DFS). The luminal B and the Her2-overexpressing subtypes were associated with an intermediate prognosis (77% and 75% OS, and 41% and 38% DFS respectively). CONCLUSION: This study showed that molecular classification by immunohistochemistry was necessary for therapeutic decision and prognosis of breast carcinoma. The luminal A subtype was associated with favorable biological characteristics and a better prognosis than triple negative tumors, which were associated with a poor prognosis and unfavorable clinico-pathological characteristics.

9.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and results in a proper joint posterior distribution for all the parameters of interest. A Markov chain-Monte Carlo scheme is designed to draw from this posterior distribution. By using coalescence-based simulation and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with or without the presence of recombination hotspots and missing genotypes.  相似文献   

10.
《IRBM》2022,43(1):62-74
Background: The prediction of breast cancer subtypes plays a key role in the diagnosis and prognosis of breast cancer. In recent years, deep learning (DL) has shown good performance in the intelligent prediction of breast cancer subtypes. However, most of the traditional DL models use single modality data, which can just extract a few features, so it cannot establish a stable relationship between patient characteristics and breast cancer subtypes. Dataset: We used the TCGA-BRCA dataset as a sample set for molecular subtype prediction of breast cancer. It is a public dataset that can be obtained through the following link: https://portal.gdc.cancer.gov/projects/TCGA-BRCA Methods: In this paper, a Hybrid DL model based on the multimodal data is proposed. We combine the patient's gene modality data with image modality data to construct a multimodal fusion framework. According to the different forms and states, we set up feature extraction networks respectively, and then we fuse the output of the two feature networks based on the idea of weighted linear aggregation. Finally, the fused features are used to predict breast cancer subtypes. In particular, we use the principal component analysis to reduce the dimensionality of high-dimensional data of gene modality and filter the data of image modality. Besides, we also improve the traditional feature extraction network to make it show better performance. Results: The results show that compared with the traditional DL model, the Hybrid DL model proposed in this paper is more accurate and efficient in predicting breast cancer subtypes. Our model achieved a prediction accuracy of 88.07% in 10 times of 10-fold cross-validation. We did a separate AUC test for each subtype, and the average AUC value obtained was 0.9427. In terms of subtype prediction accuracy, our model is about 7.45% higher than the previous average.

11.
Ding M  Rosner GL  Müller P 《Biometrics》2008,64(3):886-894
Summary. Most phase II screening designs available in the literature consider one treatment at a time. Each study is considered in isolation. We propose a more systematic decision-making approach to the phase II screening process. The sequential design allows for more efficiency and greater learning about treatments. The approach incorporates a Bayesian hierarchical model that allows combining information across several related studies in a formal way and improves estimation in small data sets by borrowing strength from other treatments. The design incorporates a utility function that includes sampling costs and possible future payoff. Computer simulations show that this method has high probability of discarding treatments with low success rates and moving treatments with high success rates to phase III trial.

12.
Ovarian cancer (OC) is associated with a high mortality rate. However, the correlation between the immune microenvironment and prognosis of OC remains unclear. This study aimed to explore the prognostic significance of the OC tumour microenvironment. The OC data set was selected from The Cancer Genome Atlas (TCGA), and 307 samples were collected. Hierarchical clustering was performed according to the expression of 756 genes. The immune and matrix scores of all immune subtypes were determined, and the Kruskal-Wallis test was used to analyse the differences in the immune and matrix scores between OC samples with different immune subtypes. The model for predicting prognosis was constructed based on the expression of immune-related genes. The TIDE platform was applied to predict the effect of immunotherapy on patients with OC of different immune subtypes. The 307 OC samples were classified into three immune subtypes, A-C. Patients in subtype B had poorer prognosis and a lower survival rate. The infiltration of helper T cells and macrophages in the microenvironment differed significantly between immune subtypes. Enrichment analyses of immune cell molecular pathways showed that the JAK–STAT3 pathway changed significantly in subtype B. Furthermore, the predicted response to immunotherapy in subtype B was significantly higher than that in subtypes A and C. Immune subtyping can be used as an independent predictor of the prognosis of OC patients, which may be related to the infiltration patterns of immune cells in the tumour microenvironment. In addition, patients in immune subtype B have a superior response to immunotherapy, suggesting that patients in subtype B are suitable for immunotherapy.

13.
刘阳  王丽茹  张岩 《生物信息学》2021,19(4):240-248
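This study aims to identify prognosis-related subtypes of colon adenocarcinoma by analyzing DNA methylation profiles. Methylation data from colon adenocarcinoma patients were obtained from the TCGA database. CpG sites significantly associated with prognosis were selected through differential methylation analysis and Cox proportional hazards regression models, and seven subtypes were identified by consensus clustering. Survival analysis and tests of clinical characteristics showed that prognosis differed significantly among the seven subtypes and that subtype features were reflected in multiple clinical characteristics. In addition, a prediction model based on SMO (sequential minimal optimization), built from the differentially methylated sites identified among the seven subtypes, achieved high AUC values for every subtype and was validated on a test set. In summary, this study used bioinformatics algorithms to identify seven colon adenocarcinoma subtypes with differing prognosis and mined their specific methylation markers. The results may allow the prognosis of colon adenocarcinoma to be assessed more precisely and offer new ideas for early diagnosis and treatment planning.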

14.
Targeted therapies on the basis of genomic aberrations analysis of the tumor have shown promising results in cancer prognosis and treatment. Regardless of tumor type, trials that match patients to targeted therapies for their particular genomic aberrations have become a mainstream direction of therapeutic management of patients with cancer. Therefore, finding the subpopulation of patients who can most benefit from an aberration‐specific targeted therapy across multiple cancer types is important. We propose an adaptive Bayesian clinical trial design for patient allocation and subpopulation identification. We start with a decision theoretic approach, including a utility function and a probability model across all possible subpopulation models. The main features of the proposed design and population finding methods are the use of a flexible nonparametric Bayesian survival regression based on a random covariate‐dependent partition of patients, and decisions based on a flexible utility function that reflects the requirement of the clinicians appropriately and realistically, and the adaptive allocation of patients to their superior treatments. Through extensive simulation studies, the new method is demonstrated to achieve desirable operating characteristics and compares favorably against the alternatives.  相似文献   

15.
Passive surveillance systems are widely used to monitor disease occurrence over wide spatial areas due to their cost-effectiveness and integration into broadly distributed healthcare systems. However, such systems are generally associated with imperfect ascertainment of disease cases and with heterogeneous capture probabilities arising from factors such as differential access to care. Augmenting passive surveillance systems with other surveillance efforts provides a way to estimate the true number of incident cases. We develop a hierarchical modeling framework for analyzing data from multiple surveillance systems that allows for individual-level covariate-dependent heterogeneous capture probabilities, and borrows information across surveillance sites to improve estimation of the true number of incident cases. Inference is carried out via a two-stage Bayesian procedure. Simulation studies illustrated superior performance of the proposed approach with respect to bias, root mean square error, and coverage compared to a model that does not borrow information across sites. We applied the proposed model to data from three surveillance systems reporting pulmonary tuberculosis (PTB) cases in a major center of ongoing transmission in China. The analysis yielded bias-corrected estimates of PTB cases from the passive system and led to the identification of risk factors associated with PTB rates, as well as factors influencing the operating characteristics of the implemented surveillance systems.

16.
17.
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance–covariance parameters in hierarchically structured data. Although hierarchical models have occasionally been used in the analysis of ecological data, their full potential to describe scales of association, diagnose variance explained, and to partition uncertainty has not been employed. In this paper we argue that the use of the HLM framework can enable significantly improved inference about ecological processes across levels of organization. After briefly describing the principles behind HLM, we give two examples that demonstrate a protocol for building hierarchical models and answering questions about the relationships between variables at multiple scales. The first example employs maximum likelihood methods to construct a two-level linear model predicting herbivore damage to a perennial plant at the individual- and patch-scale; the second example uses Bayesian estimation techniques to develop a three-level logistic model of plant flowering probability across individual plants, microsites and populations. HLM model development and diagnostics illustrate the importance of incorporating scale when modelling associations in ecological systems and offer a sophisticated yet accessible method for studies of populations, communities and ecosystems. We suggest that a greater coupling of hierarchical study designs and hierarchical analysis will yield significant insights on how ecological processes operate across scales.

18.
Finding subtypes of heterogeneous diseases is the biggest challenge in the area of biology. Often, clustering is used to provide a hypothesis for the subtypes of a heterogeneous disease. However, there are usually discrepancies between the clusterings produced by different algorithms. This work introduces a simple method which provides the most consistent clusters across three different clustering algorithms for a melanoma and a breast cancer data set. The method is validated by showing that the Silhouette, Dunn's and Davies-Bouldin's cluster validation indices are better for the proposed algorithm than those obtained by k-means and another consensus clustering algorithm. The hypotheses of the consensus clusters on both the data sets are corroborated by clear genetic markers and 100 percent classification accuracy. In Bittner et al.'s melanoma data set, a previously hypothesized primary cluster is recognized as the largest consensus cluster and a new partition of this cluster into two subclusters is proposed. In van't Veer et al.'s breast cancer data set, previously proposed "basal" and "luminal A" subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal" subtype in this data set. The clusters of van't Veer's data set are also validated by high classification accuracy obtained in the data set of van de Vijver et al.

19.
20.
Modern high-throughput biotechnologies such as microarray and next-generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, hence the classical “large p, small n” problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool in analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, a large amount of historical data is available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrated superior performance of the proposed strategy. Our new strategy also enables borrowing information across different platforms, which could be extremely useful with the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package “adaptiveHM,” which is freely available from https://github.com/benliemory/adaptiveHM.
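
Several of the abstracts above rely on "borrowing strength" through a hierarchical model, and entry 20 notes that the resulting shrinkage can over-correct some features. The sketch below illustrates that idea in a deliberately simple setting: a normal-normal model with a known, common sampling variance and a method-of-moments estimate of the between-feature variance. This is an assumption made for illustration only; it is not the method of any paper listed here and not the adaptiveHM implementation. The function name `shrink_means` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def shrink_means(observed, sampling_var):
    """Empirical-Bayes shrinkage of per-feature estimates (illustrative only).

    Assumes each observed value is a noisy estimate of its feature's true
    mean with the same known sampling variance, and that the true means
    share a common normal prior whose parameters are estimated from the data.
    """
    if sampling_var <= 0:
        raise ValueError("sampling_var must be positive")
    grand_mean = mean(observed)
    # Method-of-moments estimate of the between-feature variance,
    # floored at zero when the data look no more spread than pure noise.
    between_var = max(0.0, variance(observed) - sampling_var)
    # Shrinkage weight: 1.0 pools everything, values near 0 borrow little.
    weight = sampling_var / (sampling_var + between_var)
    return [weight * grand_mean + (1.0 - weight) * y for y in observed]


# Hypothetical per-feature estimates; the last one is far from the rest.
estimates = [0.0, 1.0, 2.0, 10.0]
shrunk = shrink_means(estimates, sampling_var=4.0)
for raw, adj in zip(estimates, shrunk):
    print(f"raw={raw:5.1f}  shrunk={adj:6.3f}")
```

Every estimate is pulled toward the grand mean by the same weight, so a feature that is genuinely different (here the last one) is pulled in as well. That uniform pull is the kind of over-correction the abstract refers to, and it is why designs such as clustering subgroups before sharing, or weighting by similarity, appear in the other entries.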
