Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
    
Clinical treatment outcomes are the quality and cost targets that health-care providers aim to improve. Most existing outcome analyses focus on a single disease or on all diseases combined. Motivated by the success of molecular and phenotypic human disease networks (HDNs), this article develops a clinical treatment network that describes the interconnections among diseases in terms of inpatient length of stay (LOS) and readmission. Here one node represents one disease, and two nodes are linked by an edge if their LOS and number of readmissions are conditionally dependent. This is the very first HDN that jointly analyzes multiple clinical treatment outcomes at the pan-disease level. To accommodate the unique data characteristics, we propose a modeling approach based on two-part generalized linear models and estimation based on penalized integrative analysis. Analysis is conducted on the Medicare inpatient data of 100,000 randomly selected subjects for the period of January 2010 to December 2018. The resulting network has 1008 edges for 106 nodes. We analyze key network properties including connectivity, module/hub, and temporal variation. The findings are biomedically sensible. For example, high-connectivity and hub conditions, such as disorders of lipid metabolism and essential hypertension, are identified. There are also findings that have received little or no attention in the literature. Overall, this study can provide additional insight into diseases' properties and their interconnections and assist more efficient disease management and allocation of health-care resources.

2.
    
In functional data analysis for longitudinal data, the observation process is typically assumed to be noninformative, an assumption that is often violated in real applications. Thus, methods that fail to account for the dependence between observation times and longitudinal outcomes may result in biased estimation. For longitudinal data with informative observation times, we find that under a general class of shared random effect models, a commonly used functional data method may lead to inconsistent model estimation while another functional data method results in consistent and even rate-optimal estimation. Indeed, we show that the mean function can be estimated appropriately via penalized splines and that the covariance function can be estimated appropriately via penalized tensor-product splines, both with specific choices of parameters. For the proposed method, theoretical results are provided, and simulation studies and a real data analysis are conducted to demonstrate its performance.

3.
    
Huihang Liu, Xinyu Zhang. Biometrics 2023, 79(3):2050-2062
Advances in information technologies have made network data increasingly common across a spectrum of big data applications, and such data are often explored with probabilistic graphical models. To precisely estimate the precision matrix, we propose an optimal model averaging estimator for Gaussian graphs. We prove that the proposed estimator is asymptotically optimal when candidate models are misspecified. The consistency and the asymptotic distribution of the model averaging estimator, and the weight convergence, are also studied when at least one correct model is included in the candidate set. Furthermore, numerical simulations and a real data analysis on yeast genetic data are conducted to illustrate that the proposed method is promising.

4.
  Total citations: 1 (self-citations: 0, citations by others: 1)

5.
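Analysis and processing of gene chip data   Total citations: 7 (self-citations: 1, citations by others: 6)
Applications of gene chip technology, such as gene expression analysis, generate large amounts of data. How to process and analyze these data and extract valuable biological information from them is a critically important problem. The process includes the acquisition and processing of raw data, the statistical analysis of normalized data, and the storage and exchange of data.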

6.
    
F Fogolari, S Tessari, H Molinari. Proteins 2002, 46(2):161-170
One of the standard tools for the analysis of data arranged in matrix form is singular value decomposition (SVD). Few applications to genomic data have been reported to date, mainly for the analysis of gene expression microarray data. We review SVD properties, examine mathematical terms and assumptions implicit in the SVD formalism, and show that SVD can be applied to the analysis of matrices representing pairwise alignment scores between large sets of protein sequences. In particular, we illustrate SVD capabilities for data dimension reduction and for clustering protein sequences. A comparison is performed between SVD-generated clusters of proteins and annotation reported in the SWISS-PROT Database for a set of protein sequences forming the calycin superfamily, entailing all entries corresponding to the lipocalin, cytosolic fatty acid-binding protein, and avidin-streptavidin Prosite patterns.
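As a rough illustration of the idea in this abstract (SVD of a pairwise alignment score matrix used for dimension reduction and clustering), the sketch below projects each sequence onto the leading singular directions. It is not the authors' procedure: the toy score matrix, the column centering, the function name `svd_embed`, and the sign-based two-group split are all assumptions made for the example.

```python
import numpy as np


def svd_embed(scores, k=2):
    """Project each row (one sequence) of a pairwise score matrix
    onto the top-k singular directions."""
    # Centering the columns is an assumption of this sketch, not a step taken from the abstract.
    centered = scores - scores.mean(axis=0, keepdims=True)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    # Row coordinates in the reduced space: left singular vectors scaled by singular values.
    return u[:, :k] * s[:k]


# Hypothetical scores: sequences 0-2 align well with each other, as do sequences 3-5.
scores = np.array([
    [100, 90, 85, 10, 12, 8],
    [90, 100, 88, 11, 9, 10],
    [85, 88, 100, 9, 10, 12],
    [10, 11, 9, 100, 92, 87],
    [12, 9, 10, 92, 100, 90],
    [8, 10, 12, 87, 90, 100],
], dtype=float)

coords = svd_embed(scores, k=2)
# Crude two-group clustering: split on the sign of the first coordinate.
labels = (coords[:, 0] > 0).astype(int)
```

With real data one would typically feed `coords` to a proper clustering routine; the sign split is only enough to show that the first singular direction separates the two blocks of this toy matrix.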

7.
8.
    
Biomarkers are often organized into networks, in which the strengths of network connections vary across subjects depending on subject-specific covariates (e.g., genetic variants). Variation of network connections, as subject-specific feature variables, has been found to predict disease clinical outcome. In this work, we develop a two-stage method to estimate biomarker networks that account for heterogeneity among subjects and to evaluate the networks' association with disease clinical outcome. In the first stage, we propose a conditional Gaussian graphical model with mean and precision matrix depending on covariates to obtain covariate-dependent networks with connection strengths varying across subjects while assuming homogeneous network structure. In the second stage, we evaluate the clinical utility of network measures (connection strengths) estimated from the first stage. The second-stage analysis provides the relative predictive power of between-region network measures on clinical impairment in the context of regional biomarkers and existing disease risk factors. We assess the performance of the proposed method by extensive simulation studies and by application to a Huntington's disease (HD) study, investigating the effect of the HD causal gene on the rate of change in motor symptoms through its effect on brain subcortical and cortical gray matter atrophy connections. We show that cortical network connections and subcortical volumes, but not subcortical connections, are predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, highly similar patterns seen in the gray matter connections and in a previous white matter connectivity study suggest a shared biological mechanism for HD and support the hypothesis that white matter loss is a direct result of neuronal loss as opposed to the loss of myelin or dysmyelination.

9.
Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical 'baseline' set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on a manually annotated corpus of 200 abstracts generated from 2 cancer categories and 10 genes per category. We applied the method to three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns.

10.
    
In this paper, we propose a functional partially linear regression model with latent group structures to accommodate the heterogeneous relationship between a scalar response and functional covariates. The proposed model is motivated by a salinity tolerance study of barley families, whose main objective is to detect salinity-tolerant barley plants. Our model is flexible, allowing for heterogeneous functional coefficients while being efficient by pooling information within a group for estimation. We develop an algorithm in the spirit of K-means clustering to identify latent groups of the subjects under study. We establish the consistency of the proposed estimator, derive the convergence rate and the asymptotic distribution, and develop inference procedures. We show by simulation studies that the proposed method has higher accuracy for recovering latent groups and for estimating the functional coefficients than existing methods. The analysis of the barley data shows that the proposed method can help identify groups of barley families with different levels of salinity tolerance.

11.
    
In this paper, we consider the problem of nonparametric curve fitting in the specific context of censored data. We propose an extension of the penalized splines approach that uses Kaplan–Meier weights to take into account the effect of censorship, together with generalized cross-validation techniques, adapted to the case of censored samples, to choose the smoothing parameter. Using various simulation studies, we analyze the effectiveness of the proposed censored penalized splines method and show that its performance is quite satisfactory. We have extended this proposal to a generalized additive models (GAM) framework by introducing a correction for the censorship effect, thus enabling more complex models to be estimated immediately. A real dataset from the Stanford Heart Transplant data is also used to illustrate the proposed methodology, which is shown to be a good alternative when the probability distribution for the response variable and the functional form are not known in censored regression models.

12.
    
The ecological theory of the existence of multiple stable states between species, or the spatial heterogeneity of some unobserved environmental factor, supports the idea of multitype interactions between species. These multitype interactions can lead to different assemblages of species abundances. An exploratory tool for the detection of these species assemblages and for their spatial analysis is presented in this article. A two-stage analysis is proposed. First, a classification into types of species assemblages using only the species abundances at each site, regardless of their spatial location, is performed. The clustering procedure is based on multivariate normal mixtures and provides a measure of the classification uncertainty. Second, some tools for the study of the spatial structure of these types of assemblages are presented. We transfer the classification uncertainty to the spatial analysis of the classes in order to draw more accurate conclusions. This classification and spatial analysis method is used to point out a spatial gradient of infection in a host–pathogen system in the Åland Islands in Finland. It can be a useful preliminary tool for ecological studies involving the spatial distributions of several species.

13.
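A character matrix was constructed for 15 Chinese species of the genus Lepidium and for maca, which is native to Peru, and a dendrogram was built with SPSS V11.0. The dendrogram was used to test the correctness of the morphological classification of this genus given in Flora of China (《中国植物志》) and to establish the relationships between maca and the other plants of the genus, providing a theoretical basis for research on the breeding and cultivation of maca.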

14.
15.
    
The Generalised Estimating Equations (GEE) proposed by Liang and Zeger (1986) and Zeger and Liang (1986) have received considerable attention in the last decade (for an overview see e.g. Ziegler and Blettner, 1998). Several self-made programs for solving the GEE are available. This paper presents a comparison of three GEE procedures that are already available: SAS PROC GENMOD, the STATA procedure XTGEE, and SUDAAN PROC MULTILOG. We show that the estimation results may be quite distinct due to different implementations. In summary, it is encouraging that GEE is becoming established in commercial software packages. However, some aspects of the implementations should be improved.

16.
    
Serban N, Jiang H. Biometrics 2012, 68(3):805-814
Summary: In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

17.
It has been a challenging task to integrate high-throughput data into investigations of the systematic and dynamic organization of biological networks. Here, we present a simple hierarchical clustering algorithm that goes a long way toward achieving this aim. Our method effectively reveals the modular structure of the yeast protein-protein interaction network and distinguishes protein complexes from functional modules by integrating high-throughput protein-protein interaction data with added subcellular localization and expression profile data. Furthermore, we take advantage of the detected modules to provide a reliable functional context for the uncharacterized components within modules. In addition, the integration of various protein-protein association information makes our method robust to false positives, especially for derived protein complexes. More importantly, this simple method can be extended naturally to other types of data fusion and provides a framework for the study of more comprehensive properties of the biological network and other forms of complex networks.

18.
A modification of the graphical Costello method is proposed for the analysis of stomach contents data. The new method allows prey importance, feeding strategy and the inter- and intra-individual components of niche width to be explored using graphical presentation. The analysis is based on a two-dimensional representation of prey-specific abundance and frequency of occurrence of the different prey types in the diet. The paper describes the new method and its parameters, and also presents some examples of the use of the method. The method may be particularly well suited to the examination of predictions made from optimal foraging, competition and niche theories.
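The two axes named in this abstract can be computed per prey type as in the sketch below. The definitions used here are assumptions, since the abstract does not spell them out: frequency of occurrence is taken as the fraction of non-empty stomachs containing the prey, and prey-specific abundance as the prey's percentage share of the total contents of only those stomachs in which it occurs. The function name `costello_points` and the sample data are hypothetical.

```python
def costello_points(stomachs):
    """Compute one (frequency of occurrence, prey-specific abundance %) point per prey type.

    stomachs: list of dicts mapping prey type -> amount found in one
    non-empty predator stomach (counts, volume or mass, used consistently).
    """
    n = len(stomachs)
    prey_types = sorted({prey for s in stomachs for prey in s})
    points = {}
    for prey in prey_types:
        # Only stomachs that actually contain this prey enter its abundance.
        with_prey = [s for s in stomachs if s.get(prey, 0) > 0]
        if not with_prey:
            continue
        frequency = len(with_prey) / n
        prey_total = sum(s[prey] for s in with_prey)
        content_total = sum(sum(s.values()) for s in with_prey)
        points[prey] = (frequency, 100.0 * prey_total / content_total)
    return points


# Hypothetical sample of four stomachs.
sample = [
    {"insects": 8, "fish": 2},
    {"insects": 10},
    {"fish": 5, "snails": 5},
    {"insects": 6, "snails": 4},
]
points = costello_points(sample)
```

Plotting each prey type at its `(frequency, abundance)` point gives the two-dimensional diagram the abstract refers to; for example, `insects` in this sample occurs in three of four stomachs and makes up 80% of their contents.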

19.
Bayesian model-based clustering programs have gained increased popularity in studies of population structure since the publication of the software structure. These programs are generally acknowledged as performing well, but their running time may be prohibitive. fastruct is a non-Bayesian implementation of the classical no-admixture model with uncorrelated allele frequencies. This new program relies on the expectation–maximization principle, and produces assignments rivalling those of other model-based clustering programs. In addition, it can be manyfold faster than Bayesian implementations. The software consists of a command-line engine, which is suitable for batch analysis of data, and a graphical interface, which is convenient for exploring data.
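To make the expectation–maximization idea in this abstract concrete, here is a minimal sketch of EM for a no-admixture mixture model. It is not the fastruct implementation: the restriction to biallelic loci coded as allele counts 0/1/2, the Hardy-Weinberg genotype probabilities within a cluster, the clipping constant `eps`, the evenly spaced seeding of clusters, and the function name `em_no_admixture` are all assumptions made for illustration.

```python
import math


def em_no_admixture(genotypes, k, n_iter=50, eps=0.01):
    """EM for a k-cluster mixture in which each individual comes from exactly one cluster.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2)
    at biallelic loci. Returns (responsibilities, allele_freqs, mixing).
    """
    n, n_loci = len(genotypes), len(genotypes[0])

    def clip(p):
        # Keep frequencies away from 0 and 1 so the logarithms stay finite.
        return min(max(p, eps), 1.0 - eps)

    # Hypothetical initialisation: seed each cluster from an evenly spaced individual.
    freqs = [[clip(g / 2.0) for g in genotypes[(c * n) // k]] for c in range(k)]
    mixing = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each individual.
        resp = []
        for ind in genotypes:
            logp = []
            for c in range(k):
                ll = math.log(mixing[c])
                for g, p in zip(ind, freqs[c]):
                    # Binomial(2, p) genotype likelihood; the constant C(2, g) cancels.
                    ll += g * math.log(p) + (2 - g) * math.log(1.0 - p)
                logp.append(ll)
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: update mixing proportions and cluster allele frequencies.
        for c in range(k):
            weight = sum(r[c] for r in resp)
            mixing[c] = max(weight / n, 1e-12)
            if weight > 0:
                for locus in range(n_loci):
                    allele_sum = sum(r[c] * ind[locus] for r, ind in zip(resp, genotypes))
                    freqs[c][locus] = clip(allele_sum / (2.0 * weight))
    return resp, freqs, mixing


# Hypothetical data: four individuals rich in the counted allele, four poor in it.
data = [
    [2, 2, 1, 2], [2, 1, 2, 2], [2, 2, 2, 1], [1, 2, 2, 2],
    [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0],
]
resp, freqs, mixing = em_no_admixture(data, k=2)
assignments = [max(range(2), key=lambda c: r[c]) for r in resp]
```

Each row of `resp` is a soft assignment; taking the most probable cluster per individual, as in `assignments`, gives a hard clustering. A practical run would add a convergence check and several restarts, which are omitted here for brevity.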

20.
Analyses of living and fossil taxa are crucial for understanding biodiversity through time. The total evidence method allows living and fossil taxa to be combined in phylogenies, using molecular data for living taxa and morphological data for living and fossil taxa. With this method, substantial overlap of coded anatomical characters among living and fossil taxa is vital for accurately inferring topology. However, although molecular data for living species are widely available, scientists generating morphological data mainly focus on fossils. Therefore, there are fewer coded anatomical characters in living taxa, even in well-studied groups such as mammals. We investigated the number of coded anatomical characters available in phylogenetic matrices for living mammals and how these were phylogenetically distributed across orders. In eleven of 28 mammalian orders, fewer than 25% of species have available characters; this has implications for the accurate placement of fossils, although the issue is less pronounced at higher taxonomic levels. In most orders, species with available characters are randomly distributed across the phylogeny, which may reduce the impact of the problem. We suggest that increased morphological data collection efforts for living taxa are needed to produce accurate total evidence phylogenies.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号