Similar articles
1.
The obesity epidemic represents an important public health issue in the United States. Studying obesity trends across age groups over time helps to identify crucial relationships between the disease and medical treatment, allowing for the development of effective prevention policies. We aim to define subgroups of age and cohort effects in obesity prevalence over time by considering an optimization approach applied to the age‐period‐cohort (APC) model. We consider a heterogeneous regression problem where the regression coefficients are age dependent and belong to subgroups with unknown grouping information. Using the APC model, we apply the alternating direction method of multipliers (ADMM) algorithm to develop a two‐step algorithm for (1) subgrouping of cohort effects based on similar characteristics and (2) subgrouping age effects over time. The proposed clustering approach is illustrated for the United States population, aged 18–79, during the period 1990–2017.

2.
In this paper, we present a method that enables both the homology-based approach and the composition-based approach to further study the functional core (i.e., the microbial core and the gene core, respectively). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model considers each sample as a “document,” which has a mixture of functional groups, while each functional group (also known as a “latent topic”) is a weighted mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topics) in each sample. Second, we show that a generative topic model can also be used to study the genome-level composition of “N-mer” features (DNA subreads obtained by composition-based approaches). The model considers each genome as a mixture of latent genetic patterns (latent topics), while each genetic pattern is a weighted mixture of the “N-mer” features; thus the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of the uncovered latent genetic patterns. The experimental results demonstrate the effectiveness of the proposed method.

3.
Summary: In studies involving functional data, it is commonly of interest to model the impact of predictors on the distribution of the curves, allowing flexible effects on not only the mean curve but also the distribution about the mean. Characterizing the curve for each subject as a linear combination of a high‐dimensional set of potential basis functions, we place a sparse latent factor regression model on the basis coefficients. We induce basis selection by choosing a shrinkage prior that allows many of the loadings to be close to zero. The number of latent factors is treated as unknown through a highly‐efficient, adaptive‐blocked Gibbs sampler. Predictors are included on the latent variables level, while allowing different predictors to impact different latent factors. This model induces a framework for functional response regression in which the distribution of the curves is allowed to change flexibly with predictors. The performance is assessed through simulation studies and the methods are applied to data on blood pressure trajectories during pregnancy.

4.
Salinity is a major limiting factor for agricultural production worldwide. A better understanding of the mechanisms of salinity stress response will aid efforts to improve plant salt tolerance. In this study, a combination of small RNA and mRNA degradome sequencing was used to identify salinity-responsive miRNAs and their targets in barley. A total of 152 miRNAs belonging to 126 families were identified, of which 44 were found to be salinity responsive with 30 up-regulated and 25 down-regulated respectively. The majority of the salinity-responsive miRNAs were up-regulated at the 8h time point, while down-regulated at the 3h and 27h time points. The targets of these miRNAs were further detected by degradome sequencing coupled with bioinformatics prediction. Finally, qRT-PCR was used to validate the identified miRNAs and their targets. Our study systematically investigated the expression profile of miRNAs and their targets in barley during the salinity stress phase, which can contribute to understanding how miRNAs respond to salinity stress in barley and other cereal crops.

5.
Chen H, Wang Y. Biometrics 2011, 67(3): 861-870
In this article, we propose penalized spline (P-spline)-based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects, and residual measurement error processes. Using P-splines, we propose nonparametric estimation of the population mean function, varying coefficient, random subject-specific curves, and the associated covariance function that represents between-subject variation and the variance function of the residual measurement errors which represents within-subject variation. Proposed methods offer flexible estimation of both the population- and subject-level curves. In addition, decomposing the variability of the outcomes into between- and within-subject sources is useful for identifying the dominant variance component and therefore for optimally modeling a covariance function. We use a likelihood-based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of Berkeley growth data, where we identified clearly distinct patterns of the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of antihypertensive treatment from the Framingham Heart Study data.

6.
We develop a new method for variable selection in a nonlinear additive function-on-scalar regression (FOSR) model. Existing methods for variable selection in FOSR have focused on the linear effects of scalar predictors, which can be a restrictive assumption in the presence of multiple continuously measured covariates. We propose a computationally efficient approach for variable selection in existing linear FOSR using functional principal component scores of the functional response and extend this framework to a nonlinear additive function-on-scalar model. The proposed method provides a unified and flexible framework for variable selection in FOSR, allowing nonlinear effects of the covariates. Numerical analysis using a simulation study illustrates the advantages of the proposed method over existing variable selection methods in FOSR even when the underlying covariate effects are all linear. The proposed procedure is demonstrated on accelerometer data from the 2003–2004 cohorts of the National Health and Nutrition Examination Survey (NHANES) in understanding the association between diurnal patterns of physical activity and demographic, lifestyle, and health characteristics of the participants.

7.
MOTIVATION: Characterization of a protein family by its distinct sequence domains is crucial for functional annotation and correct classification of newly discovered proteins. Conventional Multiple Sequence Alignment (MSA) based methods run into difficulties when faced with heterogeneous groups of proteins. However, even many families of proteins that do share a common domain contain instances of several other domains, without any common underlying linear ordering. Ignoring this modularity may lead to poor or even false classification results. An automated method that can analyze a group of proteins into the sequence domains it contains is therefore highly desirable. RESULTS: We apply a novel method to the problem of protein domain detection. The method takes as input an unaligned group of protein sequences. It segments them and clusters the segments into groups sharing the same underlying statistics. A Variable Memory Markov (VMM) model is built using a Prediction Suffix Tree (PST) data structure for each group of segments. Refinement is achieved by letting the PSTs compete over the segments, and a deterministic annealing framework infers the number of underlying PST models while avoiding many inferior solutions. We show that regions of similar statistics correlate well with protein sequence domains, by matching a unique signature to each domain. This is done in a fully automated manner, and does not require or attempt an MSA. Several representative cases are analyzed. We identify a protein fusion event, refine an HMM superfamily classification into the underlying families the HMM cannot separate, and detect all 12 instances of a short domain in a group of 396 sequences. CONTACT: jill@cs.huji.ac.il; tishby@cs.huji.ac.il.

8.
Random effects selection in linear mixed models   Cited by: 2 (self-citations: 0, citations by others: 2)
Chen Z, Dunson DB. Biometrics 2003, 59(4): 762-769
We address the important practical problem of how to select the random effects component in a linear mixed model. A hierarchical Bayesian model is used to identify any random effect with zero variance. The proposed approach reparameterizes the mixed model so that functions of the covariance parameters of the random effects distribution are incorporated as regression coefficients on standard normal latent variables. We allow random effects to effectively drop out of the model by choosing mixture priors with point mass at zero for the random effects variances. Due to the reparameterization, the model enjoys a conditionally linear structure that facilitates the use of normal conjugate priors. We demonstrate that posterior computation can proceed via a simple and efficient Markov chain Monte Carlo algorithm. The methods are illustrated using simulated data and real data from a study relating prenatal exposure to polychlorinated biphenyls and psychomotor development of children.
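As a hedged illustration of the kind of reparameterization this abstract describes, one possible way to write it is sketched below. The symbols (Lambda, Gamma, b_i, q) and the specific factorization are assumptions introduced here for illustration; the abstract does not state the exact form used in the paper.

```latex
% Assumed notation: y_{ij} is observation j on subject i; x_{ij} and z_{ij} are the
% fixed- and random-effect covariates; q is the number of candidate random effects.
\begin{aligned}
y_{ij} &= x_{ij}^{\top}\beta + z_{ij}^{\top}\Lambda\,\Gamma\,b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, I_q), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\Lambda &= \operatorname{diag}(\lambda_1, \dots, \lambda_q), \ \lambda_l \ge 0,
\qquad \Gamma \ \text{lower triangular with unit diagonal}, \\
\operatorname{Cov}(\Lambda\Gamma b_i) &= \Lambda\,\Gamma\,\Gamma^{\top}\Lambda .
\end{aligned}
```

In a form like this, the entries of Lambda and Gamma multiply standard normal variables and so behave like regression coefficients. Setting lambda_l = 0 zeroes row and column l of the covariance matrix, which is how a prior with point mass at zero can remove the l-th random effect from the model.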

9.
The generalized binomial distribution is defined as the distribution of a sum of symmetrically distributed Bernoulli random variates. Several two-parameter families of generalized binomial distributions have received attention in the literature, including the Polya urn model, the correlated binomial model and the latent variable model. Some properties and limitations of the three distributions are described. An algorithm for maximum likelihood estimation for two-parameter generalized binomial distributions is proposed. The Polya urn model and the latent variable model were found to provide good fits to sub-binomial data given by Parkes. An extension of the latent variable model to incorporate heterogeneous response probabilities is discussed.

10.
We explore a hierarchical generalized latent factor model for discrete and bounded response variables and in particular, binomial responses. Specifically, we develop a novel two-step estimation procedure and the corresponding statistical inference that is computationally efficient and scalable for the high dimension in terms of both the number of subjects and the number of features per subject. We also establish the validity of the estimation procedure, particularly the asymptotic properties of the estimated effect size and the latent structure, as well as the estimated number of latent factors. The results are corroborated by a simulation study and for illustration, the proposed methodology is applied to analyze a dataset in a gene–environment association study.

11.
We propose a model for high dimensional mediation analysis that includes latent variables. We describe our model in the context of an epidemiologic study for incident breast cancer with one exposure and a large number of biomarkers (i.e., potential mediators). We assume that the exposure directly influences a group of latent, or unmeasured, factors which are associated with both the outcome and a subset of the biomarkers. The biomarkers associated with the latent factors linking the exposure to the outcome are considered “mediators.” We derive the likelihood for this model and develop an expectation‐maximization algorithm to maximize an L1‐penalized version of this likelihood to limit the number of factors and associated biomarkers. We show that the resulting estimates are consistent and that the estimates of the nonzero parameters have an asymptotically normal distribution. In simulations, procedures based on this new model can have significantly higher power for detecting the mediating biomarkers compared with the simpler approaches. We apply our method to a study that evaluates the relationship between body mass index, 481 metabolic measurements, and estrogen‐receptor positive breast cancer.

12.
We propose a method for estimating the size of a population in a multiple record system in the presence of missing data. The method is based on a latent class model where the parameters and the latent structure are estimated using a Gibbs sampler. The proposed approach is illustrated through the analysis of a data set already known in the literature, which consists of five registrations of neural tube defects.

13.
Latent class analysis is an intuitive tool to characterize disease phenotype heterogeneity. With data more frequently collected on multiple phenotypes in chronic disease studies, it is of rising interest to investigate how the latent classes embedded in one phenotype are related to another phenotype. Motivated by a cohort with mild cognitive impairment (MCI) from the Uniform Data Set (UDS), we propose and study a time-dependent structural model to evaluate the association between latent classes and competing risk outcomes that are subject to missing failure types. We develop a two-step estimation procedure which circumvents latent class membership assignment and is rigorously justified in terms of accounting for the uncertainty in classifying latent classes. The new method also properly addresses the realistic complications for competing risks outcomes, including random censoring and missing failure types. The asymptotic properties of the resulting estimator are established. Given that the standard bootstrapping inference is not feasible in the current problem setting, we develop analytical inference procedures, which are easy to implement. Our simulation studies demonstrate the advantages of the proposed method over benchmark approaches. We present an application to the MCI data from UDS, which uncovers a detailed picture of the neuropathological relevance of the baseline MCI subgroups.

14.
Modeling functional data with spatially heterogeneous shape characteristics   Cited by: 1 (self-citations: 0, citations by others: 1)
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).

15.
The generalized estimating equation (GEE) approach is widely adopted for regression modeling of longitudinal data, taking account of potential correlations within the same subjects. Although the standard GEE assumes common regression coefficients among all the subjects, such an assumption may not be realistic when there is potential heterogeneity in regression coefficients among subjects. In this paper, we develop a flexible and interpretable approach, called grouped GEE analysis, for modeling longitudinal data that allows for heterogeneity in regression coefficients. The proposed method assumes that the subjects are divided into a finite number of groups and that subjects within the same group share the same regression coefficients. We provide a simple algorithm for grouping subjects and estimating the regression coefficients simultaneously, and show the asymptotic properties of the proposed estimator. The number of groups can be determined by the cross validation with averaging method. We demonstrate the proposed method through simulation studies and an application to a real data set.
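The idea of grouping subjects and estimating group-wise coefficients at the same time can be sketched as an alternating procedure. The Python sketch below is only an illustration under strong simplifying assumptions, not the algorithm from the paper: it assumes a Gaussian response with identity link and an independence working correlation (in which case the GEE estimate coincides with ordinary least squares), balanced data, and a starting partition built by sorting per-subject estimates. The function name `grouped_least_squares` and all argument names are hypothetical.

```python
import numpy as np


def grouped_least_squares(X, y, n_groups, n_iter=20):
    """Alternate between group-wise least squares and subject reassignment.

    X has shape (n_subjects, n_times, p); y has shape (n_subjects, n_times).
    Returns (labels, betas) with labels of shape (n_subjects,) and
    betas of shape (n_groups, p).
    """
    n, _, p = X.shape

    # Starting partition (an assumption of this sketch): fit each subject
    # separately, sort subjects by their first coefficient, split into chunks.
    per_subject = np.stack(
        [np.linalg.lstsq(X[i], y[i], rcond=None)[0] for i in range(n)]
    )
    order = np.argsort(per_subject[:, 0])
    labels = np.empty(n, dtype=int)
    for g, chunk in enumerate(np.array_split(order, n_groups)):
        labels[chunk] = g

    betas = np.zeros((n_groups, p))
    for _ in range(n_iter):
        # Estimation step: pooled least squares within each non-empty group.
        for g in range(n_groups):
            members = labels == g
            if members.any():
                Xg = X[members].reshape(-1, p)
                yg = y[members].reshape(-1)
                betas[g] = np.linalg.lstsq(Xg, yg, rcond=None)[0]

        # Grouping step: move each subject to the group whose coefficients
        # give it the smallest residual sum of squares.
        fitted = np.einsum("itp,gp->git", X, betas)        # (G, n, T)
        rss = ((y[None, :, :] - fitted) ** 2).sum(axis=2)  # (G, n)
        new_labels = rss.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, betas
```

Like k-means, a scheme of this kind can stop at a local optimum, so in practice several starting partitions would usually be tried. Choosing `n_groups` (for example by cross validation, as the abstract mentions) is outside the scope of the sketch.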

16.
León LF, Tsai CL. Biometrics 2004, 60(1): 75-84
We propose a new type of residual and an easily computed functional form test for the Cox proportional hazards model. The proposed test is a modification of the omnibus test for testing the overall fit of a parametric regression model, developed by Stute, González Manteiga, and Presedo Quindimil (1998, Journal of the American Statistical Association 93, 141-149), and is based on what we call censoring consistent residuals. In addition, we develop residual plots that can be used to identify the correct functional forms of covariates. We compare our test with the functional form test of Lin, Wei, and Ying (1993, Biometrika 80, 557-572) in a simulation study. The practical application of the proposed residuals and functional form test is illustrated using both a simulated data set and a real data set.

17.
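This paper uses reversed-phase high-performance liquid chromatography (RP-HPLC) to analyze the alcohol-soluble seed storage proteins of 24 barley cultivars grown in China. First, based on the similarity of the chromatograms obtained, the tested cultivars could be divided into 10 groups, each with its own shared characteristic chromatographic peaks. Second, the cultivars within each group could be distinguished and identified individually from the qualitative or quantitative differences among their chromatograms. This indicates that the alcohol-soluble seed storage proteins of barley are highly heterogeneous and that their composition varies with genotype. Therefore, RP-HPLC analysis of the alcohol-soluble seed storage proteins can be used to identify barley cultivars accurately and rapidly.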

18.
Classification of species into different functional groups based on biological criteria has been a difficult problem in ecology. The difficulty mainly arises because natural classification patterns are not necessarily mutually exclusive. The more group characteristics overlap, the more difficult it is to identify the membership of a species in the overlapping portions of any two groups. In this paper, we present an application of discriminant analysis by creating classification models from life history and morphological data for two specialist and two generalist life-style types of predaceous phytoseiid mites. Two stages can be distinguished in our method: life-style group membership assignment and trait variable evaluation. We use a Bayesian framework to create a classifier system to locate or assign species within a mixture of trait distributions. The method assumes that a mixture of trait distributions can represent the multiple dimensions of biological data. The mixture is most evident near the boundaries between groups. Because of the complexity of the analytical solution, an iterative method is used to estimate the unknown means, variances, and mixing proportion between groups. We also developed a criterion based on information theory to evaluate model performance with different combinations of input variables and different hypotheses. We present a working example of our proposed methods. We apply these methods to the problem of selecting key species for inoculative release and for classical introductions of biological pest control agents.

19.
Nonparametric mixed effects models for unequally sampled noisy curves   Cited by: 7 (self-citations: 0, citations by others: 7)
Rice JA, Wu CO. Biometrics 2001, 57(1): 253-259
We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques--Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation--to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.

20.
Dimension reduction of high‐dimensional microbiome data facilitates subsequent analysis such as regression and clustering. Most existing reduction methods cannot fully accommodate the special features of the data such as count‐valued and excessive zero reads. We propose a zero‐inflated Poisson factor analysis model in this paper. The model assumes that microbiome read counts follow zero‐inflated Poisson distributions with library size as offset and Poisson rates negatively related to the inflated zero occurrences. The latent parameters of the model form a low‐rank matrix consisting of interpretable loadings and low‐dimensional scores that can be used for further analyses. We develop an efficient and robust expectation‐maximization algorithm for parameter estimation. We demonstrate the efficacy of the proposed method using comprehensive simulation studies. The application to the Oral Infections, Glucose Intolerance, and Insulin Resistance Study provides valuable insights into the relation between subgingival microbiome and periodontal disease.
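To make the distributional assumption concrete, the following Python sketch evaluates a generic zero-inflated Poisson probability mass function and shows how a library size can enter as an offset. It is a minimal illustration only: the function and parameter names (`zip_pmf`, `zip_rate`, `zero_prob`, `log_relative_rate`) are hypothetical, and the sketch does not implement the factor model, the link between the Poisson rates and the zero-inflation probabilities, or the EM algorithm described in the abstract.

```python
import math


def zip_pmf(k, rate, zero_prob):
    """P(X = k) when X is 0 with probability zero_prob, else Poisson(rate).

    Requires rate > 0 and 0 <= zero_prob <= 1.
    """
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))
    if k == 0:
        # A zero can come from the inflation component or from the Poisson.
        return zero_prob + (1.0 - zero_prob) * poisson
    return (1.0 - zero_prob) * poisson


def zip_rate(library_size, log_relative_rate):
    """Poisson rate with library size as an offset:
    log(rate) = log(library_size) + log_relative_rate.
    """
    return library_size * math.exp(log_relative_rate)
```

For example, `zip_pmf(0, zip_rate(1000, math.log(0.002)), 0.3)` gives the probability of observing no reads for a taxon whose assumed relative rate is 0.002 in a sample of 1000 reads, with an assumed 30% chance of an inflated zero.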

