Similar articles
 20 similar articles found
1.
Serban N, Jiang H. Biometrics 2012, 68(3): 805-814
Summary: In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

2.
3.
4.
5.
Summary: We discuss design and analysis of longitudinal studies after case–control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case–control) variable, and a set of covariates. We propose a semiparametric modeling framework based on a marginal longitudinal binary response model and an ancillary model for subjects' case–control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time‐invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time‐varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study following case–control sampling of the time course of attention deficit hyperactivity disorder (ADHD) symptoms.
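To make the role of the posited prevalence concrete, the sketch below computes one standard form of a case–control offset: the log of the sampling odds of being a case divided by the population odds. This is the textbook intercept correction for logistic regression under case–control sampling; whether the article's ancillary model uses exactly this form is an assumption, and the function name and arguments are hypothetical.

```python
import math


def case_control_offset(n_cases, n_controls, prevalence):
    """Log-odds offset implied by a posited population prevalence.

    Assumption: the standard intercept correction for case-control
    sampling, log(sampling odds / population odds). Not taken from
    the article itself.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("both sample sizes must be positive")
    sampling_odds = n_cases / n_controls
    population_odds = prevalence / (1.0 - prevalence)
    return math.log(sampling_odds / population_odds)
```

When the sample mirrors the population (for example, 1:1 sampling with a prevalence of 0.5) the offset is zero; the rarer the condition relative to its share of the sample, the larger the offset, which is one way to see why a misspecified prevalence mainly shifts the intercept.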

6.
7.
Summary: The rapid development of new biotechnologies allows us to understand biomedical dynamic systems in more detail and at the cellular level. Many of the subject‐specific biomedical systems can be described by a set of differential or difference equations that are similar to engineering dynamic systems. In this article, motivated by HIV dynamic studies, we propose a class of mixed‐effects state‐space models based on the longitudinal feature of dynamic systems. State‐space models with mixed‐effects components are very flexible in modeling the serial correlation of within‐subject observations and between‐subject variations. The Bayesian approach and the maximum likelihood method for standard mixed‐effects models and state‐space models are modified and investigated for estimating unknown parameters in the proposed models. In the Bayesian approach, full conditional distributions are derived and the Gibbs sampler is constructed to explore the posterior distributions. For the maximum likelihood method, we develop a Monte Carlo EM algorithm with a Gibbs sampler step to approximate the conditional expectations in the E‐step. Simulation studies are conducted to compare the two proposed methods. We apply the mixed‐effects state‐space model to a data set from an AIDS clinical trial to illustrate the proposed methodologies. The proposed models and methods may also have potential applications in other biomedical system analyses such as tumor dynamics in cancer research and genetic regulatory network modeling.
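As a rough illustration of what a mixed‐effects state‐space model looks like, the sketch below simulates one subject from a scalar linear‐Gaussian version: the state coefficient is a population (fixed) value plus a subject‐level random effect. The scalar form, the function name, and all parameters are assumptions made for illustration; the article's models and its Gibbs and Monte Carlo EM estimators are not reproduced here.

```python
import random


def simulate_subject(n_times, pop_coef, coef_sd, state_sd, obs_sd, x0, rng):
    """Simulate latent states and noisy observations for one subject.

    Hypothetical scalar model (an assumption, not the article's):
      coef_i = pop_coef + b_i,          b_i ~ N(0, coef_sd^2)
      x_t    = coef_i * x_{t-1} + w_t,  w_t ~ N(0, state_sd^2)
      y_t    = x_t + v_t,               v_t ~ N(0, obs_sd^2)
    """
    # Subject-specific coefficient: fixed effect plus random effect.
    coef = pop_coef + rng.gauss(0.0, coef_sd)
    states, observations = [], []
    x = x0
    for _ in range(n_times):
        x = coef * x + rng.gauss(0.0, state_sd)  # state equation
        y = x + rng.gauss(0.0, obs_sd)           # observation equation
        states.append(x)
        observations.append(y)
    return coef, states, observations


# Example: three subjects sharing a population coefficient of 0.9.
rng = random.Random(2012)
subjects = [simulate_subject(10, 0.9, 0.05, 0.1, 0.2, 1.0, rng) for _ in range(3)]
```

The serial correlation within a subject comes from the state equation, while the between‐subject variation comes from the random effect on the coefficient.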

8.
9.
10.
Summary: This article introduces new methods for performing classification of complex, high‐dimensional functional data using the functional mixed model (FMM) framework. The FMM relates a functional response to a set of predictors through functional fixed and random effects, which allows it to account for various factors and between‐function correlations. The methods include training and prediction steps. In the training steps we train the FMM by treating class designation as one of the fixed effects, and in the prediction steps we classify the new objects using posterior predictive probabilities of class. Through a Bayesian scheme, we are able to adjust for factors affecting both the functions and the class designations. While the methods can be used in any FMM framework, we provide details for two specific Bayesian approaches: the Gaussian, wavelet‐based FMM (G‐WFMM) and the robust, wavelet‐based FMM (R‐WFMM). Both methods perform modeling in the wavelet space, which yields parsimonious representations for the functions, and can naturally adapt to local features and complex nonstationarities in the functions. The R‐WFMM allows potentially heavier tails for features of the functions indexed by particular wavelet coefficients, leading to a down‐weighting of outliers that makes the method robust to outlying functions or regions of functions. The models are applied to a pancreatic cancer mass spectroscopy data set and compared with other recently developed functional classification methods.

11.
12.
13.
The aim of the present study was to contribute to the knowledge of the essential‐oil composition of the Calamintha officinalis–nepeta complex in Greece and to clarify the main patterns of its variation. The oils obtained from 22 wild‐growing populations of C. glandulosa, C. nepeta, and C. menthifolia were studied. They could be classified into two different chemotypes, which correspond to the main biosynthetic routes of the C(3)‐oxygenated p‐menthane compounds. Chemotype I includes oils rich in trans‐piperitone oxide, cis‐piperitone oxide, and piperitenone oxide, while Chemotype II comprises oils rich in pulegone and menthone or menthone and isomenthone. Within both chemotypes, quantitative fluctuations of the main components were observed. Comparison with published data showed that the presence of Chemotype II has not been observed before in C. menthifolia, while Chemotype I has been reported in C. nepeta plants from Greece for the first time.

14.
Analysis of longitudinal data with excessive zeros has gained increasing attention in recent years; however, current approaches to the analysis of longitudinal data with excessive zeros have primarily focused on balanced data. Dropouts are common in longitudinal studies; therefore, the analysis of the resulting unbalanced data is complicated by the missing mechanism. Our study is motivated by the analysis of longitudinal skin cancer count data presented by Greenberg, Baron, Stukel, Stevens, Mandel, Spencer, Elias, Lowe, Nierenberg, Bayrd, Vance, Freeman, Clendenning, Kwan, and the Skin Cancer Prevention Study Group [New England Journal of Medicine 323, 789–795]. The data consist of a large number of zero responses (83% of the observations) as well as a substantial amount of dropout (about 52% of the observations). To account for both excessive zeros and dropout patterns, we propose a pattern‐mixture zero‐inflated model with compound Poisson random effects for the unbalanced longitudinal skin cancer data. We also incorporate a first‐order autoregressive correlation structure in the model to capture longitudinal correlation of the count responses. A quasi‐likelihood approach has been developed for the estimation of our model. We illustrate the method with an analysis of the longitudinal skin cancer data.
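Two building blocks named in this abstract can be sketched briefly: a zero‐inflated count distribution and a first‐order autoregressive, AR(1), correlation structure. The code below uses a plain zero‐inflated Poisson rather than the pattern‐mixture model with compound Poisson random effects that the authors propose, so it only illustrates the general idea; the function names and parameters are hypothetical.

```python
import math


def zip_pmf(count, rate, zero_prob):
    """P(Y = count) under a plain zero-inflated Poisson (illustrative only).

    With probability zero_prob the response is a structural zero;
    otherwise it is drawn from Poisson(rate).
    """
    if count < 0:
        return 0.0
    poisson = math.exp(-rate) * rate ** count / math.factorial(count)
    structural_zero = zero_prob if count == 0 else 0.0
    return structural_zero + (1.0 - zero_prob) * poisson


def ar1_correlation(n_times, rho):
    """AR(1) working correlation matrix: entry (i, j) equals rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]
```

The zero‐inflation term lets the model place far more mass at zero than a Poisson with the same rate would, which is the feature needed for data in which most responses are zero. The AR(1) matrix makes correlation decay geometrically with the gap between visits.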

15.
This is the first report on the composition and variability of the needle‐wax n‐alkanes in natural populations of Pinus nigra in Serbia. Samples of 195 trees from seven populations belonging to several infraspecific taxa (ssp. nigra, var. gocensis, ssp. pallasiana, and var. banatica) were analyzed. In general, the size of the n‐alkanes ranged from C16 to C33, with the exception of ssp. nigra, for which it ranged from C18 to C33. The most abundant were C23‐, C25‐, C27‐, and C29‐alkanes. The needle waxes of Populations I–III and V were characterized by a higher content of C23‐, C25‐, and C27‐alkanes and a lower content of C24‐, C26‐, C28‐, and C30‐alkanes, compared to the other populations, and the trees of these populations could be assigned to ssp. nigra. The samples of Population VI were characterized by higher amounts of C22‐, C24‐, C30‐, and C32‐alkanes and lower amounts of C25‐ and C27‐alkanes, and the trees could be considered as ssp. pallasiana. The samples of Population VII, consisting of trees belonging to var. banatica, were richer in C29‐, C31‐, and C33‐alkanes. The wax compositions of Populations IV and V, both composed of trees previously determined as P. nigra var. gocensis, showed a tendency of splitting. Indeed, the alkane composition of Population IV was closer to that of ssp. pallasiana pines, while that of Population V was more similar to that of ssp. nigra pines. From the results presented here, it is obvious that in the central part of the Balkan Peninsula, significant diversification and differentiation of the populations of black pine exists, and these populations could be defined as different intraspecific taxa. Our results also indicate the validity of n‐alkanes as chemotaxonomic characters within this aggregate.

16.
Summary: Second‐generation sequencing (sec‐gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base‐calling. The complexity of the base‐calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across‐sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec‐gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base‐calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base‐calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base‐calling performance.

17.
18.
19.
In an industrial production environment, cultivation processes for the production of recombinant proteins run along predefined trajectories. Feedback control is the best way to keep the cultures on track. However, feedback controllers require accurate on‐line values of the controlled variables. To assess whether the measurement signals are correct, process supervision techniques are required. In the case where a process failure has occurred and incorrectly measured variables have been identified, automated fail‐safe techniques must be started. Here, we use the production of a pharmaceutically relevant recombinant protein to compare different approaches to process supervision and fail‐safe routines.

20.
Na Cai, Wenbin Lu, Hao Helen Zhang. Biometrics 2012, 68(4): 1093-1102
Summary: In analysis of longitudinal data, it is not uncommon that observation times of repeated measurements are subject‐specific and correlated with underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical in these situations to ensure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random‐effect model and assumes a time‐varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance–covariance matrix that has a closed form and can be consistently estimated by the usual plug‐in method. One additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time‐varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to a bladder cancer data set is also given to illustrate the methodology.
