20 similar documents found (search time: 23 ms)
1.
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well if the data exhibit heavy tails or outliers. To address this challenge, a new robust FPCA approach based on a functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced. We propose robust estimation procedures for eigenfunctions and eigenvalues. Theoretical properties of the PASS operator are established, showing that it has the same eigenfunctions as the standard covariance operator and also allows recovery of the ratios between eigenvalues. We also extend the proposed procedure to handle functional data measured with noise. Compared to existing robust FPCA approaches, the proposed PASS FPCA requires weaker distributional assumptions to preserve the eigenspace of the covariance function. Specifically, existing work is often built upon the class of functional elliptical distributions, which inherently requires symmetry. In contrast, we introduce a class of distributions called weakly functional coordinate symmetry (weakly FCS), which allows for severe asymmetry and is much more flexible than the functional elliptical family. The robustness of PASS FPCA is demonstrated via extensive simulation studies, which highlight its advantages in scenarios with nonelliptical distributions. The proposed method was motivated by, and applied to, the analysis of accelerometry data from the Objective Physical Activity and Cardiovascular Health Study, a large-scale epidemiological study investigating the relationship between objectively measured physical activity and cardiovascular health among older women.
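To make the pairwise spatial sign idea concrete, here is a minimal Python (NumPy) sketch. It assumes all curves are observed on a common, equally spaced grid, so each curve is a row vector and the L2 norm is proportional to the Euclidean norm (a constant factor that cancels in the spatial sign). It averages the outer products of normalized pairwise differences and uses the leading eigenvectors as eigenfunction estimates. The function names are hypothetical, and the sketch omits the eigenvalue-ratio recovery and the measurement-error extension described above.

```python
import numpy as np

def pass_matrix(X):
    """Discretized PASS operator for curves stored as rows of X (n x p):
    the average of s s^T over all pairs i < j, where
    s = (X_j - X_i) / ||X_j - X_i|| is the spatial sign of the difference."""
    n, p = X.shape
    K = np.zeros((p, p))
    n_pairs = 0
    for i in range(n - 1):
        D = X[i + 1:] - X[i]               # differences with all later curves
        norms = np.linalg.norm(D, axis=1)
        keep = norms > 0                   # skip identical pairs
        S = D[keep] / norms[keep][:, None] # unit-norm spatial signs
        K += S.T @ S
        n_pairs += S.shape[0]
    return K / n_pairs

def pass_fpca(X, k):
    """Leading k eigenvalues and eigenvectors of the PASS matrix, largest first."""
    vals, vecs = np.linalg.eigh(pass_matrix(X))
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]
```

Because every spatial sign has unit norm, the trace of this matrix is exactly 1, so its eigenvalues are not the covariance eigenvalues themselves; turning them into eigenvalue ratios needs the additional step mentioned in the abstract, which this sketch does not attempt.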
2.
3.
4.
Pairwise curve synchronization for functional data (times cited: 1; self-citations: 0; citations by others: 1)
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis.
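The two-step idea (pairwise warps first, then a truncated average across all reference curves) can be sketched in Python under a strong simplifying assumption: warping is a pure time shift, X_i(t) = Y(t - tau_i), with shifts averaging to zero. This swaps the nonlinear warping functions of the method above for shifts and finds each pairwise shift by grid search. All names and defaults are hypothetical.

```python
import numpy as np

def pairwise_shift(x, y, t, shifts):
    """Shift d minimizing sum_t (x(t) - y(t - d))^2 over a grid of candidate shifts."""
    errs = [np.sum((x - np.interp(t - d, t, y)) ** 2) for d in shifts]
    return shifts[int(np.argmin(errs))]

def synchronize_shifts(X, t, shifts, trim=0.1):
    """Step 1: pairwise shifts d_ik = tau_i - tau_k, using every curve as a reference.
    Step 2: tau_i estimated by a truncated (trimmed) mean of d_ik over k."""
    n = X.shape[0]
    D = np.zeros((n, n))                  # D[i, i] = 0: a curve is its own reference too
    for i in range(n):
        for k in range(n):
            if i != k:
                D[i, k] = pairwise_shift(X[i], X[k], t, shifts)
    m = int(trim * n)                     # extreme values dropped at each end
    tau_hat = np.empty(n)
    for i in range(n):
        d = np.sort(D[i])
        tau_hat[i] = d[m:n - m].mean()    # mean over k of tau_i - tau_k
    return tau_hat

def register(X, t, tau_hat):
    """Aligned curves: X_i(t + tau_i) should approximate the common template Y(t)."""
    return np.array([np.interp(t + tau, t, x) for x, tau in zip(X, tau_hat)])
```

The trimming mirrors the role of truncated averaging: a few badly estimated pairwise shifts (for example, against an atypical reference curve) are discarded before averaging, instead of pulling every individual estimate.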
5.
Motivated by recent work involving the analysis of biomedical imaging data, we present a novel procedure for constructing simultaneous confidence corridors for the mean of imaging data. We propose to use flexible bivariate splines over triangulations to handle an irregular domain of the images that is common in brain imaging studies and in other biomedical imaging applications. The proposed spline estimators of the mean functions are shown to be consistent and asymptotically normal under some regularity conditions. We also provide a computationally efficient estimator of the covariance function and derive its uniform consistency. The procedure is also extended to the two-sample case in which we focus on comparing the mean functions from two populations of imaging data. Through Monte Carlo simulation studies, we examine the finite sample performance of the proposed method. Finally, the proposed method is applied to analyze brain positron emission tomography data in two different studies. One data set used in preparation of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
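A simultaneous confidence corridor must cover the whole mean surface at once, so its critical value comes from the supremum of a standardized process rather than a pointwise normal quantile. The Python sketch below illustrates only that construction, on a pixel grid restricted to an irregular domain mask, using a Gaussian multiplier bootstrap. It deliberately swaps the bivariate splines over triangulations described above for the raw pixelwise sample mean, so it is not the proposed estimator; all names are hypothetical.

```python
import numpy as np

def mean_scc(Y, mask, alpha=0.05, n_boot=1000, seed=0):
    """Y: (n, H, W) images; mask: (H, W) boolean domain.
    Returns the pixelwise mean, lower and upper corridor on masked pixels,
    and the simultaneous critical value."""
    rng = np.random.default_rng(seed)
    Z = Y[:, mask]                        # (n, m): pixels inside the domain only
    n = Z.shape[0]
    mu = Z.mean(axis=0)
    sd = Z.std(axis=0, ddof=1)
    R = (Z - mu) / sd                     # standardized residuals
    # Multiplier bootstrap of sup_x |sqrt(n) * (mu_hat(x) - mu(x)) / sd(x)|
    G = rng.standard_normal((n_boot, n))
    T = np.abs(G @ R) / np.sqrt(n)        # (n_boot, m) bootstrap processes
    crit = np.quantile(T.max(axis=1), 1 - alpha)
    half = crit * sd / np.sqrt(n)
    return mu, mu - half, mu + half, crit
```

With a few dozen pixels in the domain, the simultaneous critical value is typically well above the pointwise 1.96, which is the price of covering every location at the same time.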
6.
Principal component models for sparse functional data (times cited: 5; self-citations: 0; citations by others: 5)
7.
Screening mammography aims to identify breast cancer early and, secondarily, measures breast density to classify women as being at higher or lower than average risk of future breast cancer in the general population. Despite the strong association of individual mammography features with breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image-based features, it is conducted independently of the time-to-event response variable. With the aim of building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image-based features associated with the failure time while accommodating the added complication of right censoring. Throughout the article, we aim to demonstrate which of the two methods is favored under different clinical setups. The proposed methods are applied to the motivating data set from the Joanne Knight Breast Health cohort at Siteman Cancer Center. Our approaches not only achieve better prediction performance than the benchmark model but also reveal different risk patterns within the mammograms.
8.
In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure of the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the cluster memberships and the cluster patterns under a series of settings: a small versus moderate number of time points, various noise levels, and a varying number of subunits per unit. We demonstrate the applicability of the clustering analysis on a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.
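The hard-clustering route (MFPCA scores, then a standard clustering of units) might look roughly like this in Python, assuming curves on a common grid stored as an array of shape (units, subunits, time points) with at least two subunits per unit. Two simplifications are stated plainly: the between-unit (level-1) covariance is estimated from cross-products of different subunits within the same unit, and unit scores are projections of unit-mean curves rather than the conditional-expectation scores a full MFPCA would use. The k-means routine is plain Lloyd's algorithm, the soft clustering variant is not shown, and all names are hypothetical.

```python
import numpy as np

def level1_scores(X, k=2):
    """X: (n_units, n_subunits, n_time). Returns (n_units, k) level-1 scores."""
    n, J, T = X.shape
    Xc = X - X.mean(axis=(0, 1))                    # remove the overall mean curve
    S = Xc.sum(axis=1)                              # per-unit sums over subunits
    # Sum over units of x_ij x_il^T for j != l (different subunits, same unit)
    cross = S.T @ S - np.einsum("ijs,ijt->st", Xc, Xc)
    K_between = cross / (n * J * (J - 1))
    vals, vecs = np.linalg.eigh(K_between)
    phi = vecs[:, np.argsort(vals)[::-1][:k]]       # leading level-1 eigenvectors
    return Xc.mean(axis=1) @ phi                    # project unit-mean curves

def kmeans(S, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns cluster labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = S[rng.choice(len(S), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new = np.array([S[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((S - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels
```

Using cross-products of distinct subunits is what separates the two levels: subunit-specific deviations are independent across subunits, so they average out of these products and leave the between-unit covariance.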
9.
Zhihao Wang, Yongxin Bai, Wolfgang K. Härdle, Maozai Tian. Biometrical Journal (Biometrische Zeitschrift), 2023, 65(7): 2200060
Practitioners of current data analysis are regularly confronted with situations in which a heavy-tailed, skewed response is related to both multiple functional predictors and high-dimensional scalar covariates. We propose a new class of partially functional penalized convolution-type smoothed quantile regression models to characterize the conditional quantiles of a scalar response given predictors of both functional and scalar types. The new approach overcomes the lack of smoothness and strong convexity of the standard quantile empirical loss, considerably improving the computational efficiency of partially functional quantile regression. We investigate a folded concave penalized estimator for simultaneous variable selection and estimation, computed with a modified local adaptive majorize-minimization (LAMM) algorithm. The functional predictors can be dense or sparse and are approximated by the principal component basis. Under mild conditions, the consistency and oracle properties of the resulting estimators are established. Simulation studies demonstrate competitive performance against partially functional standard penalized quantile regression. A real application to Alzheimer's Disease Neuroimaging Initiative data illustrates the practicality of the proposed model.
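The key ingredient is replacing the nonsmooth check loss with its convolution with a kernel. Assuming a Gaussian kernel with bandwidth h, the smoothed loss has the closed form l_h(u) = u(tau - Phi(-u/h)) + h phi(u/h), whose derivative is simply tau - Phi(-u/h), so plain gradient descent applies. The Python sketch below fits scalar covariates only, without any penalty: the functional principal component part, the folded concave penalty and the LAMM algorithm are omitted, and the kernel, bandwidth and step size are assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def smoothed_check_loss(u, tau, h):
    """Check loss rho_tau convolved with a N(0, h^2) kernel (closed form)."""
    u = np.asarray(u, dtype=float)
    pdf = np.exp(-0.5 * (u / h) ** 2) / math.sqrt(2.0 * math.pi)
    return u * (tau - norm_cdf(-u / h)) + h * pdf

def smoothed_qr(X, y, tau=0.5, h=0.5, lr=1.0, n_iter=300):
    """Gradient descent on the mean smoothed loss of the residuals y - X @ beta."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # Gradient of mean_i l_h(r_i) with respect to beta: -X^T (tau - Phi(-r/h)) / n
        grad = -X.T @ (tau - norm_cdf(-r / h)) / n
        beta -= lr * grad
    return beta
```

Because the smoothed loss is differentiable everywhere with a bounded second derivative, a fixed step size works; the raw check loss would instead need subgradient methods or linear programming.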
10.
Gaussian process functional regression modeling for batch data (times cited: 2; self-citations: 0; citations by others: 2)
A Gaussian process functional regression model is proposed for the analysis of batch data. Covariance structure and mean structure are considered simultaneously, with the covariance structure modeled by a Gaussian process regression model and the mean structure modeled by a functional regression model. The model allows the inclusion of covariates in both the covariance structure and the mean structure. It models the nonlinear relationship between a functional output variable and a set of functional and nonfunctional covariates. Several applications and simulation studies are reported and show that the method provides very good results for curve fitting and prediction.
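As a rough illustration of the two-part structure, the Python sketch below estimates the mean structure by pointwise least squares of the batch curves on one scalar batch covariate (a simple concurrent functional regression), then models a new batch's departure from that mean with a Gaussian process using a squared exponential kernel with fixed, assumed hyperparameters. The model described above also lets covariates enter the covariance structure and estimates hyperparameters from data; neither is done here, and all names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, amp=1.0, length=0.2):
    """Squared exponential covariance between time points a and b."""
    return amp * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def fit_mean_structure(U, Y):
    """Pointwise least squares Y[m, t] ~ beta_0(t) + U[m] * beta_1(t) across batches m.
    U: (n_batches,) scalar covariate; Y: (n_batches, n_time).
    Returns the coefficient curves as an array of shape (2, n_time)."""
    D = np.column_stack([np.ones(len(U)), U])
    B, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return B

def gpfr_predict(t, B, u_new, obs_idx, y_obs, noise=1e-4, amp=1.0, length=0.2):
    """Mean curve for covariate u_new plus the GP posterior mean of the departure,
    conditioned on the new batch's values y_obs at grid indices obs_idx."""
    mean = np.array([1.0, u_new]) @ B
    resid = y_obs - mean[obs_idx]
    K = sq_exp_kernel(t[obs_idx], t[obs_idx], amp, length) + noise * np.eye(len(obs_idx))
    k_star = sq_exp_kernel(t, t[obs_idx], amp, length)
    return mean + k_star @ np.linalg.solve(K, resid)
```

The mean structure carries what is shared across batches with similar covariates, while the GP term adapts the prediction to the individual batch from whatever part of its curve has been observed.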
11.
12.
13.
Integrative analysis of genomic and anatomical imaging data is an emerging area that has not yet been well developed. It provides invaluable information for the holistic discovery of the genomic structure of disease and has the potential to open a new avenue for discovering novel disease susceptibility genes that cannot be identified when the two data types are analyzed separately. A key issue for the success of imaging and genomic data analysis is how to reduce their dimensions. Most previous methods for imaging information extraction and RNA-seq data reduction do not exploit imaging spatial information and often ignore gene expression variation at the genomic positional level. To overcome these limitations, we extend functional principal component analysis from one dimension to two dimensions (2DFPCA) to represent imaging data, and we develop a multiple functional linear model (MFLM) in which the functional principal component scores of images are taken as multiple quantitative traits and the RNA-seq profile across a gene is taken as a functional predictor for assessing the association of gene expression with images. The developed method has been applied to image and RNA-seq data from ovarian cancer and kidney renal clear cell carcinoma (KIRC) studies. We identified 24 and 84 genes whose expression was associated with imaging variation in the ovarian cancer and KIRC studies, respectively. Our results showed that many genes significantly associated with images were not differentially expressed; instead, the analysis revealed their morphological and metabolic functions. The results also demonstrated that the peaks of the estimated regression coefficient function in the MFLM often allowed the discovery of splicing sites and multiple isoforms of gene expression.
14.
A recurring objective in longitudinal studies on aging and longevity has been the investigation of the relationship between age-at-death and current values of a longitudinal covariate trajectory that quantifies reproductive or other behavioral activity. We propose a novel technique for predicting age-at-death distributions for situations where an entire covariate history is included in the predictor. The predictor trajectories up to current time are represented by time-varying functional principal component scores, which are continuously updated as time progresses and are considered to be time-varying predictor variables that are entered into a class of time-varying functional regression models that we propose. We demonstrate for biodemographic data how these methods can be applied to obtain predictions for age-at-death and estimates of remaining lifetime distributions, including estimates of quantiles and of prediction intervals for remaining lifetime. Estimates and predictions are obtained for individual subjects, based on their observed behavioral trajectories, and include a dimension-reduction step that is implemented by projecting on a single index. The proposed techniques are illustrated with data on longitudinal daily egg-laying for female medflies, predicting remaining lifetime and age-at-death distributions from individual event histories observed up to current time.
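The notion of scores that are recomputed from the history observed up to the current time can be sketched in Python as follows. It assumes trajectories on a common, equally spaced grid (so Euclidean inner products stand in for L2 inner products up to a constant) and uses a plain least-squares fit as a stand-in for the time-varying functional regression models for age-at-death distributions described above. It also ignores that only subjects still alive at the current time would enter the fit; all names are hypothetical.

```python
import numpy as np

def scores_up_to(X, t_idx, k):
    """FPC scores computed only from the history X[:, :t_idx + 1] (discretized FPCA via SVD)."""
    H = X[:, : t_idx + 1]
    Hc = H - H.mean(axis=0)                 # center each time point
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:k].T                    # (n, k) scores at the current time

def fit_at_time(X, y, t_idx, k):
    """Least-squares regression of y on an intercept and the current scores."""
    Z = np.column_stack([np.ones(len(y)), scores_up_to(X, t_idx, k)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef
```

Calling `scores_up_to` for an increasing sequence of `t_idx` values gives the continuously updated predictors; by construction, values recorded after the current time never affect them.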
15.
16.
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates.
17.
18.
In this article, we propose penalized spline (P-spline)-based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects, and residual measurement error processes. Using P-splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves, the associated covariance function, which represents between-subject variation, and the variance function of the residual measurement errors, which represents within-subject variation. The proposed methods offer flexible estimation of both population- and subject-level curves. In addition, decomposing the variability of the outcomes into between- and within-subject sources is useful for identifying the dominant variance component and therefore for optimally modeling the covariance function. We use a likelihood-based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate the performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data, where we identified clearly distinct patterns in the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of antihypertensive treatment using data from the Framingham Heart Study.
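Penalized splines come in several flavors; the Python sketch below uses a truncated linear basis with a ridge penalty on the knot coefficients, the form that maps directly onto a mixed-effects model, where the penalty plays the role of a ratio of error variance to random-effect variance. It fits a single curve with a fixed smoothing parameter `lam`, whereas the abstract selects multiple smoothing parameters with a likelihood-based method; knot placement and defaults are assumptions, and the function name is hypothetical.

```python
import numpy as np

def pspline_fit(x, y, n_knots=20, lam=0.01):
    """Penalized spline with a truncated linear basis; returns fitted values at x."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])   # interior knots
    B = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    # Penalize only the knot coefficients; the intercept and slope stay unpenalized.
    P = np.diag(np.r_[0.0, 0.0, np.ones(n_knots)])
    coef = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B @ coef
```

As `lam` grows, the knot coefficients shrink toward zero and the fit approaches the least-squares line; as it shrinks, the fit approaches an unpenalized regression spline.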
19.
This paper presents the R package BioFTF, a tool for statistical biodiversity assessment in the functional data analysis framework. Diversity is a key topic in many research fields; however, the literature has shown that existing indices do not capture all the different aspects of this concept. A main drawback is that different indicators may lead to different orderings of communities according to their biodiversity. One way to evaluate biodiversity is to use diversity profiles, which are curves that depend on a specific parameter. In this setting, it is possible to adopt functional tools proposed in the literature, such as the first and second derivatives, the curvature, the radius of curvature and the arc length. Specifically, the derivatives and the curvature (or the radius of curvature) highlight any peculiar behaviour of the profiles, whereas the arc length helps rank curves for a given richness. Because these tools do not solve the problem of ranking communities with different numbers of species, we propose an important methodological contribution: the surface area. This tool is a scalar measure that reflects the information provided by the biodiversity profile and allows communities with different richness to be ordered. However, this approach requires mathematical skills that the average user may not have; thus, our idea is to provide a user-friendly tool for both non-statistician and statistician practitioners to measure biodiversity in a functional context.
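Diversity profiles and the arc-length tool can be illustrated in Python. The sketch assumes the profile is the Hill-number curve qD = (sum_i p_i^q)^(1/(1-q)), with its exp(Shannon entropy) limit at q = 1, evaluated on a grid of q values. BioFTF may use a different profile family, this is not its API (it is an R package), and the surface-area measure is not implemented because its definition is not given here; the function names are hypothetical.

```python
import numpy as np

def hill_profile(abundances, qs):
    """Hill-number diversity profile of one community, evaluated at each order q."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0]
    p = p / p.sum()                                     # relative abundances
    out = []
    for q in qs:
        if abs(q - 1.0) < 1e-12:
            out.append(np.exp(-np.sum(p * np.log(p))))  # limit as q -> 1
        else:
            out.append(np.sum(p ** q) ** (1.0 / (1.0 - q)))
    return np.array(out)

def arc_length(x, y):
    """Length of the polyline through the points (x_k, y_k)."""
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))
```

A perfectly even community gives a flat profile, so its arc length equals the width of the q range, while uneven communities give decreasing, longer profiles. All communities with the same richness start from the same value at q = 0 (the number of species), which is why arc length compares curves only for a given richness.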
20.