Similar articles
1.
Longitudinal data often encounter missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been popularly employed for analysis of missing longitudinal data. In particular, the MI‐GEE method has been proposed for inference of generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the “quasi‐likelihood under the independence model criterion” (QIC) and the “missing longitudinal information criterion” (MLIC), to accommodate multiply imputed datasets for selection of the MI‐GEE mean model. According to real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI‐GEE analysis; (ii) the MI‐based GEE model selection methods with a suitable number of imputations generally perform well, while the naive application of existing model selection methods by simply ignoring missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.
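
The abstract does not say how results from the individual imputed datasets are combined. A common choice in multiple imputation is Rubin's rules, sketched below for a single scalar coefficient. The function name and the numbers are made up for illustration; this is not the MI‐based QIC or MLIC criterion itself.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical coefficient estimates and squared standard errors
# from m = 3 imputed datasets.
est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term grows when the imputed datasets disagree, which is one way to see why conclusion (i) asks for more than a few imputations.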

2.
The current standard biomarker for myocardial infarction (MI) is high‐sensitive troponin. Although it is powerful in the clinical setting, the search for new markers is warranted, as early diagnosis of MI is associated with improved outcomes. Extracellular vesicles (EVs) have attracted considerable interest as new blood biomarkers. A training cohort used for diagnostic modelling included 30 patients with STEMI, 38 with stable angina (SA) and 30 matched controls. Extracellular vesicle concentration was assessed by nanoparticle tracking analysis. Extracellular vesicle surface‐epitopes were measured by flow cytometry. Diagnostic models were developed using machine learning algorithms and validated on an independent cohort of 80 patients. Serum EV concentration from STEMI patients was increased as compared to controls and SA. EV levels of CD62P, CD42a, CD41b, CD31 and CD40 increased in STEMI, and to a lesser extent in SA patients. An aggregate marker including EV concentration and CD62P/CD42a levels achieved non‐inferiority to troponin, discriminating STEMI from controls (AUC = 0.969). A random forest model based on EV biomarkers discriminated the two groups with 100% accuracy. EV markers and RF model confirmed high diagnostic performance at validation. In conclusion, patients with acute MI or SA exhibit characteristic EV biomarker profiles. EV biomarkers hold great potential as early markers for the management of patients with MI.

3.
An MS-based metabolomics strategy including variable selection and PLSDA analysis has been assessed as a tool to discriminate between non-steatotic and steatotic human liver profiles. Different chemometric approaches for uninformative variable elimination were performed by using two of the most common software packages employed in the field of metabolomics (i.e., MATLAB and SIMCA-P). The first considered approach was performed with MATLAB where the PLS regression vector coefficient values were used to classify variables as informative or not. The second approach was run under SIMCA-P, where variable selection was performed according to both the PLS regression vector coefficients and VIP scores. Performance features of the PLSDA models, such as model validation, variable selection criteria, and potential biomarker output, were assessed for comparison purposes. One interesting finding is that variable selection improved the classification predictiveness of all the models by facilitating metabolite identification and providing enhanced insight into the metabolic information acquired by the UPLC-MS method. The results prove that the proposed strategy is a potentially straightforward approach to improve model performance. Among others, GSH, lysophospholipids and bile acids were found to be the most important altered metabolites in the metabolomic profiles studied. However, further research and more in-depth biochemical interpretations are needed to unambiguously propose them as disease biomarkers.

4.
Man Jin, Yixin Fang. Biometrics, 2011, 67(1): 124-132
Summary: In family studies, canonical discriminant analysis can be used to find linear combinations of phenotypes that exhibit high ratios of between‐family to within‐family variabilities. But with large numbers of phenotypes, canonical discriminant analysis may overfit. To estimate the predicted ratios associated with the coefficients obtained from canonical discriminant analysis, two methods are developed; one is based on bias correction and the other on cross‐validation. Because the cross‐validation is computationally intensive, an approximation to the cross‐validation is also developed. Furthermore, these methods can be applied to perform variable selection in canonical discriminant analysis. The proposed methods are illustrated with simulation studies and applications to two real examples.

5.
MOTIVATION: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operator characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability as required for example in the logistic regression method; (2) it accommodates case-control designs and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, the ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique does not incorporate built-in biomarker selection. RESULTS: We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on the V-fold cross validation and predictive performance evaluation are also investigated. The proposed approach is demonstrated with a simulation study, the Colon data and the Estrogen data. The proposed approach yields parsimonious models with excellent classification performance.
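
The idea of a sigmoid approximation to the area under the ROC curve can be sketched as follows. The empirical AUC counts correctly ordered (case, control) pairs, which is a step function of the scores; replacing the step with a sigmoid gives a smooth objective. The bandwidth `sigma`, the function names and the scores are assumptions for illustration, and the threshold gradient descent step of the paper is not reproduced.

```python
import math

def _sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def empirical_auc(pos, neg):
    """Fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))

def sigmoid_auc(pos, neg, sigma=0.1):
    """Smooth surrogate: the indicator 1[p > n] becomes sigmoid((p - n) / sigma)."""
    total = sum(_sigmoid((p - n) / sigma) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

cases = [0.9, 0.8, 0.4]      # hypothetical classifier scores, diseased subjects
controls = [0.3, 0.5, 0.1]   # hypothetical classifier scores, healthy subjects
exact = empirical_auc(cases, controls)
smooth = sigmoid_auc(cases, controls, sigma=0.01)
```

As `sigma` shrinks, the surrogate approaches the empirical AUC; a larger `sigma` gives a smoother objective that is easier to optimise with gradient methods.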

6.
Ye W, Lin X, Taylor JM. Biometrics, 2008, 64(4): 1238-1246
SUMMARY: In this article we investigate regression calibration methods to jointly model longitudinal and survival data using a semiparametric longitudinal model and a proportional hazards model. In the longitudinal model, a biomarker is assumed to follow a semiparametric mixed model where covariate effects are modeled parametrically and subject-specific time profiles are modeled nonparametrically using a population smoothing spline and subject-specific random stochastic processes. The Cox model is assumed for survival data by including both the current measure and the rate of change of the underlying longitudinal trajectories as covariates, as motivated by a prostate cancer study application. We develop a two-stage semiparametric regression calibration (RC) method. Two variations of the RC method are considered, risk set regression calibration and a computationally simpler ordinary regression calibration. Simulation results show that the two-stage RC approach performs well in practice and effectively corrects the bias from the naive method. We apply the proposed methods to the analysis of a dataset for evaluating the effects of the longitudinal biomarker PSA on the recurrence of prostate cancer.

7.
We propose criteria for variable selection in the mean model and for the selection of a working correlation structure in longitudinal data with dropout missingness using weighted generalized estimating equations. The proposed criteria are based on a weighted quasi‐likelihood function and a penalty term. Our simulation results show that the proposed criteria frequently select the correct model in candidate mean models. The proposed criteria also have good performance in selecting the working correlation structure for binary and normal outcomes. We illustrate our approaches using two empirical examples. In the first example, we use data from a randomized double‐blind study to test the cancer‐preventing effects of beta carotene. In the second example, we use longitudinal CD4 count data from a randomized double‐blind study.

8.
This paper discusses Bayesian statistical methods for the classification of observations into two or more groups based on hierarchical models for nonlinear longitudinal profiles. Parameter estimation for a discriminant model that classifies individuals into distinct predefined groups or populations uses appropriate posterior simulation schemes. The methods are illustrated with data from a study involving 173 pregnant women. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from beta human chorionic gonadotropin data available at early stages of pregnancy.

9.
Often in biomedical studies, the routine use of linear mixed‐effects models (based on Gaussian assumptions) can be questionable when the longitudinal responses are skewed in nature. Skew‐normal/elliptical models are widely used in those situations. Often, those skewed responses might also be subjected to some upper and lower quantification limits (QLs; viz., longitudinal viral‐load measures in HIV studies), beyond which they are not measurable. In this paper, we develop a Bayesian analysis of censored linear mixed models replacing the Gaussian assumptions with skew‐normal/independent (SNI) distributions. The SNI is an attractive class of asymmetric heavy‐tailed distributions that includes the skew‐normal, skew‐t, skew‐slash, and skew‐contaminated normal distributions as special cases. The proposed model provides flexibility in capturing the effects of skewness and heavy tail for responses that are either left‐ or right‐censored. For our analysis, we adopt a Bayesian framework and develop a Markov chain Monte Carlo algorithm to carry out the posterior analyses. The marginal likelihood is tractable, and utilized to compute not only some Bayesian model selection measures but also case‐deletion influence diagnostics based on the Kullback–Leibler divergence. The newly developed procedures are illustrated with a simulation study as well as an HIV case study involving analysis of longitudinal viral loads.

10.
A vast number of human cell lines are available for cell culture model‐based studies, and as such the potential exists for discrepancies in findings due to cell line selection. To investigate this concept, the authors determine the relative protein abundance profiles of a panel of eight diverse, but commonly studied human cell lines. This panel includes HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH‐SY5Y, and SVGp12. A mass spectrometry‐based proteomics workflow designed to enhance quantitative accuracy while maintaining analytical depth is used. To this end, this strategy leverages TMTpro16‐based sample multiplexing, high‐field asymmetric ion mobility spectrometry, and real‐time database searching. The data show that the differences in the relative protein abundance profiles reflect cell line diversity. The authors also determine several hundred proteins to be highly enriched for a given cell line, and perform gene ontology and pathway analysis on these cell line‐enriched proteins. An R Shiny application is designed to query protein abundance profiles and retrieve proteins with similar patterns. The workflows used herein can be applied to additional cell lines to aid cell line selection for addressing a given scientific inquiry or for improving an experimental design.

11.
Longitudinal data are common in clinical trials and observational studies, where missing outcomes due to dropouts are always encountered. In this context, under the assumption of missing at random, the weighted generalized estimating equation (WGEE) approach is widely adopted for marginal analysis. Model selection on marginal mean regression is a crucial aspect of data analysis, and identifying an appropriate correlation structure for model fitting may also be of interest and importance. However, the existing information criteria for model selection in WGEE have limitations, such as separate criteria for the selection of marginal mean and correlation structures, unsatisfactory selection performance in small‐sample setups, and so forth. In particular, few studies have developed joint information criteria for selection of both marginal mean and correlation structures. In this work, by embedding empirical likelihood into the WGEE framework, we propose two innovative information criteria named a joint empirical Akaike information criterion and a joint empirical Bayesian information criterion, which can simultaneously select the variables for marginal mean regression and also correlation structure. In extensive simulation studies, these empirical‐likelihood‐based criteria are robust and flexible, and they outperform the other criteria, including the weighted quasi‐likelihood under the independence model criterion, the missing longitudinal information criterion, and the joint longitudinal information criterion. In addition, we provide a theoretical justification of our proposed criteria, and present two real data examples in practice for further illustration.

12.
The availability of reliable biomarkers of brain injury secondary to birth asphyxia could substantially improve clinical grading, therapeutic intervention strategies, and prognosis. In this study, changes in the metabolome of retinal tissue caused by profound hypoxia in an established neonatal piglet model were investigated using an ultra performance liquid chromatography – quadrupole time of flight mass spectrometry (UPLC-QTOFMS) untargeted metabolomic approach, which included Partial Least Squares – Discriminant Analysis (PLSDA) multivariate data analysis. The initial identification of a set of discriminant metabolites from UPLC-QTOFMS data was confirmed by target UPLC-MS/MS and allowed the selection of endogenous CDP-choline as a promising candidate biomarker for hypoxia-derived brain damage assessing intensity of retinal hypoxia. Results from this study will foster further research on CDP-choline changes occurring during resuscitation.

13.
This work develops a joint model selection criterion for simultaneously selecting the marginal mean regression and the correlation/covariance structure in longitudinal data analysis where both the outcome and the covariate variables may be subject to general intermittent patterns of missingness under the missing at random mechanism. The new proposal, termed “joint longitudinal information criterion” (JLIC), is based on the expected quadratic error for assessing model adequacy, and the second‐order weighted generalized estimating equation (WGEE) estimation for mean and covariance models. Simulation results reveal that JLIC outperforms existing methods that perform model selection for the mean regression and the correlation structure in a two‐stage, and hence separate, manner. We apply the proposal to a longitudinal study to identify factors associated with life satisfaction in the elderly of Taiwan.

14.
  • 1 A previous study suggested the use of certain insect groups as indicators for detecting organic olive farming in Southern Spain. To validate the use of these groups, insects were collected from olive orchards in Cordoba and Granada, comprising two Andalusian provinces with different surrounding landscapes.
  • 2 Canopies were sampled using the branch‐beating technique during pre‐blooming and post‐blooming periods over 3 years in Granada (1999, 2000 and 2003) and 1 year in Cordoba (2003).
  • 3 Using a nonparametric linear discriminant analysis method, based on the k‐nearest neighbour algorithm, two discriminant functions were constructed. A first discriminant model took into account interannual variability in Granada Province and the second model focused on environmental heterogeneity between the two provinces. Cross‐validation techniques, such as leave‐one‐out and split‐sample, were applied to the associated discriminant functions for each model to check their performance.
  • 4 Even though differences existed with respect to the insect composition of the regions, the second model correctly classified 78.1% of the sampled blocks under the non‐organic and organic farming systems while taking into account only two orders: Coleoptera and Hemiptera [excluding Euphyllura olivina olivina (Psyllidae) and the Heteroptera suborder]. The results suggest that the relative abundance of these groups in the post‐blooming period could constitute a potential bio‐indicator of the organic olive farming system.
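
The combination of a k‐nearest neighbour rule with leave‐one‐out cross‐validation described in point 3 can be sketched as below. This is plain k‐NN majority voting, not the authors' nonparametric discriminant functions, and the abundance values and function names are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(xs, ys, k=3):
    """Classify each sample using all the others; return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)

# Hypothetical relative abundances (Coleoptera, Hemiptera) per sampled block
blocks = [(0.10, 0.20), (0.12, 0.18), (0.11, 0.22),
          (0.40, 0.50), (0.42, 0.48), (0.38, 0.52)]
labels = ["non-organic"] * 3 + ["organic"] * 3
accuracy = leave_one_out_accuracy(blocks, labels, k=3)
```

Leave‐one‐out uses every block once as a test case, which suits small samples; a split‐sample check would instead hold out a fixed subset.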

15.
16.
The computer program INDEP-SELECT has been developed for selection of an optimal subset from a set of possibly informative diagnostic or prognostic variables. But the program is equally useful for other discriminant analysis or pattern recognition problems involving variable selection. The approach is probabilistic; i.e., diagnostic probabilities are assigned to a patient on the basis of the values observed on the diagnostic variables. The statistical model used is largely based on the assumption of independence between the variables, but one model parameter, the so-called ‘global association factor’, is added in order to take dependency into account. The stepwise forward selection strategy, in which each selection step adds a new variable to the set of already selected variables, is used. The user may choose between a number of selection criteria. Such a criterion is used in order to decide in each selection step which variable should be added. All criteria are based on measures of diagnostic or prognostic performance. INDEP-SELECT is able to handle a large number of variables, also with missing data, and a large number of patients. The program is written in ANS Standard FORTRAN, and takes relatively little computation time.
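
A generic stepwise forward selection loop with a pluggable criterion might look like the sketch below. It illustrates the strategy only; the function name, the stopping rule and the toy criterion are assumptions, and neither the INDEP-SELECT probability model nor its global association factor is reproduced.

```python
def forward_select(candidates, score, max_vars=None):
    """Greedy forward selection.

    `score(subset)` is a user-supplied criterion (higher is better).
    Each step adds the variable giving the largest score; the search
    stops when no addition improves on the current score.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_vars is None or len(selected) < max_vars):
        trials = [(score(selected + [v]), v) for v in remaining]
        top_score, top_var = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best

# Toy criterion (an assumption for illustration): each variable has a
# stand-alone gain, and every selected variable costs a fixed penalty.
gain = {"age": 0.30, "fever": 0.25, "noise": 0.01}

def toy_score(subset):
    return sum(gain[v] for v in subset) - 0.05 * len(subset)

chosen, final_score = forward_select(gain, toy_score)
```

Passing the criterion as a function mirrors the abstract's point that the user may choose between several selection criteria without changing the search itself.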

17.
Summary: High‐dimensional data such as microarrays have brought us new statistical challenges. For example, using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Diagonal discriminant analysis, support vector machines, and k‐nearest neighbor have been suggested as among the best methods for small sample size situations, but none was found to be superior to others. In this article, we propose an improved diagonal discriminant approach through shrinkage and regularization of the variances. The performance of our new approach along with the existing methods is studied through simulations and applications to real data. These studies show that the proposed shrinkage‐based and regularization diagonal discriminant methods have lower misclassification rates than existing methods in many cases.
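
To make the idea of a diagonal discriminant rule with shrunken variances concrete, here is a minimal sketch. It pools per‐feature variances across classes and shrinks them linearly toward their average. That shrinkage form, the `shrink` parameter, the equal class priors and the data are assumptions for illustration, not the estimator proposed in the paper.

```python
from statistics import mean, variance

def fit_diagonal_lda(samples, labels, shrink=0.5):
    """Class means plus pooled per-feature variances shrunk toward
    their average. `shrink` in [0, 1] is a hypothetical tuning knob."""
    n_feat = len(samples[0])
    means, pooled, dof = {}, [0.0] * n_feat, 0
    for c in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == c]
        means[c] = [mean(r[j] for r in rows) for j in range(n_feat)]
        for j in range(n_feat):
            pooled[j] += (len(rows) - 1) * variance([r[j] for r in rows])
        dof += len(rows) - 1
    pooled = [v / dof for v in pooled]
    target = mean(pooled)
    shrunk = [(1 - shrink) * v + shrink * target for v in pooled]
    return means, shrunk

def predict_diagonal_lda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, var = model
    def dist(c):
        return sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, means[c], var))
    return min(means, key=dist)

# Hypothetical expression values for two genes in six samples
samples = [(1.0, 5.0), (1.2, 4.0), (0.8, 6.0),
           (3.0, 5.5), (3.2, 4.5), (2.8, 5.0)]
labels = ["A", "A", "A", "B", "B", "B"]
model = fit_diagonal_lda(samples, labels, shrink=0.5)
```

Ignoring covariances keeps the number of estimated parameters linear in the number of genes, and shrinking the variances stabilises them when each class has only a few samples.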

18.
To date, numerous feature selection techniques have been proposed and have found wide applications in genomics and proteomics. For instance, feature/gene selection has proven to be useful for biomarker discovery from microarray and mass spectrometry data. While supervised feature selection has been explored extensively, there are only a few unsupervised methods that can be applied to exploratory data analysis. In this paper, we address the problem of unsupervised feature selection. First, we extend Laplacian linear discriminant analysis (LLDA) to unsupervised cases. Second, we propose a novel algorithm for computing LLDA, which is efficient in the case of high dimensionality and small sample size as in microarray data. Finally, an unsupervised feature selection method, called LLDA-based Recursive Feature Elimination (LLDA-RFE), is proposed. We apply LLDA-RFE to several public data sets of cancer microarrays and compare its performance with those of Laplacian score and SVD-entropy, two state-of-the-art unsupervised methods, and with that of Fisher score, a supervised filter method. Our results demonstrate that LLDA-RFE outperforms Laplacian score and shows favorable performance against SVD-entropy. It performs even better than Fisher score for some of the data sets, despite the fact that LLDA-RFE is fully unsupervised.

19.
The purpose of this study was to use metabonomic profiling to identify a potential specific biomarker pattern in urine as a noninvasive bladder cancer (BC) detection strategy. A liquid chromatography-mass spectrometry based method, which utilized both reversed phase liquid chromatography and hydrophilic interaction chromatography separations, was performed, followed by multivariate data analysis to discriminate the global urine profiles of 27 BC patients and 32 healthy controls. Data from both columns were combined, and this combination proved to be effective and reliable for partial least squares-discriminant analysis. Following a critical selection criterion, several metabolites showing significant differences in expression levels were detected. Receiver operating characteristic analysis was used for the evaluation of potential biomarkers. Carnitine C9:1 and component I were combined as a biomarker pattern, with a sensitivity and specificity up to 92.6% and 96.9%, respectively, for all patients and 90.5% and 96.9%, respectively, for low-grade BC patients. Metabolic pathways of component I and carnitine C9:1 are discussed. These results indicate that metabonomics is a practicable tool for BC diagnosis given its high efficacy and economization. The combined biomarker pattern showed better performance than a single metabolite in discriminating bladder cancer patients, especially low-grade BC patients, from healthy controls.
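
As a worked check of the reported rates, the sketch below computes sensitivity and specificity from confusion counts. The abstract gives only the group sizes (27 patients, 32 controls) and the percentages; the counts 25 and 31 are an assumption, chosen because they are consistent with the reported 92.6% and 96.9%.

```python
def sensitivity(true_pos, false_neg):
    """Share of diseased subjects that the test flags as diseased."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of healthy subjects that the test clears."""
    return true_neg / (true_neg + false_pos)

# Assumed counts: 25 of 27 patients and 31 of 32 controls classified correctly.
sens = sensitivity(true_pos=25, false_neg=2)
spec = specificity(true_neg=31, false_pos=1)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints: sensitivity = 92.6%, specificity = 96.9%
```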

20.
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles gives the opportunity to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed in the univariate case to the joint analysis of multiple CGH profiles. Our contribution is multiple: we propose an integrated model to perform joint segmentation, normalization, and calling for multiple array CGH profiles. This model shows great flexibility, especially in the modeling of the wave effect that gives a likelihood framework to approaches proposed by others. We propose a new dynamic programming algorithm for break point positioning, as well as a model selection criterion based on a modified Bayesian information criterion proposed in the univariate case. The performance of our method is assessed using simulated and real data sets. Our method is implemented in the R package cghseg.

