
1.

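Basic idea of the random forest algorithm and its applications in ecology: a case study of modeling the distribution of Pinus yunnanensis (Cited by: 13; self-citations: 0; citations by others: 13)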

Zhang Lei, Wang Linlin, Zhang Xudong, Liu Shirong, Sun Pengsen, Wang Tongli 《Acta Ecologica Sinica》2014,34(3):650-659

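Generally speaking, ecologists are interested in explaining ecological relationships, describing patterns and processes, and making spatial or temporal predictions. These tasks can be accomplished by modeling the relationship between an output value (the response) and a set of feature values (the explanatory variables). Modeling ecological data is challenging, however, because the response and predictor variables may be continuous or discrete. The ecological relationships to be explained are usually nonlinear, and the explanatory variables often interact in complex ways. Missing values in the response and explanatory variables are not uncommon, and outliers also occur frequently in ecological data. In addition, ecologists usually want models that are both easy to build and easy to interpret. A variety of statistical methods is commonly used to handle the distinct ecological problems that arise in different settings, including (multiple) logistic regression, linear models, survival models, and analysis of variance. Random forest is an effective method that can handle all of these problems. It can be used for classification, clustering, regression, and survival analysis, for assessing variable importance, for detecting outliers, and for imputing missing data. Given these algorithmic advantages, this paper reviews applications of random forest in ecology, outlines the modeling process, and uses the modeling of the distribution of Pinus yunnanensis (Yunnan pine) as a case study to demonstrate the method's main features. By introducing the general terminology, concepts, and modeling ideas of random forest, the paper aims to help readers grasp the essentials of applying the method; random forest can be expected to see wider application and further development in ecological research.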
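
The core mechanics of a random forest (bootstrap resampling of observations, random selection of candidate predictors, and majority voting over trees) can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the procedure used in the paper: every tree is a single split (a stump) chosen by Gini impurity, whereas practical random forests grow deep trees and redraw the candidate predictors at every node; for a one-split tree the two schemes coincide. All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Best single split over the given feature indices, scored by weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    best = (gini(y), None, None, majority, majority)  # fallback: no split, predict majority
    for j in features:
        for thr in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    j, thr, left_label, right_label = stump
    if j is None:  # stump without a split
        return left_label
    return left_label if row[j] <= thr else right_label

def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with mtry random candidate features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        features = rng.sample(range(p), mtry)        # random subset of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Usage: with `X = [[float(i), float(i) + 0.5] for i in range(20)]` and `y = [0] * 10 + [1] * 10`, `predict_forest(fit_forest(X, y), [0.0, 0.5])` returns 0. The `mtry` argument plays the role of the number of predictors tried at each split.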

2.

Latent pattern mixture models for informative intermittent missing data in longitudinal studies

Lin H McCulloch CE Rosenheck RA 《Biometrics》2004,60(2):295-305

A frequently encountered problem in longitudinal studies is data that are missing due to missed visits or dropouts. In the statistical literature, interest has primarily focused on monotone missing data (dropout), with much less work on intermittent missing data, in which a subject may return after one or more missed visits. Intermittent missing data have broader applicability, including the frequent situation in which subjects do not share a common set of visit times or visit at nonprescheduled times. In this article, we propose a latent pattern mixture model (LPMM), where the mixture patterns are formed from latent classes that link the longitudinal response and the missingness process. This allows us to handle arbitrary patterns of missing data embodied by subjects' visit process, and avoids the need to specify the mixture patterns a priori. A key assumption of our model is that the missingness process is conditionally independent of the longitudinal outcomes given the latent classes. We propose a noniterative approach to assess this assumption. The LPMM is illustrated with a data set from a health services research study in which homeless people with mental illness were randomized to three different service packages and measures of homelessness were recorded at multiple time points. Our model suggests the presence of four latent classes linking subject visit patterns to homelessness outcomes.

3.

A Semiparametric Missing‐Data‐Induced Intensity Method for Missing Covariate Data in Individually Matched Case–Control Studies

Mulugeta Gebregziabher Bryan Langholz 《Biometrics》2010,66(3):845-854

Summary In individually matched case–control studies, when some covariates are incomplete, an analysis based on the complete data may result in a large loss of information both in the missing and completely observed variables. This usually results in a bias and loss of efficiency. In this article, we propose a new method for handling the problem of missing covariate data based on a missing‐data‐induced intensity approach when the missingness mechanism does not depend on case–control status and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete‐case analysis (CCA) and some missing data methods that have been proposed previously. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case–control study of multiple myeloma risk and a polymorphism in the receptor Inter‐Leukin‐6 (IL‐6‐α) is used to illustrate our findings.
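
The abstract presents the proposed method as a generalization of the missing indicator method. For reference, the classic indicator coding being generalized can be sketched as follows; it is not the authors' missing-data-induced intensity method, and the function name and the fill value of 0 are assumptions (when the covariate enters the model linearly, the choice of constant only shifts the indicator's coefficient).

```python
from typing import List, Optional, Tuple

def missing_indicator_coding(
    x: List[Optional[float]], fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Classic missing-indicator coding of one covariate.

    Returns (x_filled, m): m[i] = 1 if x[i] is missing (None), else 0,
    and missing values in x are replaced by the constant `fill`.
    Both columns then enter the regression model, e.g. a conditional
    logistic model for matched case-control sets.
    """
    x_filled = [fill if v is None else v for v in x]
    m = [1 if v is None else 0 for v in x]
    return x_filled, m
```

For example, `missing_indicator_coding([1.2, None, 0.7])` returns `([1.2, 0.0, 0.7], [0, 1, 0])`.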

4.

Extended generalized estimating equations for binary familial data with incomplete families

FitzGerald PE 《Biometrics》2002,58(4):718-726

In this article, we assess the performance of two standard, but naive, methods for handling incomplete familial data in GEE2 analyses when the outcome is binary. We also propose a new method for analyzing such data using GEE2 when explanatory variables are discrete. Unlike the naive methods, the new method does not require the missing data process to be ignorable. We illustrate our method with an example that examines the familial aggregation of obesity.

5.

Compliance Mixture Modelling with a Zero‐Effect Complier Class and Missing Data

Michael E. Sobel Bengt Muthén 《Biometrics》2012,68(4):1037-1045

Summary Randomized experiments are the gold standard for evaluating proposed treatments. The intent to treat estimand measures the effect of treatment assignment, but not the effect of treatment if subjects take treatments to which they are not assigned. The desire to estimate the efficacy of the treatment in this case has been the impetus for a substantial literature on compliance over the last 15 years. In papers dealing with this issue, it is typically assumed there are different types of subjects, for example, those who will follow treatment assignment (compliers), and those who will always take a particular treatment irrespective of treatment assignment. The estimands of primary interest are the complier proportion and the complier average treatment effect (CACE). To estimate CACE, researchers have used various methods, for example, instrumental variables and parametric mixture models, treating compliers as a single class. However, it is often unreasonable to believe all compliers will be affected. This article therefore treats compliers as a mixture of two types, those belonging to a zero‐effect class, others to an effect class. Second, in most experiments, some subjects drop out or simply do not report the value of the outcome variable, and the failure to take into account missing data can lead to biased estimates of treatment effects. Recent work on compliance in randomized experiments has addressed this issue by assuming missing data are missing at random or latently ignorable. We extend this work to the case where compliers are a mixture of types and also examine alternative types of nonignorable missing data assumptions.
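
The single-class baseline mentioned in the abstract, the instrumental-variable estimate of CACE, divides the intention-to-treat effect on the outcome by the intention-to-treat effect on treatment received (the estimated complier proportion). A minimal sketch, assuming complete outcome data, no defiers (monotonicity), and the exclusion restriction; it does not implement the paper's zero-effect complier class or its nonignorable missing-data models, and the function name is illustrative.

```python
from statistics import mean

def wald_cace(z, d, y):
    """Instrumental-variable (Wald) estimate of the complier average causal effect.

    z: randomized assignment (0/1), d: treatment actually received (0/1),
    y: observed outcome. Returns (cace, complier_proportion).
    """
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]
    itt_y = mean(y1) - mean(y0)           # ITT effect on the outcome
    complier_share = mean(d1) - mean(d0)  # ITT effect on treatment received
    if complier_share == 0:
        raise ValueError("assignment does not change treatment received; CACE not identified")
    return itt_y / complier_share, complier_share
```

With hypothetical data `z = [1, 1, 1, 1, 0, 0, 0, 0]`, `d = [1, 1, 1, 0, 0, 0, 0, 0]`, `y = [5, 5, 5, 1, 1, 1, 1, 1]`, the ITT effect is 3, the complier proportion is 0.75, and the estimated CACE is 4.0.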

6.

Semiparametric regression estimation in the presence of dependent censoring (Cited by: 5; self-citations: 0; citations by others: 5)

Andrea Rotnitzky; James M. Robins 《Biometrika》1995,82(4):805-820

We propose a semiparametric estimation procedure for estimating the regression of an outcome Y, measured at the end of a fixed follow-up period, on baseline explanatory variables X, measured prior to start of follow-up, in the presence of dependent censoring given X. The proposed estimators are consistent when the data are ‘missing at random’ but not ‘missing completely at random’ (Rubin, 1976), and do not require full specification of the complete data likelihood. Specifically, we assume that the probability of censoring at time t is independent of the outcome Y conditional on the recorded history up to t of a vector of time-dependent covariates that are correlated with Y. Our estimators can be used to adjust for dependent censoring and nonrandom noncompliance in randomised trials studying the effect of a treatment on the mean of a response variable of interest. Even with independent censoring, our methods allow the investigator to increase efficiency by exploiting the correlation of the outcome with a vector of time-dependent covariates.
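
The basic device behind such estimators is to weight each uncensored subject by the inverse of its probability of remaining uncensored given the recorded history. A minimal sketch of the simplest member of this class, a normalized inverse-probability-weighted mean, assuming the probabilities of remaining uncensored are known or already estimated; the paper's estimators additionally exploit time-dependent covariates for efficiency, which this omits. Names are illustrative.

```python
def ipw_mean(y, observed, p_obs):
    """Normalized inverse-probability-weighted mean of an outcome subject to censoring.

    y: outcomes (entries for censored subjects are ignored and may be None),
    observed: 1 if the subject's outcome was observed (uncensored), else 0,
    p_obs: each subject's probability of remaining uncensored given its history.
    """
    num = sum(yi / p for yi, r, p in zip(y, observed, p_obs) if r)
    den = sum(1.0 / p for r, p in zip(observed, p_obs) if r)
    return num / den
```

In a hypothetical example where subjects with high outcomes are censored half the time (`y = [10, 10, None, None, 0, 0]`, `observed = [1, 1, 0, 0, 1, 1]`, `p_obs = [0.5, 0.5, 0.5, 0.5, 1, 1]`), the complete-case mean is 5, while the weighted mean recovers 40/6 ≈ 6.67, the full-data mean if the two censored subjects also had outcome 10.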

7.

Nonignorable missingness in matched case-control data analyses

Cho Paik M 《Biometrics》2004,60(2):306-314

Matched case-control data analysis is often challenged by a missing covariate problem, the mishandling of which could cause bias or inefficiency. Satten and Carroll (2000, Biometrics 56, 384-388) and other authors have proposed methods to handle missing covariates when the probability of missingness depends on the observed data, i.e., when data are missing at random. In this article, we propose a conditional likelihood method to handle the case when the probability of missingness depends on the unobserved covariate, i.e., when data are nonignorably missing. When the missing covariate is binary, the proposed method can be implemented using standard software. Using the Northern Manhattan Stroke Study data, we illustrate the method and discuss how sensitivity analysis can be conducted.
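
Conditional likelihood is the starting point for matched analyses. For 1:1 matched pairs with a binary exposure and complete data, concordant pairs contribute only a constant, and the conditional maximum-likelihood odds ratio has the closed form n10/n01 (discordant pairs with exposed case over discordant pairs with exposed control). The sketch below covers only this complete-data case as background; it does not model nonignorable missingness as the paper does, and the function names are illustrative.

```python
import math

def conditional_loglik(beta, pairs):
    """Conditional log-likelihood for 1:1 matched pairs with one covariate.

    pairs: iterable of (x_case, x_control). Each pair contributes
    beta*x_case - log(exp(beta*x_case) + exp(beta*x_control)).
    """
    return sum(beta * xc - math.log(math.exp(beta * xc) + math.exp(beta * xk))
               for xc, xk in pairs)

def matched_pair_odds_ratio(pairs):
    """Closed-form conditional MLE of the odds ratio for a binary exposure."""
    pairs = list(pairs)  # allow generators; we iterate twice
    n10 = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    n01 = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    if n01 == 0:
        raise ValueError("no pairs with unexposed case and exposed control; MLE is infinite")
    return n10 / n01
```

Setting the derivative of `conditional_loglik` to zero gives exp(beta) = n10/n01, so the two functions agree.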

8.

Estimation of Residual Valve Gradient from Incomplete Data with Outliers

Chao L. Chen Javier Fernandez Lynn B. McGrath 《Biometrical journal. Biometrische Zeitschrift》1997,39(4):495-507

An important indicator of long-term recovery after valve replacement surgery is the postoperative valve gradient. This information is available only for patients who received catheterization or an echocardiogram postoperatively. It is plausible that sicker patients are more inclined to undergo these postoperative procedures and that their valve gradients tend to be higher. In this situation, ignoring the missing values and using the sample mean of the available information as an estimate for the whole study population leads to overestimation. A regression estimator is a reasonable choice for eliminating this bias if independent (explanatory) variables closely associated with both the residual valve gradient and the nonresponse mechanism can be identified. Using a series of patients receiving St. Jude Medical prosthetic valves, we found that the valve area index can be used as the independent variable in the regression estimator. Two departures from the standard assumptions of linear regression were found in the data set: a heteroscedastic trend in the error term, and outliers. Iteratively reweighted least squares (IRLS) was adopted to handle heteroscedasticity. An influence function approach was used to evaluate the sensitivity of the regression estimator to outliers. Under an equal response rate mechanism, IRLS not only solves the problem of heteroscedasticity but is also less sensitive to outliers.
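
The regression estimator described above fits the outcome on a fully observed covariate among respondents and averages the fitted values over all patients, rather than averaging only the observed outcomes. A minimal sketch using ordinary least squares, assuming response depends on the outcome only through the covariate; the IRLS weighting the paper uses for heteroscedasticity is omitted, and the function name is illustrative.

```python
from statistics import mean

def regression_estimator_mean(x, y):
    """Regression estimate of the population mean of y when some y are missing.

    x: covariate observed for every patient (e.g. valve area index),
    y: outcome, None where not measured.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xr = [xi for xi, _ in pairs]
    yr = [yi for _, yi in pairs]
    xbar_r, ybar_r = mean(xr), mean(yr)
    # OLS fit y = a + b*x on respondents only
    sxy = sum((xi - xbar_r) * (yi - ybar_r) for xi, yi in pairs)
    sxx = sum((xi - xbar_r) ** 2 for xi in xr)
    b = sxy / sxx
    a = ybar_r - b * xbar_r
    # Average the fitted line over *all* patients, respondents or not
    return a + b * mean(x)
```

With hypothetical data `x = [1, 2, 3, 4, 5, 6]` and `y = [None, None, 7, 9, 11, 13]` (only patients with large `x` measured), the complete-case mean is 10 while the regression estimator returns 8.0.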

9.

The cox proportional hazards model with a continuous latent variable measured by multiple binary indicators

Larsen K 《Biometrics》2005,61(4):1049-1055

This article is motivated by the Women's Health and Aging Study, where information about physical functioning was recorded along with death information in a group of elderly women. The focus is on determining whether having difficulties in daily living tasks is accompanied by a higher mortality rate. To this end, a two-parameter logistic regression model is used for the modeling of binary questionnaire data assuming an underlying continuous latent variable, difficulty in daily living. The Cox model is used for the survival information, and the continuous latent variable is included as an explanatory variable along with other observed variables. Parameters are estimated by maximizing the likelihood for the joint distribution of the items and the time-to-event information. In addition to presenting a new statistical model, this article also illustrates the use of the model in a real data setting and addresses the more practical issues of model building, diagnostics, and parameter interpretation.
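
The two model components named in the abstract can be written down directly: a two-parameter logistic (2PL) item response function for each binary questionnaire item, and a Cox relative hazard in which the latent trait acts as a covariate. A minimal sketch, assuming the common 2PL parameterization with discrimination `a` and location `b` and conditional independence of items given the latent trait; the paper's joint likelihood also integrates over the latent variable and includes the baseline hazard, both omitted here. Names are illustrative.

```python
import math

def two_pl_prob(theta, a, b):
    """P(item endorsed | latent trait theta) under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def response_pattern_prob(responses, theta, items):
    """Probability of a 0/1 response pattern given theta.

    items: list of (a, b) pairs; items are assumed conditionally
    independent given theta, so the probabilities multiply.
    """
    prob = 1.0
    for yi, (a, b) in zip(responses, items):
        pi = two_pl_prob(theta, a, b)
        prob *= pi if yi == 1 else 1.0 - pi
    return prob

def cox_relative_hazard(theta, beta):
    """Hazard ratio exp(beta * theta) for the latent trait in a Cox model."""
    return math.exp(beta * theta)
```

At `theta == b` an item is endorsed with probability 0.5 regardless of `a`, which is why `b` is read as the item's location on the latent scale.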

10.

Multiple-Imputation-Based Residuals and Diagnostic Plots for Joint Models of Longitudinal and Survival Outcomes

Dimitris Rizopoulos Geert Verbeke Geert Molenberghs 《Biometrics》2010,66(1):20-29

Summary The majority of the statistical literature for the joint modeling of longitudinal and time-to-event data has focused on the development of models that aim at capturing specific aspects of the motivating case studies. However, little attention has been given to the development of diagnostic and model-assessment tools. The main difficulty in using standard model diagnostics in joint models is the nonrandom dropout in the longitudinal outcome caused by the occurrence of events. In particular, the reference distribution of statistics, such as the residuals, in missing data settings is not directly available and complex calculations are required to derive it. In this article, we propose a multiple-imputation-based approach for creating multiple versions of the completed data set under the assumed joint model. Residuals and diagnostic plots for the complete data model can then be calculated based on these imputed data sets. Our proposals are exemplified using two real data sets.

11.

Multiple imputation and posterior simulation for multivariate missing data in longitudinal studies

Liu M Taylor JM Belin TR 《Biometrics》2000,56(4):1157-1163

This paper outlines a multiple imputation method for handling missing data in designed longitudinal studies. A random coefficients model is developed to accommodate incomplete multivariate continuous longitudinal data. Multivariate repeated measures are jointly modeled; specifically, an i.i.d. normal model is assumed for time-independent variables and a hierarchical random coefficients model is assumed for time-dependent variables in a regression model conditional on the time-independent variables and time, with heterogeneous error variances across variables and time points. Gibbs sampling is used to draw model parameters and for imputations of missing observations. An application to data from a study of startle reactions illustrates the model. A simulation study compares the multiple imputation procedure to the weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) that can be used to address similar data structures.
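
The abstract does not say how analyses of the imputed data sets are combined; the standard approach in multiple imputation is Rubin's rules, sketched below for a scalar parameter. This is general background, not a description of the paper's Gibbs-sampling model, and the function name is illustrative.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates: point estimates from each of the m imputed data sets,
    variances: the corresponding squared standard errors.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, t
```

The factor `(1 + 1/m)` inflates the between-imputation component to account for using a finite number of imputations.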

12.

Using Hierarchical Likelihood for Missing Data Problems

Yun Sung-Cheol; Lee Youngjo; Kenward Michael G. 《Biometrika》2007,94(4):905-919

Most statistical solutions to the problem of statistical inference with missing data involve integration or expectation. This can be done in many ways: directly or indirectly, analytically or numerically, deterministically or stochastically. Missing-data problems can be formulated in terms of latent random variables, so that hierarchical likelihood methods of Lee & Nelder (1996) can be applied to missing-value problems to provide one solution to the problem of integration of the likelihood. The resulting methods effectively use a Laplace approximation to the marginal likelihood with an additional adjustment to the measures of precision to accommodate the estimation of the fixed effects parameters. We first consider missing at random cases, where problems are simpler to handle because the integration does not need to involve the missing-value mechanism, and then consider missing not at random cases. We also study tobit regression and refit the missing not at random selection model to the antidepressant trial data analyzed in Diggle & Kenward (1994).
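
The Laplace approximation mentioned above replaces an integral of exp(h(u)) by a Gaussian integral around the mode û of h: exp(h(û)) · sqrt(2π / −h''(û)). A minimal one-dimensional sketch, assuming h is smooth with a single interior maximum and using finite-difference derivatives; it shows the generic approximation only, not the h-likelihood adjustment for fixed effects described in the paper. The function name and defaults are illustrative.

```python
import math

def laplace_integral(h, u0=0.0, step=1e-4, iters=50):
    """Laplace approximation to the integral of exp(h(u)) over the real line.

    Locates the mode of h by Newton's method (central finite differences
    for h' and h''), then returns exp(h(u_hat)) * sqrt(2*pi / -h''(u_hat)).
    Exact (up to rounding) when h is quadratic.
    """
    def d1(u):
        return (h(u + step) - h(u - step)) / (2 * step)

    def d2(u):
        return (h(u + step) - 2 * h(u) + h(u - step)) / step ** 2

    u = u0
    for _ in range(iters):
        u_new = u - d1(u) / d2(u)  # Newton step towards the stationary point
        converged = abs(u_new - u) < 1e-10
        u = u_new
        if converged:
            break
    curvature = -d2(u)
    if curvature <= 0:
        raise ValueError("h is not locally concave at the located stationary point")
    return math.exp(h(u)) * math.sqrt(2 * math.pi / curvature)
```

For h(u) = −(u − 1)²/2 the result equals sqrt(2π). For h(v) = 11v − e^v, whose integral is Γ(11) = 10!, the approximation reproduces Stirling's formula and is accurate to under 1%.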

13.

A note on the effect of observations with missing data on genetic correlation estimates

J. I. Weller M. Ron 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1987,74(5):549-553

Summary Various studies have estimated covariance components as half the difference between the variance component of the sum of the variable values, for each observation, and the sum of the corresponding variable variance components. Although the variance components for the separate variables can be computed using all available data, the variance components of the sum can be computed only from those observations with records for both variables. Previous studies have suggested eliminating observations with missing data, because of possible selection bias. The effect of missing data on estimates of covariance components and genetic correlations was tested on sample beef cattle data and simulated data by randomly deleting differing proportions of records of one variable for each pair of variables analyzed. Estimates of genetic correlations computed with observations with missing data eliminated were more accurate than estimates computed using all available data. Furthermore, when observations with missing data were included, estimates of genetic correlation far outside the parameter space were common. Therefore, this method should be used only if observations with missing data have been eliminated.
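
The identity behind this method is Cov(X, Y) = [Var(X + Y) − Var(X) − Var(Y)] / 2. The sketch below applies it to plain sample variances rather than to estimated genetic variance components (a simplifying assumption), and follows the paper's conclusion by computing every term from the same records, i.e. after eliminating observations with a missing value. Function names are illustrative.

```python
from statistics import variance

def complete_pairs(x, y):
    """Keep only observations with both variables recorded (None = missing)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

def cov_from_sum(x, y):
    """Covariance as half of Var(x + y) - Var(x) - Var(y), on complete pairs only."""
    pairs = complete_pairs(x, y)
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    sums = [a + b for a, b in pairs]
    return (variance(sums) - variance(xs) - variance(ys)) / 2

def correlation_from_sum(x, y):
    """Correlation from cov_from_sum, with variances taken from the same records."""
    pairs = complete_pairs(x, y)
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return cov_from_sum(xs, ys) / (variance(xs) * variance(ys)) ** 0.5
```

Because all three variances come from the same records, the resulting correlation stays within [−1, 1]; mixing variances computed on different subsets of records is what allows estimates outside the parameter space.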

14.

Classification trees as an alternative to linear discriminant analysis

Feldesman MR 《American journal of physical anthropology》2002,119(3):257-275

Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In these instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantages that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success does not depend on the data meeting normality conditions or covariance homogeneity, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform nearly equally. With complete data sets, LDA may be a better choice, as is shown in this example, but with missing observations, classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to address the problem of missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions. They are especially recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.

15.

Estimating evolutionary parameters when viability selection is operating (Cited by: 2; self-citations: 0; citations by others: 2)

Hadfield JD 《Proceedings. Biological sciences / The Royal Society》2008,275(1635):723-734

Some individuals die before a trait is measured or expressed (the invisible fraction), and some relevant traits are not measured in any individual (missing traits). This paper discusses how these concepts can be cast in terms of missing data problems from statistics. Using missing data theory, I show formally the conditions under which a valid evolutionary inference is possible when the invisible fraction and/or missing traits are ignored. These conditions are restrictive and unlikely to be met in even the most comprehensive long-term studies. When these conditions are not met, many selection and quantitative genetic parameters cannot be estimated accurately unless the missing data process is explicitly modelled. Surprisingly, this does not seem to have been attempted in evolutionary biology. In the case of the invisible fraction, viability selection and the missing data process are often intimately linked. In such cases, models used in survival analysis can be extended to provide a flexible and justified model of the missing data mechanism. Although missing traits pose a more difficult problem, important biological parameters can still be estimated without bias when appropriate techniques are used. This is in contrast to current methods which have large biases and poor precision. Generally, the quantitative genetic approach is shown to be superior to phenotypic studies of selection when invisible fractions or missing traits exist because part of the missing information can be recovered from relatives.

16.

An extension of the Cormack-Jolly-Seber model for continuous covariates with application to Microtus pennsylvanicus

Bonner SJ Schwarz CJ 《Biometrics》2006,62(1):142-149

Recent developments in the Cormack-Jolly-Seber (CJS) model for analyzing capture-recapture data have focused on allowing the capture and survival rates to vary between individuals. Several methods have been developed in which capture and survival are functions of auxiliary variables that may be discrete, constant over time, or apply to the population as a whole, but the problem has not been solved for continuous covariates that vary with both time and individual. This article proposes a new method to handle such covariates by modeling changes over time via a diffusion process and using logistic functions to link the variable to the CJS capture and survival rates. Bayesian methods are used to estimate the model parameters. The method is applied to study the effect of body mass on the survival of the North American meadow vole, Microtus pennsylvanicus.
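
The CJS likelihood is built from the probabilities of individual capture histories conditional on first capture, and the abstract links survival and capture rates to a covariate through logistic functions. A minimal sketch, assuming the occasion-specific rates are already given, for example `phi[t] = logistic(b0 + b1 * mass[t])` with hypothetical coefficients `b0` and `b1`; it omits the paper's diffusion model for unobserved covariate values and its Bayesian estimation. Names are illustrative.

```python
import math

def logistic(eta):
    """Inverse logit, used to map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def cjs_history_prob(history, phi, p):
    """Probability of one capture history under the CJS model, given first capture.

    history: list of 0/1 over occasions 0..T-1 (must contain at least one 1),
    phi[t]: probability of surviving from occasion t to t+1 (length T-1),
    p[t]: capture probability at occasion t given alive (length T; entries
          up to the first capture are unused because we condition on it).
    """
    T = len(history)
    first = history.index(1)
    last = T - 1 - history[::-1].index(1)
    prob = 1.0
    for t in range(first, last):
        prob *= phi[t]  # survived from t to t+1
        prob *= p[t + 1] if history[t + 1] else 1.0 - p[t + 1]
    # chi: probability of never being seen again after the last capture
    chi = 1.0
    for t in range(T - 2, last - 1, -1):
        chi = (1.0 - phi[t]) + phi[t] * (1.0 - p[t + 1]) * chi
    return prob * chi
```

As a sanity check, the probabilities of all histories sharing the same first capture sum to 1; with three occasions, `phi = [0.8, 0.7]` and `p = [1.0, 0.5, 0.6]`, the four histories starting with 1 have probabilities 0.168, 0.232, 0.168, and 0.432.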

17.

Multiple imputation methods for multivariate one-sided tests with missing data

Wang T Wu L 《Biometrics》2011,67(4):1452-1460

Multivariate one-sided hypothesis testing problems arise frequently in practice, and various tests have been developed. Multivariate data, however, often contain missing values. In this case, standard testing procedures based on complete data may not be applicable or may perform poorly if the missing data are discarded. In this article, we propose several multiple imputation methods for multivariate one-sided testing problems with missing data. Some theoretical results are presented. The proposed methods are evaluated using simulations, and a real data example is presented to illustrate them.

18.

Partitioning the variation in a plot‐by‐species data matrix that is related to n sets of explanatory variables

Rune Halvorsen Økland 《Journal of Vegetation Science》2003,14(5):693-700

Abstract. Variation partitioning by (partial) constrained ordination is a popular method for exploratory data analysis, but applications are mostly restricted to simple ecological questions involving only two or three sets of explanatory variables, such as climate and soil, because the complexity of calculations and results increases rapidly with the number of explanatory variable sets. The existence of a unique algorithm for partitioning the variation in a set of response variables on n sets of explanatory variables is demonstrated, and it is shown how the 2^n − 1 non‐overlapping components of variation can be calculated. Methods for evaluation and presentation of variation partitioning results are reviewed, and a recursive algorithm is proposed for distributing the many small components of variation over simpler components. Several issues related to the use and usefulness of variation partitioning with n sets of explanatory variables are discussed with reference to a worked example.
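
One way to obtain the 2^n − 1 non-overlapping components is inclusion–exclusion (Möbius inversion) over the variation explained by every union of explanatory sets. A minimal sketch under stated assumptions: the explained fractions (e.g. from constrained ordinations on each combination of sets) are supplied as input, and this is not claimed to be the paper's exact algorithm. Let g(B) be the total explained variation minus the variation explained by the sets outside B; then the fraction shared by exactly the sets in S is the alternating sum of g(B) over the subsets B of S.

```python
from itertools import combinations

def partition_variation(r2, n):
    """Split explained variation into the 2**n - 1 non-overlapping fractions.

    r2: dict mapping each non-empty frozenset of set indices (0..n-1) to the
        fraction of variation explained by those sets jointly.
    Returns a dict mapping each non-empty frozenset S to the fraction of
    variation shared by exactly the sets in S (and by no other set).
    """
    full = frozenset(range(n))

    def r2_of(a):
        return r2[a] if a else 0.0  # nothing is explained by the empty union

    def g(b):
        # explained variation not explained by the sets outside b
        return r2_of(full) - r2_of(full - b)

    fractions = {}
    for k in range(1, n + 1):
        for s in map(frozenset, combinations(range(n), k)):
            total = 0.0
            for j in range(1, k + 1):  # Moebius inversion over non-empty subsets of s
                for b in combinations(sorted(s), j):
                    total += (-1) ** (k - j) * g(frozenset(b))
            fractions[s] = total
    return fractions
```

For two sets with explained fractions 0.3, 0.4 and 0.5 (union), this gives 0.1 unique to the first set, 0.2 unique to the second, and 0.2 shared, summing to 0.5. In real data, shared fractions computed this way can be negative.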

19.

Generalizing treatment effects with incomplete covariates: Identifying assumptions and multiple imputation algorithms

Imke Mayer Julie Josse Traumabase Group 《Biometrical journal. Biometrische Zeitschrift》2023,65(5):2100294

We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity sampling weighting are not designed to handle missing values, which are however common in both data sources. In addition to coupling the assumptions for causal effect identifiability and for the missing values mechanism and to defining appropriate estimation strategies, one difficulty is to consider the specific structure of the data with two sources and treatment and outcome only available in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multisource structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semidiscrete variables. The multiple imputation strategies and the latter alternative rely on different sets of assumptions concerning the impact of missing values on identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to intensive care units. The analysis illustrates how the missing values handling can impact the conclusion about the effect generalized from the RCT to the target population.
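
With a single discrete, fully observed covariate, generalizing a trial effect reduces to reweighting stratum-specific trial effects by the target population's stratum shares (post-stratification). A minimal sketch of this complete-covariate baseline, a simpler relative of the weighting methods named in the abstract; it assumes the covariate captures all effect modification and that every target stratum appears in the trial, and it does not address missing covariates, which is the paper's subject. Names and numbers are hypothetical.

```python
def transported_effect(trial_effects, target_counts):
    """Generalize stratum-specific RCT effects to a target population.

    trial_effects: dict stratum -> average treatment effect estimated in the RCT,
    target_counts: dict stratum -> number of target-population individuals.
    """
    missing = set(target_counts) - set(trial_effects)
    if missing:
        raise ValueError(f"strata absent from the trial: {sorted(missing)}")
    total = sum(target_counts.values())
    # Weighted average of trial effects, weights = target stratum shares
    return sum(trial_effects[s] * n / total for s, n in target_counts.items())
```

For example, hypothetical stratum effects of −0.02 ("young") and −0.10 ("old") combined with target counts of 300 and 100 give a generalized effect of −0.04, which differs from the trial average whenever the trial's stratum mix differs from the target's.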

20.

Flexible Regression Model Selection for Survival Probabilities: With Application to AIDS

A. Gregory DiRienzo 《Biometrics》2009,65(4):1194-1202

Summary Clinicians are often interested in the effect of covariates on survival probabilities at prespecified study times. Because different factors can be associated with the risk of short‐ and long‐term failure, a flexible modeling strategy is pursued. Given a set of multiple candidate working models, an objective methodology is proposed that aims to construct consistent and asymptotically normal estimators of regression coefficients and average prediction error for each working model, that are free from the nuisance censoring variable. It requires the conditional distribution of censoring given covariates to be modeled. The model selection strategy uses stepup or stepdown multiple hypothesis testing procedures that control either the proportion of false positives or generalized familywise error rate when comparing models based on estimates of average prediction error. The context can actually be cast as a missing data problem, where augmented inverse probability weighted complete case estimators of regression coefficients and prediction error can be used (Tsiatis, 2006, Semiparametric Theory and Missing Data). A simulation study and an interesting analysis of a recent AIDS trial are provided.