Similar Articles
20 similar articles found.
1.
The abundance of fossils of avian stem taxa unearthed in recent years makes it necessary to review and improve the models for estimating body mass used in palaeoecological studies. In this article, single and multiple regression functions based on osteological measurements were obtained from a large data set of extant flying birds for estimating the body mass of 42 Mesozoic specimens from stem taxa Archaeopterygidae, Jeholornithidae, Sapeornithidae, Confuciusornithidae, and Enantiornithes, and basal members of Ornithuromorpha. Traditionally, body mass has been estimated in fossil vertebrates using univariate scaling functions; multiple regression functions have been used less frequently. Both predictive methods can be affected by different sources of error from statistics, phylogenetic relationships, ecological adaptations, and bone preservation; however, although some studies have addressed these biases, few have tested them within the context of a single data set. In our data set, the models with the greatest predictive strength and applicability to new specimens, especially those from stem taxa, are those derived from multiple regression analyses. Moreover, the methodology used for selecting variables allowed us to obtain specific sets of predictors for each fossil stem group that presumably minimized the variation resulting from historical contingency (i.e. differences in skeletal morphology arising from phylogeny), locomotor adaptations, and diagenetic compaction. The loss of generalizability in the multiple regression models resulting from collinearity effects was negligible on the body mass estimates derived from our data set.
Therefore, the body mass values obtained for Mesozoic specimens are accurate and can be used in future studies of a number of palaeobiological and evolutionary aspects of extinct birds, particularly the first stages of avian flight. © 2015 The Linnean Society of London
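A multiple regression of this kind can be sketched in a few lines. The approach below fits log(mass) on two log-transformed limb measurements and back-transforms the prediction; all measurement and mass values are invented for illustration and are not the study's data set.

```python
import numpy as np

# Hypothetical training data for extant flying birds: femur length (mm),
# humerus length (mm), and body mass (g), all log10-transformed.
log_femur   = np.log10([35.0, 48.0, 60.0, 82.0, 110.0])
log_humerus = np.log10([40.0, 55.0, 71.0, 95.0, 130.0])
log_mass    = np.log10([150.0, 380.0, 800.0, 2100.0, 5200.0])

# Design matrix with intercept: log(mass) ~ b0 + b1*log(femur) + b2*log(humerus)
X = np.column_stack([np.ones_like(log_femur), log_femur, log_humerus])
coef, *_ = np.linalg.lstsq(X, log_mass, rcond=None)

def predict_mass(femur_mm, humerus_mm):
    """Back-transform the multiple-regression prediction to grams."""
    log_pred = coef @ np.array([1.0, np.log10(femur_mm), np.log10(humerus_mm)])
    return 10.0 ** log_pred

# Estimate body mass for a fossil specimen with both measurements preserved
mass = predict_mass(50.0, 58.0)
```

With highly collinear predictors (as limb bones typically are), individual slope coefficients may be unstable even when predictions near the data remain reliable, which is the collinearity trade-off the abstract refers to.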

2.
The sensitivity and specificity of markers for event times
The statistical literature on assessing the accuracy of risk factors or disease markers as diagnostic tests deals almost exclusively with settings where the test, Y, is measured concurrently with disease status D. In practice, however, disease status may vary over time and there is often a time lag between when the marker is measured and the occurrence of disease. One example concerns the Framingham risk score (FR-score) as a marker for the future risk of cardiovascular events, events that occur after the score is ascertained. To evaluate such a marker, one needs to take the time lag into account since the predictive accuracy may be higher when the marker is measured closer to the time of disease occurrence. We therefore consider inference for sensitivity and specificity functions that are defined as functions of time. Semiparametric regression models are proposed. Data from a cohort study are used to estimate model parameters. One issue that arises in practice is that event times may be censored. In this research, we extend in several respects the work by Leisenring et al. (1997) that dealt only with parametric models for binary tests and uncensored data. We propose semiparametric models that accommodate continuous tests and censoring. Asymptotic distribution theory for parameter estimates is developed and procedures for making statistical inference are evaluated with simulation studies. We illustrate our methods with data from the Cardiovascular Health Study, relating the FR-score measured at enrollment to subsequent risk of cardiovascular events.

3.
R Brookmeyer & J G Liao, Biometrics, 1990, 46(4):1151-1163
The objective of this paper is to develop statistical methods for estimating current and future numbers of individuals in different stages of the natural history of human immunodeficiency virus (HIV) infection and to evaluate the impact of therapeutic advances on these numbers. The approach is to extend the method of back-calculation to allow for a multistage model of natural history and to permit the hazard functions of progression from one stage to the next to depend on calendar time. Quasi-likelihood estimates of key quantities for evaluating health care needs can be obtained through iteratively reweighted least squares under weakly parametric models for the infection rate. An approach is proposed for incorporating into the analysis independent estimates of HIV prevalence obtained from epidemiologic surveys. The methods are applied to the AIDS epidemic in the United States. Short-term projections are given of both AIDS incidence and the numbers of HIV-infected AIDS-free individuals with CD4 cell depletion. The impact of therapeutic advances on these numbers is evaluated using a change-point hazard model. A number of important sources of uncertainty must be considered when interpreting the results, including uncertainties in the specified hazard functions of disease progression, in the parametric model for the infection rate, in the AIDS incidence data, in the efficacy of treatment, and in the proportions of HIV-infected individuals receiving treatment.

4.
Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in survival may be explained by incorporating relevant covariates. We propose nonparametric and semiparametric regression methods for estimating survival in capture-recapture models. A fully Bayesian approach using Markov chain Monte Carlo simulations was employed to estimate the model parameters. The work is illustrated by a study of snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study on marked individuals nesting at Petrels Island, Terre Adélie.

5.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.

6.
Extinction risk varies across species and space owing to the combined and interactive effects of ecology/life history and geography. For predictive conservation science to be effective, large datasets and integrative models that quantify the relative importance of potential factors and separate rapidly changing from relatively static threat drivers are urgently required. Here, we integrate and map in space the relative and joint effects of key correlates of extinction risk, as assessed by the International Union for Conservation of Nature, for 8700 living birds. Extinction risk varies significantly with species' broad-scale environmental niche, geographical range size, and life-history and ecological traits such as body size, developmental mode, primary diet and foraging height. Even at this broad scale, simple quantifications of past human encroachment across species' ranges emerge as key in predicting extinction risk, supporting the use of land-cover change projections for estimating future threat in an integrative setting. A final joint model explains much of the interspecific variation in extinction risk and provides a remarkably strong prediction of its observed global geography. Our approach unravels the species-level structure underlying geographical gradients in extinction risk and offers a means of disentangling static from changing components of current and future threat. This reconciliation of intrinsic and extrinsic, and of past and future, extinction risk factors may offer a critical step towards a more continuous, forward-looking assessment of species' threat status based on geographically explicit environmental change projections, potentially advancing global predictive conservation science.

7.
Two regression methods are proposed for estimating age in nonhuman primates from deciduous dental eruption data. The first method consists of stepwise multiple regression using the dental eruption state (present/absent) of each tooth as independent variables. The second method uses the total number of teeth erupted as the independent variable in an exponential model. We applied both methods to a sample of 175 well nourished infant and juvenile baboons (Papio sp.), housed in an outdoor breeding corral and ranging in age from birth to 763 days. From this sample, 129 animals were used to compute the regression formulae, and 46 animals were used for cross validation. Both models show good overall fits and high predictive accuracy with the independent cross validation sample.
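The second method, an exponential model on tooth count, can be fitted as a log-linear regression. A minimal sketch follows; the tooth counts and ages below are invented for illustration, not the baboon data from the study.

```python
import numpy as np

# Hypothetical calibration data: number of deciduous teeth erupted vs. known age (days)
teeth = np.array([2, 5, 8, 12, 16, 20])
age   = np.array([40.0, 90.0, 160.0, 280.0, 450.0, 700.0])

# Exponential model age = a * exp(b * teeth), fitted by regressing log(age) on teeth
A = np.column_stack([np.ones_like(teeth, dtype=float), teeth.astype(float)])
(log_a, b), *_ = np.linalg.lstsq(A, np.log(age), rcond=None)

def estimate_age(n_teeth):
    """Predicted age (days) from the count of erupted deciduous teeth."""
    return np.exp(log_a) * np.exp(b * n_teeth)
```

A held-out subsample, as in the study's 129/46 split, would then be used to check predictive accuracy rather than relying on the in-sample fit.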

8.
The positive and negative predictive values are standard ways of quantifying predictive accuracy when both the outcome and the prognostic factor are binary. Methods for comparing the predictive values of two or more binary factors have been discussed previously (Leisenring et al., 2000, Biometrics 56, 345-351). We propose extending the standard definitions of the predictive values to accommodate prognostic factors that are measured on a continuous scale and suggest a corresponding graphical method to summarize predictive accuracy. Drawing on the work of Leisenring et al. we make use of a marginal regression framework and discuss methods for estimating these predictive value functions and their differences within this framework. The methods presented in this paper have the potential to be useful in a number of areas including the design of clinical trials and health policy analysis.
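The idea of predictive value *functions* for a continuous factor can be illustrated with simple empirical estimators, thresholding the marker at each cut-off c. This is only a sketch of the concept, not the marginal regression framework the paper develops; the data are invented.

```python
# Empirical predictive value functions for a continuous prognostic factor Y:
#   PPV(c) = P(D=1 | Y >= c),  NPV(c) = P(D=0 | Y < c)
y = [0.1, 0.3, 0.35, 0.5, 0.6, 0.7, 0.8, 0.9]   # marker values (illustrative)
d = [0,   0,   0,    1,   0,   1,   1,   1  ]   # binary outcomes

def ppv(c):
    """Proportion of outcomes among subjects with marker at or above c."""
    pos = [di for yi, di in zip(y, d) if yi >= c]
    return sum(pos) / len(pos) if pos else None

def npv(c):
    """Proportion of non-outcomes among subjects with marker below c."""
    neg = [di for yi, di in zip(y, d) if yi < c]
    return 1 - sum(neg) / len(neg) if neg else None
```

Plotting ppv(c) and npv(c) over a grid of thresholds gives the kind of graphical summary of predictive accuracy the abstract describes.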

9.
Wei Pan, Biometrics, 2001, 57(2):529-534
Model selection is a necessary step in many practical regression analyses. But for methods based on estimating equations, such as the quasi-likelihood and generalized estimating equation (GEE) approaches, there seem to be few well-studied model selection techniques. In this article, we propose a new model selection criterion that minimizes the expected predictive bias (EPB) of estimating equations. A bootstrap smoothed cross-validation (BCV) estimate of EPB is presented and its performance is assessed via simulation for overdispersed generalized linear models. For illustration, the method is applied to a real data set taken from a study of the development of ewe embryos.

10.
We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov chain Monte Carlo method for covariate imputation of missing data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.

11.
An important task of human genetics studies is to accurately predict disease risks in individuals based on genetic markers, which allows for identifying individuals at high disease risk and facilitating their disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size due to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size and hence improve the prediction accuracy is to integrate data from genetically correlated phenotypes. Yet, the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders and schizophrenia with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the area under the receiver operating characteristic curve. We also found similar prediction accuracy improvements when we jointly analyzed GWAS data for Crohn’s disease and ulcerative colitis. The empirical observations were substantiated through our comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated that pleiotropy can be leveraged as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
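The flavor of ridge-based genomic prediction can be sketched with a closed-form ridge solved jointly for two phenotypes sharing predictors. Note this simple multi-task ridge is only illustrative; it is not the paper's bivariate ridge estimator, and the genotypes and effect sizes below are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated genotypes (n individuals x p SNPs, coded 0/1/2) and two
# genetically correlated phenotypes that share most causal effects.
n, p = 200, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta = rng.normal(0.0, 1.0, p)
y1 = X @ beta + rng.normal(0.0, 4.0, n)
y2 = X @ (beta + rng.normal(0.0, 0.3, p)) + rng.normal(0.0, 4.0, n)

def ridge(X, Y, lam):
    """Closed-form ridge: (X'X + lam*I)^-1 X'Y; Y may have multiple columns."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Y)

# Fit both phenotypes on the shared predictors and score phenotype 1
B = ridge(X, np.column_stack([y1, y2]), lam=10.0)
pred1 = X @ B[:, 0]
corr = np.corrcoef(pred1, y1)[0, 1]
```

In the real GWAS setting, accuracy would be assessed out of sample (e.g. AUC on held-out cases and controls) rather than by in-sample correlation as here.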

12.
Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.

13.
Clinical guidelines recommend that violence risk be assessed in schizophrenia. Current approaches are resource-intensive as they employ detailed clinical assessments of dangerousness for most patients. An alternative approach would be to first screen out patients at very low risk of future violence prior to more costly and time-consuming assessments. In order to implement such a stepped strategy, we developed a simple tool to screen out individuals with schizophrenia at very low risk of violent offending. We merged high quality Swedish national registers containing information on psychiatric diagnoses, socio-demographic factors, and violent crime. A cohort of 13,806 individuals with hospital discharge diagnoses of schizophrenia was identified and followed for up to 33 years for violent crime. Cox regression was used to determine risk factors for violent crime and construct the screening tool, the predictive validity of which was measured using four outcome statistics. The instrument was calibrated on 6,903 participants and cross-validated using three independent replication samples of 2,301 participants each. Regression analyses resulted in a tool composed of five items: male sex, previous criminal conviction, young age at assessment, comorbid alcohol abuse, and comorbid drug abuse. At 5 years after discharge, the instrument had a negative predictive value of 0.99 (95% CI = 0.98–0.99), meaning that very few individuals whom the tool screened out (n = 2,359 out of original sample of 6,903) were subsequently convicted of a violent offence. Screening out patients who are at very low risk of violence prior to more detailed clinical assessment may assist the risk assessment process in schizophrenia.

14.
Many recent studies of extinction risk have attempted to determine what differences exist between threatened and non-threatened species. One potential problem in such studies is that species-level data may contain phylogenetic non-independence. However, the use of phylogenetic comparative methods (PCM) to account for non-independence remains controversial, and some recent studies of extinction have recommended other methods that do not account for phylogenetic non-independence, notably decision trees (DTs). Here we perform a systematic comparison of techniques, comparing the performance of PCM regression models with corresponding non-phylogenetic regressions and DTs over different clades and response variables. We found that predictions were broadly consistent among techniques, but that predictive precision varied across techniques with PCM regression and DTs performing best. Additionally, despite their inability to account for phylogenetic non-independence, DTs were useful in highlighting interaction terms for inclusion in the PCM regression models. We discuss the implications of these findings for future comparative studies of extinction risk.

15.
The minimum-convex-polygon method for estimating home-range area, in which the outermost points are connected in a particular way, is extremely sensitive to sample size. Existing methods for estimating home-range area that correct for sample size fail to encompass all the important kinds of biological variation in the home-range utilization. (The home-range utilization describes the relative degree to which different units of space are frequented by an animal.) Although previous methods have assumed specific unimodal distributions, such as the bivariate normal, home-range utilizations may resemble funnels or pies as well as hills. A regression method is introduced that uses data from well-sampled individuals whose true home ranges are assumed approximately known to predict home-range areas for less well-sampled individuals. Appendix 5 summarizes this method. Sizes of home ranges estimated by the regression method are half or less than sizes estimated by previous methods in which utilization distributions are assumed to be all of a particular statistical type.
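The minimum-convex-polygon baseline that the paper improves on is simple to compute: take the convex hull of the relocation fixes and measure its area with the shoelace formula. A self-contained sketch with invented coordinates:

```python
# Minimum-convex-polygon (MCP) home-range area from relocation fixes.
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    s = sum(vertices[i][0]*vertices[(i+1) % n][1] - vertices[(i+1) % n][0]*vertices[i][1]
            for i in range(n))
    return abs(s) / 2.0

fixes = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]  # illustrative fixes (e.g. in metres)
area = polygon_area(convex_hull(fixes))
```

The sample-size sensitivity the abstract criticizes is easy to see with this code: dropping a single outermost fix from `fixes` shrinks the hull, and hence the estimated home range, abruptly.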

16.
BACKGROUND AND AIMS: Most current thermal-germination models are parameterized with subpopulation-specific rate data, interpolated from cumulative-germination-response curves. The purpose of this study was to evaluate the relative accuracy of three-dimensional models for predicting cumulative germination response to temperature. Three-dimensional models are relatively more efficient to implement than two-dimensional models and can be parameterized directly with measured data. METHODS: Seeds of four rangeland grass species were germinated over the constant-temperature range of 3 to 38 degrees C and monitored for subpopulation variability in germination-rate response. Models for estimating subpopulation germination rate were generated as a function of temperature using three-dimensional regression, statistical gridding and iterative-probit optimization, using both measured and interpolated-subpopulation data as model inputs. KEY RESULTS: Statistical gridding is more accurate than three-dimensional regression and iterative-probit optimization for modelling germination rate and germination time as a function of temperature and subpopulation. Optimization of the iterative-probit model lowers base-temperature estimates, relative to two-dimensional cardinal-temperature models, and results in an inability to resolve optimal-temperature coefficients as a function of subpopulation. Residual model error for the three-dimensional model was extremely high when parameterized with measured-subpopulation data. Use of measured data for model evaluation provided a more realistic estimate of predictive error than did evaluation of the larger set of interpolated-subpopulation data. CONCLUSIONS: Statistical-gridding techniques may provide a relatively efficient method for estimating germination response in situations where the primary objective is to estimate germination time. This methodology allows for direct use of germination data for model parameterization and automates the significant computational requirements of a two-dimensional piece-wise-linear model, previously shown to produce the most accurate estimates of germination time.

17.
Hongwei Zhao & Lili Tian, Biometrics, 2001, 57(4):1002-1008
Medical cost estimation is very important to health care organizations and health policy makers. We consider cost-effectiveness analysis for competing treatments in a staggered-entry, survival-analysis-based clinical trial. We propose a method for estimating mean medical cost over patients in such settings. The proposed estimator is shown to be consistent and asymptotically normal, and its asymptotic variance can be obtained. In addition, we propose a method for estimating the incremental cost-effectiveness ratio and for obtaining a confidence interval for it. Simulation experiments are conducted to evaluate our proposed methods. Finally, we apply our methods to a clinical trial comparing the cost effectiveness of implanted cardiac defibrillators with conventional therapy for individuals at high risk for ventricular arrhythmias.

18.
Advances in experimental design and equipment have simplified the collection of maximum metabolic rate (MMR) data for a more diverse array of water-breathing animals. However, little attention has been given to the consequences of analytical choices in the estimation of MMR. Using different analytical methods can reduce the comparability of MMR estimates across species and studies and has consequences for the burgeoning number of macroecological meta-analyses using metabolic rate data. Two key analytical choices that require standardization are the time interval, or regression window width, over which MMR is estimated, and the method used to locate that regression window within the raw oxygen depletion trace. Here, we consider the effect of both choices by estimating MMR for two shark and two salmonid species of different activity levels using multiple regression window widths and three analytical methods: rolling regression, sequential regression, and segmented regression. Shorter regression windows yielded higher metabolic rate estimates, with a risk that the shortest windows (<1-min) reflect more system noise than MMR signal. Rolling regression was the best candidate model and produced the highest MMR estimates. Sequential regression models consistently produced lower relative estimates than rolling regression models, while the segmented regression model was unable to produce consistent MMR estimates across individuals. The time-point of the MMR regression window along the oxygen consumption trace varied considerably across individuals but not across models. We show that choice of analytical method, in addition to more widely understood experimental choices, profoundly affect the resultant estimates of MMR. We recommend that researchers (1) employ a rolling regression model with a reliable regression window tailored to their experimental system and (2) explicitly report their analytical methods, including publishing raw data and code.
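The recommended rolling-regression approach can be sketched directly: slide a fixed-width window along the oxygen trace, fit a line in each window, and take the steepest depletion slope as the MMR estimate. The trace below is simulated, not the sharks' or salmonids' data, and the window width is an arbitrary illustrative choice.

```python
import numpy as np

# Simulated oxygen-depletion trace: baseline decline, with a steeper
# depletion episode (the "MMR" signal) between t = 10 and t = 15 min.
t = np.arange(0, 60.0, 0.5)                        # minutes
o2 = 100 - 0.5 * t - 1.5 * np.clip(t - 10, 0, 5)   # slope -2.0 during the episode
o2 = o2 + np.random.default_rng(1).normal(0, 0.05, t.size)

def rolling_mmr(t, o2, window_min=3.0):
    """Steepest oxygen-depletion rate over any window of the given width."""
    dt = t[1] - t[0]
    w = int(round(window_min / dt))
    best = 0.0
    for i in range(len(t) - w + 1):
        slope = np.polyfit(t[i:i+w], o2[i:i+w], 1)[0]  # linear fit in the window
        best = min(best, slope)
    return -best  # report as a positive consumption rate

mmr = rolling_mmr(t, o2)
```

Shrinking `window_min` toward the sampling interval illustrates the abstract's warning: ever-shorter windows chase noise and inflate the MMR estimate.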

19.
Estimates of absolute cause-specific risk in cohort studies
J Benichou & M H Gail, Biometrics, 1990, 46(3):813-826
In this paper we study methods for estimating the absolute risk of an event c1 in a time interval [t1, t2], given that the individual is at risk at t1 and given the presence of competing risks. We discuss some advantages of absolute risk for measuring the prognosis of an individual patient and some difficulties of interpretation for comparing two treatment groups. We also discuss the importance of the concept of absolute risk in evaluating public health measures to prevent disease. Variance calculations permit one to gauge the relative importance of random and systematic errors in estimating absolute risk. Efficiency calculations were also performed to determine how much precision is lost in estimating absolute risk with a nonparametric approach or with a flexible piecewise exponential model rather than a simple exponential model, and other calculations indicate the extent of bias that arises with the simple exponential model when that model is invalid. Such calculations suggest that the more flexible models will be useful in practice. Simulations confirm that asymptotic methods yield reliable variance estimates and confidence interval coverages in samples of practical size.
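Under the simple exponential model mentioned above, absolute risk has a closed form. Assuming constant cause-specific hazards lam1 (event of interest) and lam2 (competing risk), the absolute risk of c1 over [t1, t2], conditional on being at risk at t1, is lam1/(lam1+lam2) * (1 - exp(-(lam1+lam2)*(t2-t1))). The hazard values below are illustrative only.

```python
import math

def absolute_risk(lam1, lam2, t1, t2):
    """Absolute risk of cause 1 in [t1, t2] with a competing cause 2
    (constant-hazard exponential model)."""
    total = lam1 + lam2
    return lam1 / total * (1.0 - math.exp(-total * (t2 - t1)))

# e.g. 0.02 events/person-year for the cause of interest,
# 0.05/person-year for the competing risk, over a 10-year interval
risk = absolute_risk(0.02, 0.05, 0.0, 10.0)
```

Comparing against the no-competing-risk case (lam2 = 0) shows how competing mortality lowers the absolute risk of the event of interest even when its own hazard is unchanged.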

20.

Background

Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Existing studies have formulated GS as the problem of modeling an individual’s breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used.

Contributions

In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG more strongly rewards models that assign high ranks to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value.
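NDCG can be sketched in a few lines. The version below uses linear gains, one common variant (the paper may use a different gain function), and the breeding values and scores are invented for illustration.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: position i is discounted by log2(i+2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(true_values, predicted_scores, k=None):
    """Rank by predicted score, then compare DCG to the ideal (sorted) ranking."""
    order = sorted(range(len(true_values)), key=lambda i: -predicted_scores[i])
    ranked = [true_values[i] for i in order][:k]
    ideal = sorted(true_values, reverse=True)[:k]
    return dcg(ranked) / dcg(ideal)

true_bv = [3.0, 1.0, 2.0, 0.0]                  # true breeding values
perfect = ndcg(true_bv, [0.9, 0.2, 0.5, 0.1])   # scores that rank them correctly
```

A model whose scores recover the true ordering achieves NDCG of 1; misplacing a high-breeding-value individual near the bottom is penalized more heavily than an error among low-value individuals, which is exactly the selection objective described above.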

Results

We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.

