Similar Literature
20 similar records found.
1.
In cancer clinical proteomics, MALDI and SELDI profiling are used to search for biomarkers of potentially curable early-stage disease. A given number of samples must be analysed in order to detect clinically relevant differences between cancers and controls, with adequate statistical power. From clinical proteomic profiling studies, expression data for each peak (protein or peptide) from two or more clinically defined groups of subjects are typically available. Typically, both exposure and confounder information on each subject are also available, and usually the samples are not from randomized subjects. Moreover, the data are usually available in replicate. At the design stage, however, covariates are not typically available and are often ignored in sample size calculations. This leads to the use of insufficient numbers of samples and reduced power when there are imbalances in the numbers of subjects between different phenotypic groups. A method is proposed for accommodating information on covariates, data imbalances and design characteristics, such as the technical replication and the observational nature of these studies, in sample size calculations. It assumes knowledge of a joint distribution for the protein expression values and the covariates. When discretized covariates are considered, the effect of the covariates enters the calculations as a function of the proportions of subjects with specific attributes. This makes it relatively straightforward (even when pilot data on subject covariates are unavailable) to specify and to adjust for the effect of the expected heterogeneities. The new method suggests certain experimental designs which lead to the use of a smaller number of samples when planning a study. Analysis of data from the proteomic profiling of colorectal cancer reveals that fewer samples are needed when a study is balanced than when it is unbalanced, and when the IMAC30 chip-type is used. The method is implemented in the clippda package and is available in R at: http://www.bioconductor.org/help/bioc-views/release/bioc/html/clippda.html.
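The sample-size reasoning above can be made concrete with a back-of-the-envelope calculation. The sketch below is a generic normal-approximation sample-size formula, not the clippda implementation: the variance components, stratum proportions and effect size are hypothetical, technical replication only shrinks the technical variance component, and a discretized covariate (e.g. chip type) enters through proportion-weighted variances, in the spirit the abstract describes.

```python
# A minimal, generic sketch; all numbers below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def required_n(delta, sigma2_biol, sigma2_tech, n_reps, alloc_ratio=1.0,
               alpha=0.05, power=0.9):
    """Per-group sample sizes for a two-group comparison of one peak.

    delta        : clinically relevant difference in (log) expression
    sigma2_biol  : between-subject (biological) variance
    sigma2_tech  : within-subject (technical) variance
    n_reps       : number of technical replicates per subject
    alloc_ratio  : n_controls / n_cases (1.0 = balanced design)
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    # Technical replication only averages away the technical component.
    sigma2_eff = sigma2_biol + sigma2_tech / n_reps
    n_cases = (1 + 1 / alloc_ratio) * (z ** 2) * sigma2_eff / delta ** 2
    return int(np.ceil(n_cases)), int(np.ceil(n_cases * alloc_ratio))

# A discretized covariate (e.g. chip type) enters through stratum proportions:
# use a variance that is a proportion-weighted mixture over strata.
props  = np.array([0.6, 0.4])        # fraction of subjects in each stratum
sig2_b = np.array([0.8, 1.4])        # biological variance per stratum
sigma2_mix = float(props @ sig2_b)

for ratio in (1.0, 2.0):             # balanced vs 2:1 unbalanced design
    print(ratio, required_n(delta=0.5, sigma2_biol=sigma2_mix,
                            sigma2_tech=0.3, n_reps=2, alloc_ratio=ratio))
```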

2.
Species distribution models often use climate data to assess contemporary and/or future ranges for animal or plant species. Land use and land cover (LULC) data are important predictor variables for determining species range, yet are rarely used when modeling future distributions. In this study, maximum entropy modeling was used to construct species distribution maps for 50 North American bird species to determine relative contributions of climate and LULC for contemporary (2001) and future (2075) time periods. Species presence data were used as a dependent variable, while climate, LULC, and topographic data were used as predictor variables. Results varied by species, but in general, measures of model fit for 2001 indicated significantly poorer fit when either climate or LULC data were excluded from model simulations. Climate covariates provided a higher contribution to 2001 model results than did LULC variables, although both categories of variables strongly contributed. The area deemed to be “suitable” for 2001 species presence was strongly affected by the choice of model covariates, with significantly larger ranges predicted when LULC was excluded as a covariate. Changes in species ranges for 2075 indicate much larger overall range changes due to projected climate change than due to projected LULC change. However, the choice of study area impacted results for both current and projected model applications, with truncation of actual species ranges resulting in lower model fit scores and increased difficulty in interpreting covariate impacts on species range. Results indicate species-specific response to climate and LULC variables; however, both climate and LULC variables clearly are important for modeling both contemporary and potential future species ranges.
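As a rough illustration of why omitting LULC can inflate the predicted range, the sketch below fits a presence/background classifier with and without a land-cover covariate on synthetic data. Logistic regression is used here as a simple stand-in for maximum entropy modelling, and all covariates, coefficients and thresholds are invented for the example.

```python
# Hedged sketch: synthetic presence data, logistic regression as a MaxEnt stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
climate = rng.normal(size=(n, 2))            # e.g. temperature, precipitation
lulc    = rng.integers(0, 2, size=(n, 1))    # e.g. forest vs non-forest cell

# True suitability depends on both climate and land cover (synthetic).
logit = 1.2 * climate[:, 0] - 0.8 * climate[:, 1] + 1.5 * lulc[:, 0] - 1.0
presence = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_full = np.hstack([climate, lulc])
for name, X in [("climate + LULC", X_full), ("climate only", climate)]:
    Xtr, Xte, ytr, yte = train_test_split(X, presence, test_size=0.3,
                                          random_state=1)
    model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    # Predicted "suitable" area: cells above a fixed suitability threshold.
    suitable = (model.predict_proba(X)[:, 1] > 0.5).mean()
    print(f"{name:15s}  AUC={auc:.3f}  suitable fraction={suitable:.2f}")
```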

3.
Linkage heterogeneity frequently occurs for complex genetic diseases, and statistical methods must account for it to avoid severe loss in power to discover susceptibility genes. A common method to allow for only a fraction of linked pedigrees is to fit a mixture likelihood and then to test for linkage homogeneity, given linkage (admixture test), or to test for linkage while allowing for heterogeneity, using the heterogeneity LOD (HLOD) score. Furthermore, features of the families, such as mean age at diagnosis, may help to discriminate families that demonstrate linkage from those that do not. Pedigree features are often used to create homogeneous subsets, and LOD or HLOD scores are then computed within the subsets. However, this practice introduces several problems, including reduced power (which results from multiple testing and small sample sizes within subsets) and difficulty in interpretation of results. To address some of these limitations, we present a regression-based extension of the mixture likelihood for which pedigree features are used as covariates that determine the probability that a family is the linked type. Some advantages of this approach are that multiple covariates can be used (including quantitative covariates), covariates can be adjusted for each other, and interactions among covariates can be assessed. This new regression method is applied to linkage data for familial prostate cancer and provides new insights into the understanding of prostate cancer linkage heterogeneity.
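The admixture idea can be sketched directly from per-family likelihood ratios. Below, the probability that a family is of the linked type is modelled as a logistic function of a family-level covariate (e.g. mean age at diagnosis), and the mixture likelihood is maximized; fixing the covariate coefficient at zero recovers an ordinary HLOD-style score. The data, covariate effect and LOD-score distributions are simulated and purely illustrative, not the prostate-cancer analysis.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)

# Synthetic per-family evidence: lod is each family's LOD score at the locus,
# x a centred covariate (e.g. mean age at diagnosis).  Families with low x
# are simulated to be the "linked" type.
n_fam = 150
x = rng.normal(size=n_fam)
linked = rng.binomial(1, expit(-1.5 * x))
lod = np.where(linked, rng.normal(0.35, 0.4, n_fam), rng.normal(0.0, 0.3, n_fam))
lr = 10.0 ** lod                                   # per-family likelihood ratio

def neg_loglik(theta):
    b0, b1 = theta
    p = expit(b0 + b1 * x)                         # P(family is linked | x)
    return -np.sum(np.log(p * lr + (1 - p)))

# Covariate-based mixture (b1 free) versus classic admixture model (b1 = 0).
fit_cov  = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
fit_homo = minimize(lambda t: neg_loglik([t[0], 0.0]), x0=[0.0],
                    method="Nelder-Mead")
hlod_cov  = -fit_cov.fun  / np.log(10)             # log10 maximized mixture LR
hlod_base = -fit_homo.fun / np.log(10)
print("HLOD with covariate:", round(hlod_cov, 2),
      " baseline HLOD:", round(hlod_base, 2))
```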

4.
Modelling survival data from long-term follow-up studies presents challenges. The commonly used proportional hazards model should be extended to account for dynamic behaviour of the effects of fixed covariates. This work illustrates the use of reduced rank models in survival data, where some of the covariate effects are allowed to behave dynamically in time while others are kept fixed. Time-varying effects of the covariates can be fitted by using interactions of the fixed covariates with flexible transformations of time based on B-splines. To avoid overfitting, a reduced rank model will restrict the number of parameters, resulting in a more sensible fit to the data. This work presents the basic theory and the algorithm to fit such models. An application to breast cancer data is used for illustration of the suggested methods.

5.
Jiang  Jiancheng; Haibo  Zhou 《Biometrika》2007,94(2):359-369
We consider the additive hazard model when some of the true covariates are measured only on a randomly selected validation set, whereas auxiliary covariates are observed for all study subjects. An updated pseudoscore estimation approach is proposed for the parameters of the additive hazard model. It allows one to fit the model with auxiliary covariates, while leaving the baseline hazard unspecified. Asymptotic properties of the proposed estimators are established, and consistent standard error estimators are developed. Simulations demonstrate that the asymptotic approximations of the proposed estimates are adequate for practical use. A real example is used to illustrate the performance of the proposed method.

6.
A group of variables is commonly seen in diagnostic medicine when multiple prognostic factors are aggregated into a composite score to represent the risk profile. A model selection method considers these covariates as all-in or all-out types. Model selection procedures for grouped covariates and their applications have thrived in recent years, in part because of the development of genetic research in which gene–gene or gene–environment interactions and regulatory network pathways are considered groups of individual variables. However, little has been discussed on how to utilize grouped covariates to grow a classification tree. In this paper, we propose a nonparametric method to address the selection of split variables for grouped covariates and the subsequent selection of split points. Comprehensive simulations were implemented to show the superiority of our procedures compared to a commonly used recursive partition algorithm. The practical use of our method is demonstrated through a real data analysis that uses a group of prognostic factors to classify the successful mobilization of peripheral blood stem cells.
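One way to picture an all-in/all-out split on a covariate group is to score each group as a whole and then pick a cut point, as in the sketch below. This is not the authors' procedure: it simply projects each group onto its first principal component, searches cut points by Gini impurity reduction, and compares groups by the best reduction achieved; the data and group definitions are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

def gini(y):
    p = y.mean()
    return 2 * p * (1 - p)

def best_split_for_group(X_group, y):
    """Score one covariate group: project onto its first principal component,
    then search the best cut point by Gini impurity reduction."""
    Xc = X_group - X_group.mean(axis=0)
    score = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]  # first PC scores
    best = (0.0, None)
    for cut in np.quantile(score, np.linspace(0.1, 0.9, 17)):
        left, right = y[score <= cut], y[score > cut]
        if len(left) < 10 or len(right) < 10:
            continue
        gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if gain > best[0]:
            best = (gain, cut)
    return best

# Synthetic node data: two covariate groups; only group A drives the outcome.
n = 400
group_A = rng.normal(size=(n, 3))       # e.g. the items of one composite score
group_B = rng.normal(size=(n, 4))
y = (group_A.sum(axis=1) + 0.5 * rng.normal(size=n) > 0).astype(int)

for name, G in [("group A", group_A), ("group B", group_B)]:
    gain, _ = best_split_for_group(G, y)
    print(f"{name}: best impurity reduction = {gain:.3f}")
# The group with the larger reduction is chosen as the (all-in) split variable.
```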

7.
A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.
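The classical non-parametric end of the spectrum described above can be illustrated with off-the-shelf variance-heterogeneity tests. In the sketch below a genotype affects only the phenotypic variance, and the median-centred Levene (Brown-Forsythe) test is contrasted with Bartlett's fully parametric test; genotype frequencies and effect sizes are invented for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic biallelic marker: genotype changes the *variance*, not the mean.
n = 1500
geno = rng.integers(0, 3, size=n)                  # 0/1/2 copies of the allele
sd = np.where(geno == 2, 1.8, 1.0)                 # variance-increasing genotype
pheno = rng.normal(0.0, sd)

groups = [pheno[geno == g] for g in range(3)]

# Classical non-parametric route: Brown-Forsythe/Levene test (median-centred).
w, p_lev = stats.levene(*groups, center="median")

# A simple fully parametric route: Bartlett's test (sensitive to non-normality).
b, p_bart = stats.bartlett(*groups)

print(f"Levene (median-centred): p = {p_lev:.3g}")
print(f"Bartlett:                p = {p_bart:.3g}")
# In a real vQTL scan, mean effects and covariates would first be regressed out,
# and any mean-variance relationship checked before trusting parametric tests.
```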

8.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.
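A single Buckley-James step followed by partial least squares can be written compactly. The sketch below imputes censored (log) responses with their conditional expectation under a Kaplan-Meier estimate of the residual distribution and then fits sklearn's PLS; it performs one imputation step rather than the full iterative algorithm, fixes the number of components instead of using leave-two-out cross-validation, and runs on simulated data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)

# Synthetic high-dimensional data with right-censored log survival times.
n, p = 120, 60
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 0.5
log_t = X @ beta_true + rng.normal(scale=0.5, size=n)
log_c = rng.normal(loc=1.0, scale=1.0, size=n)
y = np.minimum(log_t, log_c)                       # observed log time
delta = (log_t <= log_c).astype(int)               # 1 = event, 0 = censored

def buckley_james_impute(y, delta, lin_pred):
    """One Buckley-James step: replace censored responses by their conditional
    expectation under the Kaplan-Meier estimate of the residual distribution."""
    e = y - lin_pred
    order = np.argsort(e)
    e_s, d_s = e[order], delta[order].copy()
    d_s[-1] = 1                                    # convention: largest residual = event
    at_risk = len(e_s) - np.arange(len(e_s))
    surv, jumps = 1.0, np.zeros(len(e_s))
    for i in range(len(e_s)):
        if d_s[i]:
            jumps[i] = surv / at_risk[i]           # KM mass placed at this residual
            surv -= jumps[i]
    y_star = y.copy()
    for i in np.where(delta == 0)[0]:
        tail = e_s > e[i]
        mass = jumps[tail].sum()
        if mass > 0:
            y_star[i] = lin_pred[i] + (jumps[tail] @ e_s[tail]) / mass
    return y_star

# Initial fit on uncensored cases only, then one imputation step, then PLS.
init = PLSRegression(n_components=3).fit(X[delta == 1], y[delta == 1])
y_star = buckley_james_impute(y, delta, init.predict(X).ravel())
pls = PLSRegression(n_components=3).fit(X, y_star)   # n_components: pick by CV
print("correlation of fitted effect with truth:",
      round(np.corrcoef(X @ pls.coef_.ravel(), X @ beta_true)[0, 1], 2))
```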

9.
Parzen M  Lipsitz SR 《Biometrics》1999,55(2):580-584
In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary biliary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. There are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.
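The grouped observed-versus-expected idea behind a Hosmer-Lemeshow-type statistic can be demonstrated without the full Cox machinery. The sketch below uses an exponential proportional hazards model as a transparent stand-in: subjects are grouped by deciles of the fitted linear predictor and observed events are compared with model-based expected events. The chi-squared degrees of freedom and the simulated data are illustrative only; this is not the statistic derived in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(6)

# Synthetic right-censored data whose true hazard involves an interaction
# that a main-effects-only model misses.
n = 2000
x1, x2 = rng.normal(size=n), rng.binomial(1, 0.5, n)
lam_true = np.exp(-1.0 + 0.7 * x1 + 0.5 * x2 + 0.8 * x1 * x2)
t = rng.exponential(1 / lam_true)
c = rng.exponential(2.0, n)
time, event = np.minimum(t, c), (t <= c).astype(int)

def fit_exp_ph(X, time, event):
    """Maximum likelihood for an exponential PH model: hazard = exp(X @ b)."""
    X1 = np.column_stack([np.ones(len(time)), X])
    nll = lambda b: -(event @ (X1 @ b) - time @ np.exp(X1 @ b))
    return X1, minimize(nll, np.zeros(X1.shape[1]), method="BFGS").x

def grouped_gof(X, time, event, n_groups=10):
    X1, b = fit_exp_ph(X, time, event)
    expected = time * np.exp(X1 @ b)               # expected events per subject
    lp = X1 @ b
    cuts = np.quantile(lp, np.linspace(0, 1, n_groups + 1)[1:-1])
    groups = np.digitize(lp, cuts)
    O = np.bincount(groups, weights=event, minlength=n_groups)
    E = np.bincount(groups, weights=expected, minlength=n_groups)
    stat = np.sum((O - E) ** 2 / E)
    # df chosen by rough analogy with Hosmer-Lemeshow; the paper derives its own.
    return stat, chi2.sf(stat, df=n_groups - 2)

X_main = np.column_stack([x1, x2])
X_int  = np.column_stack([x1, x2, x1 * x2])
print("main effects only:  stat=%.1f  p=%.3g" % grouped_gof(X_main, time, event))
print("with interaction:   stat=%.1f  p=%.3g" % grouped_gof(X_int, time, event))
```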

10.
Burgette LF  Reiter JP 《Biometrics》2012,68(1):92-100
We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent factors that analysts seek to include as predictors in the quantile regression. We apply the model to a study of birth weights in which the effects of latent variables representing psychosocial health and actual tobacco usage on the lower quantiles of the response distribution are of interest. The models can be fit using an R package called factorQR.
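A frequentist two-stage stand-in conveys the structure: estimate factor scores from the indicators, then regress a lower quantile of the outcome on them. This is not the joint Bayesian model fitted by factorQR (which would propagate factor uncertainty); the loadings, sample size and effect sizes below are invented.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)

# Synthetic data: 6 observed indicators load on 2 latent factors
# ("psychosocial health", "tobacco usage"); the outcome's *lower* quantiles
# depend on the factors more strongly than its centre does.
n = 1000
F = rng.normal(size=(n, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
X_ind = F @ loadings.T + 0.4 * rng.normal(size=(n, 6))
noise = rng.gamma(shape=2.0, scale=1.0, size=n)            # right-skewed error
y = 3000 + 150 * F[:, 0] - 250 * F[:, 1] + 200 * noise     # e.g. birth weight (g)

# Stage 1: estimate factor scores from the indicators.
scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X_ind)

# Stage 2: quantile regression of the outcome on the estimated factors.
Xq = sm.add_constant(scores)
for q in (0.10, 0.50):
    fit = sm.QuantReg(y, Xq).fit(q=q)
    print(f"q={q:.2f}  factor effects: {np.round(fit.params[1:], 1)}")
```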

11.
12.
We provide a new method for estimating the age-specific breeding probabilities from recaptures or resightings of animals marked as young. Our method is more direct than previous methods and allows the modeler to fit and compare models where the age-specific breeding proportions are equal over different cohorts or are a function of external covariates.

13.
"Stochastic survival models which adjust for covariate information have been developed by Beck (1979). These models can include one or two living states and several competing death states. The transitions between stages are assumed irreversible and the transition intensity functions are assumed to be independent of time but dependent upon the covariates." Explicit solutions of the maximum likelihood equations for such models when there are one or two dichotomous covariates are presented. Applications of these models to the case of heart transplants and lung cancer are discussed, and survival in two or four groups is compared. (summary in FRE)  相似文献   

14.
Optimal multivariate matching before randomization
Although blocking or pairing before randomization is a basic principle of experimental design, the principle is almost invariably applied to at most one or two blocking variables. Here, we discuss the use of optimal multivariate matching prior to randomization to improve covariate balance for many variables at the same time, presenting an algorithm and a case study of its performance. The method is useful when all subjects, or large groups of subjects, are randomized at the same time. Optimal matching divides a single group of 2n subjects into n pairs to minimize covariate differences within pairs (the so-called nonbipartite matching problem); then one subject in each pair is picked at random for treatment, the other being assigned to control. Using the baseline covariate data for 132 patients from an actual, unmatched, randomized experiment, we construct 66 pairs matching for 14 covariates. We then create 10,000 unmatched and 10,000 matched randomized experiments by repeatedly randomizing the 132 patients, and compare the covariate balance with and without matching. By every measure, every one of the 14 covariates was substantially better balanced when randomization was performed within matched pairs. Even after covariance adjustment for chance imbalances in the 14 covariates, matched randomizations provided more accurate estimates than unmatched randomizations, the increase in accuracy being equivalent to, on average, a 7% increase in sample size. In randomization tests of no treatment effect, matched randomizations using the signed rank test had substantially higher power than unmatched randomizations using the rank sum test, even when only 2 of 14 covariates were relevant to a simulated response. Unmatched randomizations experienced rare disasters which were consistently avoided by matched randomizations.
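The matching-then-randomizing pipeline can be sketched with standard tools: build a covariate distance matrix, solve the nonbipartite matching, then flip a coin within each pair. The sketch below uses networkx's general maximum-weight matching (maximizing d_max minus distance over a perfect matching is equivalent to minimizing total within-pair distance) on synthetic covariates; it is not the authors' algorithm or data.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import squareform, pdist

rng = np.random.default_rng(9)

# Baseline covariates for 2n subjects awaiting randomization (synthetic).
n_subjects, n_cov = 40, 6
X = rng.normal(size=(n_subjects, n_cov))

# Pairwise Mahalanobis distances on the covariates.
D = squareform(pdist(X, metric="mahalanobis"))

# Optimal nonbipartite matching: maximize (d_max - d) over a perfect matching,
# which is equivalent to minimizing the total within-pair distance.
G = nx.Graph()
d_max = D.max()
for i in range(n_subjects):
    for j in range(i + 1, n_subjects):
        G.add_edge(i, j, weight=d_max - D[i, j])
pairs = nx.max_weight_matching(G, maxcardinality=True)

# Randomize within each matched pair: one to treatment, one to control.
treated = np.zeros(n_subjects, dtype=int)
for i, j in pairs:
    a, b = (i, j) if rng.random() < 0.5 else (j, i)
    treated[a] = 1

# Check covariate balance: standardized mean differences, treated vs control.
smd = (X[treated == 1].mean(0) - X[treated == 0].mean(0)) / X.std(0)
print("pairs formed:", len(pairs))
print("standardized mean differences:", np.round(smd, 2))
```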

15.
Regression trees allow one to search for meaningful explanatory variables that have a nonlinear impact on the dependent variable. Often they are used when there are many covariates and one does not want to restrict attention to only a few of them. To grow a tree, at each stage one has to select a cut point for splitting a group into two subgroups. The basis for this is, for each covariate, the maximum of the test statistics over its possible splits; these maxima, or the resulting P-values, are compared as measures of importance. If covariates have different numbers of missing values, ties, or even different measurement scales, they lead to different numbers of tests. Those with a higher number of tests have a greater chance of achieving a smaller P-value if they are not adjusted. This can lead to erroneous splits even if the P-values are looked at informally. There is some theoretical work by Miller and Siegmund (1982) and Lausen and Schumacher (1992) to give an adjustment rule, but the asymptotics are based on a continuum of split points and may not lead to a fair splitting rule when applied to smaller data sets or to covariates with only a few distinct values. Here we develop an approach that allows determination of P-values for any number of splits. The only approximation used is the normal approximation of the test statistics. The starting point for this investigation was a prospective study on the development of AIDS, which is presented here as the main application.
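One transparent way to obtain a P-value for a maximally selected statistic with an arbitrary, possibly small, number of candidate splits is to permute the response and recompute the maximum, as sketched below. The paper instead derives the reference distribution analytically from a normal approximation; the permutation version is shown only to illustrate how the number of admissible splits drives the adjustment. Data are simulated under the null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

def max_stat(x, y, cuts):
    """Maximum absolute two-sample statistic over the candidate cut points."""
    best = 0.0
    for c in cuts:
        g = x <= c
        if g.sum() < 5 or (~g).sum() < 5:
            continue
        z = abs(stats.ttest_ind(y[g], y[~g], equal_var=False).statistic)
        best = max(best, z)
    return best

def adjusted_p(x, y, n_perm=500, rng=rng):
    """Permutation P-value for the maximally selected statistic: usable for
    any number of candidate splits (few distinct values, ties, missing data)."""
    cuts = np.unique(x)[:-1]                 # all possible split points
    obs = max_stat(x, y, cuts)
    perm = np.array([max_stat(x, rng.permutation(y), cuts)
                     for _ in range(n_perm)])
    return obs, (1 + np.sum(perm >= obs)) / (n_perm + 1)

# Two covariates under the null: x_many has many distinct values, x_few only 3.
n = 120
y = rng.normal(size=n)
x_many = rng.normal(size=n)
x_few = rng.integers(0, 3, size=n).astype(float)

for name, x in [("many split points", x_many), ("3 split points", x_few)]:
    obs, p = adjusted_p(x, y)
    print(f"{name:18s} max|z| = {obs:.2f}   adjusted p = {p:.3f}")
```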

16.
The purpose of these analyses was to determine whether incorporating or adjusting for covariates helped or hindered genetic analyses, specifically heritability and linkage analyses. To study this question, two types of covariate models were used in the simulated Genetic Analysis Workshop 14 dataset, in which the true gene locations are known. All four populations of one replicate were combined for the analyses. The first model included the typical covariates of sex and cohort (population); the second included the typical covariates and also the related endophenotypes thought to be associated with the trait (phenotypes A, B, C, D, E, F, G, H, I, J, K, and L). A final best-fit model produced in the heritability analyses was used for linkage. Linkage for disease genes D1, D3, and D4 was localized using models with and without the covariates. The inclusion of covariates did not appear to have any consistent advantage or disadvantage for the different phenotypes with regard to gene localization or false-positive rate.
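A minimal sketch of what "adjusting for covariates" can mean in this setting: regress the trait on the covariates and carry the residuals forward. The simulated example below also shows why adjusting for an endophenotype that shares the genetic signal can cut both ways; the effect sizes and the residualization-as-adjustment shortcut are illustrative assumptions, not the Workshop 14 analysis.

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic phenotype influenced by sex, cohort, and a genetic component.
n = 800
sex = rng.binomial(1, 0.5, n)
cohort = rng.integers(0, 4, n)                     # four populations combined
genetic = rng.normal(size=n)
endo = genetic + 0.5 * rng.normal(size=n)          # correlated endophenotype
trait = 0.8 * sex + 0.3 * cohort + genetic + rng.normal(size=n)

def residualize(y, covars):
    """Regress the trait on covariates and return residuals for downstream
    heritability/linkage analysis (a shortcut that matches including them in
    the model only under linearity and no covariate-by-gene interaction)."""
    Z = np.column_stack([np.ones(len(y))] + covars)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta

cohort_dummies = (cohort[:, None] == np.arange(1, 4)).astype(float)
adj_typical = residualize(trait, [sex, cohort_dummies])
adj_full = residualize(trait, [sex, cohort_dummies, endo])

for name, y in [("unadjusted", trait), ("sex + cohort", adj_typical),
                ("+ endophenotype", adj_full)]:
    print(f"{name:16s} corr(trait, genetic) = {np.corrcoef(y, genetic)[0, 1]:.2f}")
# Adjusting for an endophenotype that shares the genetic signal can remove
# part of the very signal a linkage scan is trying to localize.
```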

17.
A general model for the illness-death stochastic process with covariates has been developed for the analysis of survival data. This model incorporates important baseline and time-dependent covariates in order to make an appropriate adjustment for the transition and survival probabilities. The follow-up period is subdivided into small intervals and a constant hazard is assumed for each interval. An approximation formula is derived to estimate the transition parameters when the exact transition time is unknown. The method developed is illustrated with data from a study on the prevention of the recurrence of a myocardial infarction and subsequent mortality, the Beta-Blocker Heart Attack Trial (BHAT). This method provides an analytical approach with which the effectiveness of the treatment can be compared between the placebo and propranolol treatment groups with respect to fatal and nonfatal events simultaneously.
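With the follow-up period cut into intervals and a constant hazard per interval, each transition intensity can be estimated by occurrences divided by exposures in that interval. The sketch below does this for the healthy-to-ill and ill-to-dead transitions of a simulated illness-death process; covariates would enter via, e.g., a log-linear model on these rates (not shown), and all rates and interval breaks are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(12)

# Synthetic illness-death data: healthy (0) -> ill (1) -> dead (2), plus a
# direct 0 -> 2 transition; constant hazards within follow-up intervals.
n = 4000
t01 = rng.exponential(1 / 0.10, n)        # healthy -> ill
t02 = rng.exponential(1 / 0.04, n)        # healthy -> dead
t12 = rng.exponential(1 / 0.20, n)        # ill -> dead (clock restarts at illness)
admin_cens = 10.0

def piecewise_rate(entry, exit_, event, breaks):
    """Occurrence/exposure rates: events divided by person-time per interval."""
    rates = []
    for a, b in zip(breaks[:-1], breaks[1:]):
        exposure = np.clip(np.minimum(exit_, b) - np.maximum(entry, a), 0, None).sum()
        events = np.sum(event & (exit_ > a) & (exit_ <= b))
        rates.append(events / exposure)
    return np.array(rates)

breaks = np.array([0.0, 2.5, 5.0, 10.0])

# 0 -> 1 transition: at risk from entry until illness, death or censoring.
exit_healthy = np.minimum.reduce([t01, t02, np.full(n, admin_cens)])
became_ill = (t01 < t02) & (t01 < admin_cens)
print("hazard(healthy -> ill) by interval:",
      np.round(piecewise_rate(np.zeros(n), exit_healthy, became_ill, breaks), 3))

# 1 -> 2 transition: at risk from illness onset until death or censoring.
entry_ill = t01[became_ill]
exit_ill = np.minimum(entry_ill + t12[became_ill], admin_cens)
died_ill = entry_ill + t12[became_ill] <= admin_cens
print("hazard(ill -> dead) by interval:   ",
      np.round(piecewise_rate(entry_ill, exit_ill, died_ill, breaks), 3))
```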

18.
J S Williams 《Biometrics》1978,34(2):209-222
An efficient method is presented for analyses of death rates in one-way or cross-classified experiments where expected survival time for a patient at time of entry on trial is a function of observable covariates. The survival-time distribution used is a Weibull form of Cox's (1972) model. The analysis proceeds in two steps. In the first, goodness of fit of the model is checked, inefficient estimates of the parameters are obtained, and survival times adjusted for the entry covariates are calculated. In the second, efficient estimates and tests for the rate parameters are obtained. These can easily be calculated using hand or desk equipment. Reorganized data sets can be analyzed without repetition of step one, thereby reducing the computational load to hand level and facilitating exploratory data analysis.

19.
20.
Bird migration phenology shows strong responses to climate change. Studies of trends and patterns in phenology are typically based on annual summarizing metrics, such as means and quantiles calculated from raw daily count data. However, with irregularly sampled data and large day-to-day variation, such metrics can be biased and noisy, and the data may instead be analysed using phenological functions fitted to them. Here we use count data of migration passage from a Finnish bird observatory to compare different models for the phenological distributions of spring migration (27 species) and autumn migration (57 species). We assess parsimony and goodness-of-fit in a set of models with phenological functions of different complexity, optionally with covariates accounting for day-to-day variability. The covariates describe migration intensities of related species or relative migration intensities the previous day (autocovariates). We found that parametric models are often preferred over the more flexible generalized additive models with constrained degrees of freedom. Models corresponding to a mixture of two distinct passing populations were frequently preferred over simpler ones, but usually no more complex models were needed. Slightly more complex models were favoured in spring compared to autumn. Related species' migration activity effectively improves the model by accounting for the large day-to-day variation. Autocovariates were usually not that relevant, implying that autocorrelation is generally not a major concern if phenology is modelled properly. We suggest that parametric models are relatively good for studying single-population migration phenology, or a mix of two groups with distinct phenologies, especially if daily variation in migration intensity can be controlled for. Generalized additive models may be useful when the migrating population composition is unknown. Despite these guidelines, choosing an appropriate model involves case-by-case assessment of the biological relevance and rationale for modelling phenology.
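The single-population-versus-two-population comparison can be mimicked by expanding daily counts into individual passage days and comparing Gaussian phenology models by an information criterion, as sketched below. This ignores the count-level covariate modelling (related species, autocovariates) discussed above, and the peak dates, spreads and totals are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(13)

# Synthetic daily passage counts: two passing populations with distinct
# phenologies (day-of-year peaks around 120 and 145).
days = np.arange(90, 181)
intensity = (400 * np.exp(-0.5 * ((days - 120) / 6.0) ** 2)
             + 250 * np.exp(-0.5 * ((days - 145) / 9.0) ** 2))
counts = rng.poisson(intensity)

# Expand counts to one observation per individual bird's passage day,
# then compare 1- vs 2-component Gaussian phenology models by BIC.
passage_days = np.repeat(days, counts).astype(float).reshape(-1, 1)
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(passage_days)
    print(f"{k} component(s): BIC = {gm.bic(passage_days):.0f}, "
          f"means = {np.sort(gm.means_.ravel()).round(1)}")
# Day-to-day variation (weather, coverage) could be handled by adding daily
# covariates, e.g. related species' totals, to a count-level regression instead.
```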
