Related Articles
1.
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data and is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via extensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.
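For orientation, the classical additive measurement error structure with an instrument can be written as follows (illustrative notation, not taken from the paper; X is the true exposure, W the surrogate, Z the instrumental variable):

W = X + U,   E[U | X, Z] = 0,   Cov(Z, X) ≠ 0,

so the surrogate is unbiased for the true exposure, while the instrument is informative about X without being contaminated by the measurement error U.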

2.
Biomedical researchers are often interested in estimating the effect of an environmental exposure on a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables available among the subjects in the calibration sample, we consider the situation in which an instrumental variable is available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite-sample performance of the proposed estimator is examined via extensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative.
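Schematically (illustrative notation; a sketch of the construction described above, not a reproduction of the paper's estimator), such a combination takes the form

β̂(A) = β̂_cor + A ( β̂_naive,cal − β̂_naive,cohort ),

where β̂_cor is the correction estimator from the calibration sample and the two naive estimators converge to the same limit, so their difference tends to zero and β̂(A) remains consistent for any fixed matrix A; the "best" A is the one minimizing the asymptotic variance.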

3.
MOTIVATION: Protein families evolve a multiplicity of functions through gene duplication, speciation and other processes. As a number of studies have shown, standard methods of protein function prediction produce systematic errors on these data. Phylogenomic analysis (combining phylogenetic tree construction, integration of experimental data and differentiation of orthologs and paralogs) has been proposed to address these errors and improve the accuracy of functional classification. The explicit integration of structure prediction and analysis in this framework, which we call structural phylogenomics, provides additional insights into protein superfamily evolution. RESULTS: Protein functional classification using phylogenomic analysis shows fewer expected false positives overall than pairwise methods of functional classification. We present an overview of the motivations and fundamental principles of phylogenomic analysis, describe new methods developed for the key tasks and benchmark datasets for these tasks (when available), and suggest procedures to increase accuracy. We also discuss some of the methods used in the Celera Genomics high-throughput phylogenomic classification of the human genome. AVAILABILITY: Software tools from the Berkeley Phylogenomics Group are available at http://phylogenomics.berkeley.edu

4.
5.
In laboratory research, the rainbow trout has become a counterpart to the white rat, because that fish is an adaptable species available in much of the developed world and stocks from egg through adult are available throughout the year. Moreover, many strains are recognized, and their propagation and laboratory maintenance are not particularly demanding. Also, knowledge of rainbow trout nutrition, husbandry, diseases, immune responses, toxicology, and carcinogenesis exceeds that of any other salmonid or coldwater teleost. The rainbow trout is the logical surrogate species in many studies of other salmonids.

6.
Li E, Wang N, Wang NY. Biometrics 2007, 63(4):1068-1078.
Joint models are formulated to investigate the association between a primary endpoint and features of multiple longitudinal processes. In particular, the subject-specific random effects in a multivariate linear random-effects model for multiple longitudinal processes are predictors in a generalized linear model for primary endpoints. Li, Zhang, and Davidian (2004, Biometrics 60, 1–7) proposed an estimation procedure that makes no distributional assumption on the random effects but assumes independent within-subject measurement errors in the longitudinal covariate process. Based on an asymptotic bias analysis, we found that their estimators can be biased when random effects do not fully explain the within-subject correlations among longitudinal covariate measurements. Specifically, the existing procedure is fairly sensitive to the independent measurement error assumption. To overcome this limitation, we propose new estimation procedures that require neither a distributional nor a covariance-structure assumption on the covariate random effects, nor an independence assumption on the within-subject measurement errors. These new procedures are more flexible, readily cover scenarios that involve multivariate longitudinal covariate processes, and can be implemented using available software. Through simulations and an analysis of data from a hypertension study, we evaluate and illustrate the numerical performance of the new estimators.
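In generic form (a sketch of this class of joint models, not necessarily the exact specification used in the paper), shared subject-specific random effects b_i link the two parts:

X_ij = d_ij' b_i + e_ij   (multivariate linear random-effects model for the longitudinal covariate processes),
g( E[Y_i | b_i] ) = α' w_i + γ' b_i   (generalized linear model for the primary endpoint),

and the question examined here is what happens when the within-subject errors e_ij are correlated rather than independent.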

7.
In functional data classification, functional observations are often contaminated by various systematic effects, such as random batch effects caused by device artifacts, or fixed effects caused by sample-related factors. These effects may lead to classification bias and thus should not be neglected. Another issue of concern is the selection of functions when predictors consist of multiple functions, some of which may be redundant. The above issues arise in a real data application where we use fluorescence spectroscopy to detect cervical precancer. In this article, we propose a Bayesian hierarchical model that takes into account random batch effects and selects effective functions among multiple functional predictors. Fixed effects or predictors in nonfunctional form are also included in the model. The dimension of the functional data is reduced through orthonormal basis expansion or functional principal components. For posterior sampling, we use a hybrid Metropolis-Hastings/Gibbs sampler, which suffers from slow mixing; an evolutionary Monte Carlo algorithm is applied to improve the mixing. Simulation and a real data application show that the proposed model provides accurate selection of functional predictors as well as good classification.

8.
Taylor JM, Wang Y, Thiébaut R. Biometrics 2005, 61(4):1102-1111.
In a randomized clinical trial, a statistic that measures the proportion of the treatment effect on the primary clinical outcome that is explained by the treatment effect on a surrogate outcome is a useful concept. We investigate whether a statistic proposed to estimate this proportion can be given a causal interpretation as defined by models of counterfactual variables. For the situation of binary surrogate and outcome variables, two counterfactual models are considered, both of which include the concept of the proportion of the treatment effect that acts through the surrogate. In general, the statistic does not equal either of the two proportions from the counterfactual models, and can be substantially different. Conditions are given under which the statistic does equal the counterfactual model proportions. A randomized clinical trial with potential surrogate endpoints is undertaken in a scientific context, and this context will naturally place constraints on the parameters of the counterfactual model. We conducted a simulation experiment to investigate what impact these constraints have on the relationship between the proportion explained (PE) statistic and the counterfactual model proportions. We found that observable constraints had very little impact on the agreement between the statistic and the counterfactual model proportions, whereas unobservable constraints could lead to more agreement.
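The statistic in question is commonly defined as Freedman's proportion explained (stated here for orientation; the paper examines its causal interpretation rather than proposing it),

PE = (β − β_S) / β,

where β is the estimated treatment effect on the clinical outcome without adjusting for the surrogate and β_S the treatment effect after adjustment for the surrogate.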

9.
Calculating the kinetics of motion using inverse or forward dynamics methods requires the use of accurate body segment inertial parameters. The methods available for calculating these body segment parameters (BSPs) have several limitations, and a main concern is the applicability of predictive equations to different populations. This study examined the differences in BSPs between four human populations using dual energy x-ray absorptiometry (DEXA), developed linear regression equations to predict mass, center of mass location (CM) and radius of gyration (K) in the frontal plane for five body segments, and examined the errors produced by using several BSP sources in the literature. Significant population differences were seen in all segments for all populations and all BSPs except hand mass, indicating that population-specific BSP predictors are needed. The linear regression equations developed performed best overall when compared to the other sources, yet no one set of predictors performed best for all segments, populations or BSPs. Large errors were seen with all models, which were attributed to large individual differences within groups. Equations that account for these differences, including measurements of limb circumferences and breadths, may provide better estimates. Geometric models use these parameters; however, the models examined in this study did not perform well, possibly due to the assumption of constant density or the use of an overly simple shape. Creating solids that account for density changes or that mimic the mass distribution characteristics of the segment may solve this problem. Otherwise, regression equations specific to populations according to age, gender, race, and morphology may be required to provide accurate estimates of BSPs for use in kinetic equations of motion.
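A schematic of the kind of population-specific linear predictor discussed above; the data and the 10% segment-mass assumption below are synthetic placeholders, not the study's DEXA measurements or fitted equations.

import numpy as np

# Synthetic illustration only: whole-body mass (kg) and thigh-segment mass (kg)
# for one hypothetical population; real BSP equations are fitted to DEXA measurements.
rng = np.random.default_rng(0)
body_mass = rng.uniform(50, 110, size=40)                  # kg
thigh_mass = 0.10 * body_mass + rng.normal(0.0, 0.4, 40)   # assumed ~10% of body mass plus noise

# Fit segment mass = b0 + b1 * body mass by ordinary least squares.
X = np.column_stack([np.ones_like(body_mass), body_mass])
b0, b1 = np.linalg.lstsq(X, thigh_mass, rcond=None)[0]
print(f"thigh mass ~ {b0:.2f} + {b1:.3f} * body mass")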

10.
Prediction of body weight of fossil Artiodactyla
Many dimensions of the postcranial skeleton of ruminant artiodactyls scale closely with body weight and are therefore potentially useful as predictors of body weight in fossil species. Using 45 dimensions of the skeleton, a series of predictive equations was generated based on the scaling relationships of the family Bovidae. As a test of their usefulness, these equations were used to predict body weights of a number of living ruminant artiodactyls and six genera of fossil artiodactyls. For most species, body weight estimates within 25% of actual weight were given by the mean of the predicted weights from all measurements except lengths of long bones. While femur length was a reasonable predictor of body weight, lengths of distal long bones were unreliable and should not be used as indicators of relative or absolute body weights. Some non-length measurements are biased in certain taxonomic groups; the possibility of erroneous estimates from such measurements can be reduced by using as many estimators of body weight as are available. No species of artiodactyl tested is so highly modified in all dimensions that all results were erroneous. Subsets of measurements which might be available from a typical fossil fragment also gave reliable results.
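Predictive equations of this kind are typically fitted on a logarithmic (allometric) scale; a generic form, illustrative rather than the paper's fitted coefficients, is

log(body weight) = a + b · log(skeletal dimension),

with one (a, b) pair estimated per dimension from the bovid reference sample and the final estimate taken as the mean of the back-transformed predictions across the retained measurements.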

11.
For surface fluxes of carbon dioxide, the net daily flux is the sum of daytime and nighttime fluxes of approximately the same magnitude and opposite direction. The net flux is therefore significantly smaller than the individual flux measurements, and error assessment is critical in determining whether a surface is a net source or sink of carbon dioxide. For carbon dioxide flux measurements, it is an occasional misconception that the net flux is measured as the difference between the net upward and downward fluxes (i.e. a small difference between large terms). This is not the case. The net flux is the sum of individual (half-hourly or hourly) flux measurements, each with an associated error term. The question of errors and uncertainties in long-term flux measurements of carbon and water is addressed by first considering the potential for errors in flux-measuring systems in general, and thus errors that are relevant to a wide range of measurement timescales. We focus exclusively on flux measurements made by the micrometeorological method of eddy covariance. Errors can loosely be divided into random errors and systematic errors, although in reality any particular error may be a combination of both types. Systematic errors can be fully systematic (applying to the whole daily cycle) or selectively systematic (applying to only part of the daily cycle), and these have very different effects. Random errors may also be full or selective, but these do not differ substantially in their properties. We describe an error analysis in which these three different types of error are applied to a long-term dataset to discover how errors may propagate through long-term data, and which can be used to estimate the range of uncertainty in the reported sink strength of the particular ecosystem studied.
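The distinction drawn here follows standard error propagation (notation introduced for illustration): writing the long-term net flux as

F_net = Σ_{i=1}^{N} F_i,

independent random errors of standard deviation σ on the individual fluxes grow only as σ√N in the sum, and so shrink relative to the accumulated total; a fully systematic error δ applied to every measurement accumulates linearly as Nδ; and a selectively systematic error (for example one affecting only night-time periods) biases F_net by the summed error over the affected periods alone, which is why the three types must be analyzed separately.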

12.
When the observed data are contaminated with errors, standard two-sample testing approaches that ignore measurement errors may produce misleading results, including a type-I error rate higher than the nominal level. To tackle this inconsistency, a nonparametric test is proposed for testing equality of two distributions when the observed contaminated data follow the classical additive measurement error model. The proposed test takes into account the presence of errors in the observed data, and the test statistic is defined in terms of the (deconvoluted) characteristic functions of the latent variables. The method is applicable to a wide range of scenarios, as no parametric restrictions are imposed on the distribution of the underlying latent variables or on the distribution of the measurement errors. The asymptotic null distribution of the test statistic is derived and is given by an integral of a squared Gaussian process with a complicated covariance structure. For data-based calibration of the test, a new nonparametric bootstrap method is developed under the two-sample measurement error framework and its validity is established. Finite-sample performance of the proposed test is investigated through simulation studies, and the results show superior performance of the proposed method over standard tests, which exhibit inconsistent behavior. Finally, the proposed method is applied to real data sets from the National Health and Nutrition Examination Survey. An R package, MEtest, is available through CRAN.
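The deconvolution step rests on the standard characteristic function identity for the additive model (notation introduced for illustration): if W = X + U with U independent of X, then

φ_W(t) = φ_X(t) φ_U(t),   so   φ_X(t) = φ_W(t) / φ_U(t),

and one natural two-sample statistic, though not necessarily the paper's exact definition, compares the deconvoluted characteristic functions of the two groups through a weighted L2 distance such as ∫ |φ_{X,1}(t) − φ_{X,2}(t)|² w(t) dt.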

13.
14.
In studies that require long-term and/or costly follow-up of participants to evaluate a treatment, there is often interest in identifying and using a surrogate marker to evaluate the treatment effect. While several statistical methods have been proposed to evaluate potential surrogate markers, available methods generally do not account for or address the potential for a surrogate to vary in utility or strength by patient characteristics. Previous work examining surrogate markers has indicated that there may be such heterogeneity, that is, that a surrogate marker may be useful (with respect to capturing the treatment effect on the primary outcome) for some subgroups, but not for others. This heterogeneity is important to understand, particularly if the surrogate is to be used in a future trial to replace the primary outcome. In this paper, we propose an approach and estimation procedures to measure the surrogate strength as a function of a baseline covariate W and thus examine potential heterogeneity in the utility of the surrogate marker with respect to W. Within a potential outcome framework, we quantify the surrogate strength/utility using the proportion of treatment effect on the primary outcome that is explained by the treatment effect on the surrogate. We propose testing procedures to test for evidence of heterogeneity, examine finite sample performance of these methods via simulation, and illustrate the methods using AIDS clinical trial data.
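One common way to formalize this heterogeneity (a sketch; the paper's exact definition may differ) is to let Δ(w) denote the treatment effect on the primary outcome among patients with baseline covariate W = w and Δ_S(w) the residual treatment effect after accounting for the surrogate, and to define

PTE(w) = 1 − Δ_S(w) / Δ(w),

so that values of PTE(w) near 1 indicate a strong surrogate for the subgroup with W = w and values near 0 a weak one.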

15.
The analysis of global gene expression data from microarrays is breaking new ground in genetics research, while confronting modelers and statisticians with many critical issues. In this paper, we consider data sets in which a categorical or continuous response is recorded, along with gene expression, on a given number of experimental samples. Data of this type are usually employed to create a prediction mechanism for the response based on gene expression, and to identify a subset of relevant genes. This defines a regression setting characterized by a dramatic under-resolution with respect to the predictors (genes), whose number exceeds by orders of magnitude the number of available observations (samples). We present a dimension reduction strategy that, under appropriate assumptions, allows us to restrict attention to a few linear combinations of the original expression profiles, and thus to overcome under-resolution. These linear combinations can then be used to build and validate a regression model with standard techniques. Moreover, they can be used to rank original predictors, and ultimately to select a subset of them through comparison with a background 'chance scenario' based on a number of independent randomizations. We apply this strategy to publicly available data on leukemia classification.
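A generic sketch of the strategy described above, reducing thousands of expression profiles to a few linear combinations and then fitting and validating a standard model; this uses plain principal components and logistic regression on synthetic data, not the authors' specific reduction or the leukemia data set.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 72 samples, 2000 genes, binary response.
rng = np.random.default_rng(1)
X = rng.normal(size=(72, 2000))
y = rng.integers(0, 2, size=72)

# Reduce the genes to a handful of linear combinations, then classify on them.
model = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())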

16.
The existence of uncertainties and variations in data represents a remaining challenge for life cycle assessment (LCA). Moreover, a full analysis may be complex, time-consuming, and implemented mainly when a product design is already defined. Structured under-specification, a method developed to streamline LCA, is here proposed to support the residential building design process by quantifying environmental impact when specific information on the system under analysis is not yet available. By means of structured classifications of materials and building assemblies, it is possible to use surrogate data during the life cycle inventory phase and thus to obtain environmental impact estimates and their associated uncertainty. The bill of materials of a building assembly can be specified using minimal detail during the design process. The low-fidelity characterization of a building assembly and the uncertainty associated with these low levels of fidelity are systematically quantified through structured under-specification using a structured classification of materials. The analyst is able to use this classification to quantify uncertainty in results at each level of specificity. For building assemblies, an average decrease in uncertainty of 25% is observed at each additional level of specificity within the data structure. This approach was used to compare different exterior wall options during the early design process. Almost 50% of the comparisons can be statistically differentiated at even the lowest level of specificity. This data structure is the foundation of a streamlined approach that can be applied not only when a complete bill of materials is available, but also when fewer details are known.

17.
Obtaining reliable results from life-cycle assessment studies is often quite difficult because life-cycle inventory (LCI) data are usually erroneous, incomplete, and even physically meaningless. The real data must satisfy the laws of thermodynamics, so the quality of LCI data may be enhanced by adjusting them to satisfy these laws. This is not a new idea, but a formal thermodynamically sound and statistically rigorous approach for accomplishing this task is not yet available. This article proposes such an approach based on methods for data rectification developed in process systems engineering. This approach exploits redundancy in the available data and models and solves a constrained optimization problem to remove random errors and estimate some missing values. The quality of the results and presence of gross errors are determined by statistical tests on the constraints and measurements. The accuracy of the rectified data is strongly dependent on the accuracy and completeness of the available models, which should capture information such as the life-cycle network, stream compositions, and reactions. Such models are often not provided in LCI databases, so the proposed approach tackles many new challenges that are not encountered in process data rectification. An iterative approach is developed that relies on increasingly detailed information about the life-cycle processes from the user. A comprehensive application of the method to the chlor-alkali inventory being compiled by the National Renewable Energy Laboratory demonstrates the benefits and challenges of this approach.
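A minimal sketch of the kind of constrained adjustment such rectification builds on (classical linear data reconciliation under the stated assumptions, not the authors' full iterative procedure): measurements m with error covariance Σ are moved to the closest values satisfying the balance constraints A x = 0, which has the closed form x̂ = m − Σ Aᵀ (A Σ Aᵀ)⁻¹ A m.

import numpy as np

# Toy mass balance: stream 1 splits into streams 2 and 3, so x1 - x2 - x3 = 0.
A = np.array([[1.0, -1.0, -1.0]])
m = np.array([100.0, 64.0, 38.0])        # raw (inconsistent) measurements
Sigma = np.diag([4.0, 2.0, 2.0])         # assumed measurement error variances

# Weighted least-squares adjustment satisfying A @ x_hat = 0.
K = Sigma @ A.T @ np.linalg.inv(A @ Sigma @ A.T)
x_hat = m - K @ (A @ m)
print("reconciled flows:", x_hat)        # the adjusted flows now balance exactly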

18.
The validation of surrogate endpoints has been studied by Prentice (1989). He presented a definition as well as a set of criteria, which are equivalent only if the surrogate and true endpoints are binary. Freedman et al. (1992) supplemented these criteria with the so-called 'proportion explained'. Buyse and Molenberghs (1998) proposed replacing the proportion explained by two quantities: (1) the relative effect, linking the effect of treatment on both endpoints, and (2) an individual-level measure of agreement between both endpoints. The latter quantity carries over when data are available from several randomized trials, while the former can be extended to a trial-level measure of agreement between the effects of treatment on both endpoints. This approach suggests a new method for the validation of surrogate endpoints and naturally leads to the prediction of the effect of treatment upon the true endpoint, given its observed effect upon the surrogate endpoint. These ideas are illustrated using data from two sets of multicenter trials: one comparing chemotherapy regimens for patients with advanced ovarian cancer, the other comparing interferon-alpha with placebo for patients with age-related macular degeneration.
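For reference, the two quantities of Buyse and Molenberghs are usually written (illustrative notation) as the relative effect

RE = β / α,

linking the treatment effect on the true endpoint (β) to that on the surrogate (α), together with the adjusted association, the correlation between the surrogate and true endpoints after adjusting for treatment; in the multi-trial setting the former generalizes to a trial-level regression of the β's on the α's and the latter to an individual-level association within trials.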

19.
Computational procedures for predicting metabolic interventions leading to the overproduction of biochemicals in microbial strains are widely in use. However, these methods rely on surrogate biological objectives (e.g., maximize growth rate or minimize metabolic adjustments) and do not make use of flux measurements often available for the wild-type strain. In this work, we introduce the OptForce procedure that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target. We hierarchically apply this classification rule for pairs, triples, quadruples, etc. of reactions. This leads to the identification of a sufficient and non-redundant set of fluxes that must change (i.e., MUST set) to meet a pre-specified overproduction target. Starting with this set, we subsequently extract a minimal set of fluxes that must actively be forced through genetic manipulations (i.e., FORCE set) to ensure that all fluxes in the network are consistent with the overproduction objective. We demonstrate our OptForce framework for succinate production in Escherichia coli using the most recent in silico E. coli model, iAF1260. The method not only recapitulates existing engineering strategies but also reveals non-intuitive ones that boost succinate production by performing coordinated changes on pathways distant from the last steps of succinate synthesis.

20.
Surrogate indexes of visceral adiposity, a major risk factor for metabolic and cardiovascular disorders, are routinely used in clinical practice because objective measurements of visceral adiposity are expensive, may involve exposure to radiation, and are of limited availability. We compared several surrogate indexes of visceral adiposity with ultrasound assessment of subcutaneous and visceral adipose tissue depots in 99 young Caucasian adults, including 20 women without androgen excess, 53 women with polycystic ovary syndrome, and 26 men. Obesity was present in 7, 21, and 7 subjects, respectively. We obtained body mass index (BMI), waist circumference (WC), waist-hip ratio (WHR), model of adipose distribution (MOAD), visceral adiposity index (VAI), and ultrasound measurements of subcutaneous and visceral adipose tissue depots and hepatic steatosis. WC and BMI showed the strongest correlations with ultrasound measurements of visceral adiposity. Only WHR correlated with sex hormones. Linear stepwise regression models including VAI were only slightly stronger than models including BMI or WC in explaining the variability in the insulin sensitivity index (yet BMI and WC had higher individual standardized coefficients of regression), and these models were superior to those including WHR and MOAD. WC showed a 0.94 (95% confidence interval 0.88–0.99) and BMI a 0.91 (0.85–0.98) probability of identifying the presence of hepatic steatosis according to receiver operating characteristic curve analysis. In conclusion, WC and BMI are not only the simplest surrogate markers of visceral adiposity to obtain, but also the most accurate in young adults, and they are good indicators of insulin resistance and powerful predictors of the presence of hepatic steatosis.
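The discrimination figures quoted for WC and BMI are areas under receiver operating characteristic curves; a minimal sketch of how such a probability is computed from a surrogate index and a binary steatosis indicator, using synthetic numbers rather than the study data:

import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the study variables.
rng = np.random.default_rng(2)
steatosis = rng.integers(0, 2, size=99)                   # 1 = hepatic steatosis present
wc = 80 + 12 * steatosis + rng.normal(0, 8, size=99)      # waist circumference, cm (assumed shift)

# AUC = probability that a randomly chosen case has a higher WC than a randomly chosen control.
print("AUC for WC:", roc_auc_score(steatosis, wc))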

