Similar Articles
20 similar articles found.
1.
In a randomized two-group parallel trial the mean causal effect is typically estimated as the difference in means or proportions for patients receiving, say, either treatment (T) or control (C). Treatment effect heterogeneity (TEH), or unit-treatment interaction, the variability of the causal effect (defined in terms of potential outcomes) across individuals, is often ignored. Since only one of the outcomes, either Y(T) or Y(C), is observed for each unit in such studies, the TEH is not directly estimable. For convenience, it is often assumed to be minimal or zero. We are particularly interested in the 'treatment risk' for binary outcomes, that is, the proportion of individuals who would succeed on C but fail on T. Previous work has shown that the treatment risk can be bounded (Albert, Gadbury and Mascha, 2005), and that the confidence interval width around it can be narrowed using clustered or correlated data (Mascha and Albert, 2006). Without further parameter constraints, treatment risk is unidentifiable. We show, however, that the treatment risk can be directly estimated when the four underlying population counts comprising the joint distribution of the potential outcomes, Y(T) and Y(C), follow constraints consistent with the Dirichlet multinomial. We propose a test of zero treatment risk and show it to have good size and power. Methods are applied to both a randomized and a non-randomized study. Implications for medical decision-making at the policy and individual levels are discussed.
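As a point of reference, the bounds referred to above are of the Fréchet–Hoeffding type for the joint distribution of the potential outcomes (a standard derivation, not a quotation from the cited papers). Writing $p_C = P(Y(C)=1)$ and $p_T = P(Y(T)=1)$ for the marginal success proportions, the treatment risk $\theta = P(Y(C)=1,\ Y(T)=0)$ satisfies

$$\max\bigl(0,\ p_C - p_T\bigr)\ \le\ \theta\ \le\ \min\bigl(p_C,\ 1 - p_T\bigr),$$

so the observed margins alone pin down $\theta$ only to an interval, which is why further constraints (such as the Dirichlet-multinomial structure above) are needed for point estimation.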

2.
Issues of post-randomization selection bias and truncation-by-death can arise in randomized clinical trials; for example, in a cancer prevention trial, an outcome such as cancer severity is undefined for individuals who do not develop cancer. Restricting analysis to a subpopulation selected after randomization can give rise to biased outcome comparisons. One approach to deal with such issues is to consider the principal strata effect (PSE, or equally, the survivor average causal effect). PSE is defined as the effect of treatment on the outcome among the subpopulation that would have been selected under either treatment arm. Unfortunately, the PSE cannot generally be estimated without the identifying assumptions; however, the bounds can be derived using a deterministic causal model. In this paper, we propose a number of assumptions for deriving the bounds with narrow width. The assumptions and bounds, which differ from those introduced by Zhang and Rubin (2003), are illustrated using data from a randomized prostate cancer prevention trial.  相似文献   
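In standard potential-outcome notation (a textbook definition, included here for clarity rather than quoted from the paper), with $S(z)$ the selection indicator (e.g., developing cancer) and $Y(z)$ the outcome under assignment $z \in \{0,1\}$, the PSE/SACE is

$$\text{SACE} = E\bigl[\,Y(1) - Y(0)\ \big|\ S(1) = S(0) = 1\,\bigr],$$

i.e., the treatment contrast restricted to the "always-selected" principal stratum, which is the only stratum in which both potential outcomes are well defined.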

3.
Targeted maximum likelihood estimation of a parameter of a data generating distribution, known to be an element of a semi-parametric model, involves constructing a parametric model through an initial density estimator with parameter ε representing an amount of fluctuation of the initial density estimator, where the score of this fluctuation model at ε = 0 equals the efficient influence curve/canonical gradient. The latter constraint can be satisfied by many parametric fluctuation models since it represents only a local constraint of its behavior at zero fluctuation. However, it is very important that the fluctuations stay within the semi-parametric model for the observed data distribution, even if the parameter can be defined on fluctuations that fall outside the assumed observed data model. In particular, in the context of sparse data, by which we mean situations where the Fisher information is low, a violation of this property can heavily affect the performance of the estimator. This paper presents a fluctuation approach that guarantees the fluctuated density estimator remains inside the bounds of the data model. We demonstrate this in the context of estimation of a causal effect of a binary treatment on a continuous outcome that is bounded. It results in a targeted maximum likelihood estimator that inherently respects known bounds, and consequently is more robust in sparse data situations than the targeted MLE using a naive fluctuation model. When an estimation procedure incorporates weights, observations having large weights relative to the rest heavily influence the point estimate and inflate the variance. Truncating these weights is a common approach to reducing the variance, but it can also introduce bias into the estimate. We present an alternative targeted maximum likelihood estimation (TMLE) approach that dampens the effect of these heavily weighted observations. As a substitution estimator, TMLE respects the global constraints of the observed data model. For example, when outcomes are binary, a fluctuation of an initial density estimate on the logit scale constrains predicted probabilities to be between 0 and 1. This inherent enforcement of bounds has been extended to continuous outcomes. Simulation study results indicate that this approach is on a par with, and many times superior to, fluctuating on the linear scale, and in particular is more robust when there is sparsity in the data.
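A minimal sketch of the logit-scale fluctuation idea for a bounded continuous outcome, on simulated data. The data-generating process, the deliberately crude initial estimates, and the use of statsmodels to solve for ε are illustrative assumptions, not the authors' implementation:

```python
# Sketch: TMLE logit-scale fluctuation for a continuous outcome bounded in [a, b].
import numpy as np
import statsmodels.api as sm
from scipy.special import logit, expit

rng = np.random.default_rng(0)
n = 500
W = rng.normal(size=n)                              # baseline covariate
Z = rng.binomial(1, expit(0.5 * W))                 # binary treatment
Y = np.clip(2 + Z + W + rng.normal(size=n), 0, 6)   # outcome bounded in [a, b]
a, b = 0.0, 6.0
Ys = (Y - a) / (b - a)                              # scale outcome into [0, 1]

# Initial (crude) estimates of Q(Z, W) = E[Ys | Z, W] and g(W) = P(Z=1 | W).
Q = np.clip((2 + Z + W - a) / (b - a), 0.005, 0.995)
g = np.clip(expit(0.5 * W), 0.01, 0.99)

# Clever covariate for the ATE; fluctuating on the logit scale keeps the
# updated predictions inside (0, 1), i.e. inside the bounds [a, b] of Y.
H = Z / g - (1 - Z) / (1 - g)
flux = sm.GLM(Ys, H[:, None], offset=logit(Q),
              family=sm.families.Binomial()).fit()
eps = flux.params[0]

# Targeted predictions under Z=1 and Z=0, mapped back to the original scale.
Q1 = expit(logit(np.clip((2 + 1 + W - a) / (b - a), 0.005, 0.995)) + eps / g)
Q0 = expit(logit(np.clip((2 + 0 + W - a) / (b - a), 0.005, 0.995)) - eps / (1 - g))
ate = (b - a) * np.mean(Q1 - Q0)
print(f"targeted ATE estimate: {ate:.3f}")
```

Because the update acts on the logit of the scaled outcome, every fluctuated prediction stays strictly inside (a, b), which is the robustness-under-sparsity property the abstract emphasizes.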

4.
Cai Z, Kuroki M, Pearl J, Tian J (2008). Biometrics 64(3), 695–701.
This article considers the problem of estimating the average controlled direct effect (ACDE) of a treatment on an outcome, in the presence of unmeasured confounders between an intermediate variable and the outcome. Such confounders render the direct effect unidentifiable even in cases where the total effect is unconfounded (hence identifiable). Kaufman et al. (2005, Statistics in Medicine 24, 1683–1702) applied linear programming software to find the minimum and maximum possible values of the ACDE for specific numerical data. In this article, we apply the symbolic Balke–Pearl (1997, Journal of the American Statistical Association 92, 1171–1176) linear programming method to derive closed-form formulas for the upper and lower bounds on the ACDE under various assumptions of monotonicity. These universal bounds enable clinical experimenters to assess the direct effect of treatment from observed data with minimum computational effort, and they further shed light on the sign of the direct effect and the accuracy of the assessments.
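To make the numerical-versus-symbolic contrast concrete, here is a toy version of the linear-programming approach applied to the simplest possible causal bounding problem (bounding one response-type probability from two observed margins). The numbers and the four-response-type reduction are illustrative assumptions, not the ACDE setup of the paper:

```python
# Numerical analogue of the Balke–Pearl LP method on a toy problem:
# bound a causal quantity c'q over unknown response-type probabilities q
# subject to observed-distribution constraints A q = p.
import numpy as np
from scipy.optimize import linprog

# q = joint distribution of potential outcomes (Y(0), Y(1)) over the four
# response types (0,0), (0,1), (1,0), (1,1).
p0, p1 = 0.55, 0.70          # observed P(Y=1|T=0), P(Y=1|T=1) (hypothetical)
A = np.array([[0, 0, 1, 1],  # P(Y(0)=1) = q_(1,0) + q_(1,1)
              [0, 1, 0, 1],  # P(Y(1)=1) = q_(0,1) + q_(1,1)
              [1, 1, 1, 1]]) # probabilities sum to one
p = np.array([p0, p1, 1.0])
c = np.array([0, 0, 1, 0])   # objective: q_(1,0) = P(Y(0)=1, Y(1)=0)

lo = linprog(c, A_eq=A, b_eq=p, bounds=[(0, 1)] * 4).fun
hi = -linprog(-c, A_eq=A, b_eq=p, bounds=[(0, 1)] * 4).fun
print(f"bounds on P(Y(0)=1, Y(1)=0): [{lo:.2f}, {hi:.2f}]")  # [0.00, 0.30]
```

The symbolic method in the article does, in closed form, what `linprog` does here numerically: it traces how the optimum of such an LP depends on the observed probabilities, yielding formulas rather than case-specific numbers.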

5.

Background

Long-term benefits in animal breeding programs require that increases in genetic merit be balanced against the need to maintain diversity, which is lost through inbreeding. This can be achieved by using optimal contribution selection. The availability of high-density DNA marker information enables the incorporation of genomic data into optimal contribution selection, but this raises the question of how this information affects the balance between genetic merit and diversity.

Methods

The effect of using genomic information in optimal contribution selection was examined based on simulated and real data on dairy bulls. We compared the genetic merit of selected animals at various levels of co-ancestry restrictions when using estimated breeding values based on parent average, genomic or progeny test information. Furthermore, we estimated the proportion of variation in estimated breeding values that is due to within-family differences.

Results

Optimal selection on genomic estimated breeding values increased genetic gain. Genetic merit was further increased using genomic rather than pedigree-based measures of co-ancestry under an inbreeding restriction policy. Using genomic instead of pedigree relationships to restrict inbreeding had a significant effect only when the population consisted of many large full-sib families; with a half-sib family structure, no difference was observed. In real data from dairy bulls, optimal contribution selection based on genomic estimated breeding values allowed for additional improvements in genetic merit at low to moderate inbreeding levels. Genomic estimated breeding values were more accurate and showed more within-family variation than parent average breeding values; for genomic estimated breeding values, 30 to 40% of the variation was due to within-family differences. Finally, there was no difference between constraining inbreeding via pedigree or genomic relationships in the real data.

Conclusions

The use of genomic estimated breeding values increased genetic gain in optimal contribution selection. Genomic estimated breeding values were more accurate and showed more within-family variation, which led to higher genetic gains for the same restriction on inbreeding. Using genomic relationships to restrict inbreeding provided no additional gain, except in the case of very large full-sib families.
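A minimal sketch of the optimization problem that optimal contribution selection solves: choose a contribution vector c to maximize merit c'EBV subject to a ceiling on mean co-ancestry c'Ac/2. The toy relationship matrix, breeding values, and co-ancestry ceiling below are hypothetical, and scipy's SLSQP solver stands in for the specialized software used in practice:

```python
# Toy optimal contribution selection: maximize genetic merit subject to a
# co-ancestry constraint, with nonnegative contributions summing to one.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 10
ebv = rng.normal(100, 10, size=n)   # estimated breeding values (hypothetical)
A = np.full((n, n), 0.1)            # toy additive relationship matrix
np.fill_diagonal(A, 1.0)
C_max = 0.10                        # ceiling on mean co-ancestry c'Ac/2

res = minimize(
    lambda c: -c @ ebv,             # maximize merit = minimize its negative
    x0=np.full(n, 1 / n),
    bounds=[(0, 1)] * n,
    constraints=[
        {"type": "eq", "fun": lambda c: c.sum() - 1},
        {"type": "ineq", "fun": lambda c: C_max - 0.5 * c @ A @ c},
    ],
    method="SLSQP",
)
print("optimal contributions:", np.round(res.x, 3))
print(f"merit: {res.x @ ebv:.1f}, mean co-ancestry: {0.5 * res.x @ A @ res.x:.3f}")
```

Replacing the pedigree-based A with a genomic relationship matrix, or the parent-average EBVs with genomic EBVs, changes the inputs but not the structure of the problem, which is the comparison the study performs.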

6.
Monitoring annual change and long-term trends in population structure and abundance of white-tailed deer (Odocoileus virginianus) is an important but challenging component of their management. Many monitoring programs consist of count-based indices of relative abundance along with a variety of population structure information. Analyzed separately, these data can be difficult to interpret because of observation error in the data collection process, missing data, and the lack of an explicit biological model to connect the data streams while accounting for their relative imprecision. We used a Bayesian age-structured integrated population model to integrate data from a fall spotlight survey that produced a count-based index of relative abundance and a volunteer staff and citizen classification survey that generated a fall recruitment index. Both surveys took place from 2003–2018 in the parkland ecoregion of southeast Saskatchewan, Canada. Our approach modeled demographic processes for age-specific (0.5-, 1.5-, ≥2.5-year-old classes) populations and was fit to count and recruitment data via models that allowed for error in the respective observation processes. The Bayesian framework accommodated missing data and allowed aggregation of transects to act as samples from the larger management unit population. The approach provides managers with continuous time series of estimated relative abundance, recruitment rates, and apparent survival rates with full propagation of uncertainty and sharing of information among transects. We used this model to demonstrate winter severity effects on recruitment rates via an interaction between winter snow depth and minimum temperatures. In years with colder than average temperatures and above average snow depth, recruitment was depressed, whereas the negative effect of snow depth reversed in years with above average temperatures. This and other covariate information can be incorporated into the model to test relationships and provide predictions of future population change prior to setting of hunting seasons. Likewise, post hoc analysis of model output allows other hypothesis tests, such as determining the statistical support for whether population status has crossed a management trigger threshold. © 2020 The Wildlife Society.
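A toy sketch of the snow-by-temperature interaction on recruitment described above. The coefficients are hypothetical and chosen only to reproduce the qualitative pattern in the abstract, not fitted values from the study:

```python
# Recruitment declines with snow depth in cold years, but the snow effect
# reverses when temperatures are above average (hypothetical coefficients).
import numpy as np

def recruitment_rate(snow_z, temp_z, b0=-0.5, b_snow=-0.4, b_temp=0.3, b_int=0.5):
    """Recruitment index modeled on the log scale; inputs are standardized."""
    return np.exp(b0 + b_snow * snow_z + b_temp * temp_z + b_int * snow_z * temp_z)

for snow, temp in [(1.0, -1.0), (1.0, 1.0), (-1.0, -1.0)]:
    print(f"snow={snow:+.0f} sd, temp={temp:+.0f} sd -> "
          f"recruitment {recruitment_rate(snow, temp):.2f}")
```

With these coefficients, deep snow in a cold year gives exp(−1.7) ≈ 0.18 (depressed recruitment), while the same snow depth in a warm year gives exp(−0.1) ≈ 0.90, because the interaction term flips the net sign of the snow effect.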

7.
We discuss Bayesian log-linear models for incomplete contingency tables with both missing and interval-censored cells, with the aim of obtaining reliable population size estimates. We also discuss the use of external information on the censoring probability, which may substantially reduce uncertainty. We show in simulation that information on lower bounds and external information can each improve the mean squared error of population size estimates, even when the external information is not completely accurate. We conclude with an original example on estimation of the prevalence of multiple sclerosis in the metropolitan area of Rome, where five out of six lists have interval-censored counts. External information comes from mortality rates of multiple sclerosis patients.

8.
Six different sampling methods to estimate the density of the cassava green mite, Mononychellus tanajoa, are categorized according to whether leaves or leaflets are used as secondary sampling units and whether the number of leaves on the sampled plants is enumerated, estimated from an independent plant sample, or not censused at all. In the last case, sampling can provide information only on the average number of mites per leaf and its variance, whereas information on stratum sizes is necessary to estimate the mean number of mites per plant as well. It is shown that leaflet sampling is as reliable as leaf sampling for the same number of sampling units. When stratum sizes are estimated from a separate plant sample, sampling time may also be reduced, but the estimated mean density and its variance may be biased if mite density and plant size are correlated. Sampling data show that the within-plant variance contributes relatively little to the overall variance of the population density estimates. This points to a sampling strategy in which the number of primary units (plants) is as large as possible at the expense of secondary units (leaflets) per plant. Mean-variance relationships may be applied to estimate sample variances and can be used even when only one leaflet is taken per plant per stratum. An unequal allocation of primary units among strata can increase precision, but the gain is small compared with an equal allocation. Leaf area can be predicted from the length of the longest leaflet and the number of leaflets.
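The mean-variance relationship invoked here is typically of the Taylor power-law form, a standard choice in arthropod sampling stated here for orientation rather than quoted from the paper. With sample mean $\bar{x}$ and sample variance $s^2$,

$$s^2 = a\,\bar{x}^{\,b},$$

where $a$ and $b$ are estimated by regressing $\log s^2$ on $\log \bar{x}$ across fields or dates; the fitted curve then supplies a variance estimate for any observed mean, even when only one leaflet is taken per plant per stratum and no within-sample variance can be computed directly.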

9.
The aim of this study was to evaluate observed and future inbreeding levels in the Polish Holstein-Friesian cattle population. In total, over 9.8 million animals from the pedigree of the Polish Federation of Cattle Breeders and Dairy Farmers were used in the analysis. Inbreeding level, as an average per birth year, was estimated with a method accounting for missing parent information, with the year 1950 taken as the base year of the population. If an animal had no ancestral records, the average inbreeding level of its birth year was assigned. Twice the average inbreeding level served as the relatedness of the animal to the population, which enabled estimation of inbreeding in its offspring. The future inbreeding of potential offspring was estimated as an average over the animals (bulls and cows) available for mating in a given year. It was observed that 30–50% of animals born between 1985 and 2015 had no relevant ancestral information, which is caused by a high number of new animals and/or entire farms entering the national milk recording system. For the year 2015, the observed inbreeding level was 3.30%, which was more than twice the value obtained with the classical approach (ignoring missing parent information) and higher by 0.4% than the future inbreeding. The average increase of inbreeding in the years 2010–2015 was 0.10%, which is similar to other countries monitored by the World Holstein-Friesian Federation. However, the values might be underestimated due to low pedigree completeness. The estimates of future inbreeding suggested that observed inbreeding could be even lower and also increase more slowly, which indicates a constant need to monitor the rate of increase in inbreeding over time. The most important practical implication of these results is the need to advise individual farmers to keep precise records of the matings on their farms in order to improve the pedigree completeness of the Polish Holstein-Friesian, and to use suitable mating programs to avoid too rapid a growth of inbreeding.
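For orientation, the pedigree identity underlying both the offspring calculation and the missing-parent convention is standard (not a formula quoted from the paper): the inbreeding coefficient of a prospective offspring equals half the additive relationship between its parents,

$$F_{\text{offspring}} = \tfrac{1}{2}\,a_{sd},$$

where $a_{sd}$ is the additive genetic relationship between sire $s$ and dam $d$. This is why twice the mean inbreeding of a birth year can stand in for an unknown animal's relatedness to the population when its parents are not recorded.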

10.
Preferential interaction measurements between proteins and monosodium glutamate were carried out to arrive at an understanding of the mechanism of its strong effect on tubulin stability and self-assembly into microtubules. For all proteins studied, i.e. bovine serum albumin, lysozyme, beta-lactoglobulin, and calf brain tubulin, the protein showed a large preferential hydration in the presence of monosodium glutamate. The enhancement of tubulin self-association by monosodium glutamate can be interpreted in terms of the large unfavorable free energy of interaction between the additive and the protein. Preferential interactions were also examined for lysine hydrochloride, which also gave a preferential hydration of the proteins, except for tubulin. The dependence of the preferential hydration parameter on proteins was different for the two additives, suggesting the importance of net electrostatic charges of proteins in their interaction with glutamate anions and lysinium cations. The zero preferential interaction of lysine hydrochloride with tubulin indicates an affinity of the lysine cation for the protein. Both additives increased the transition temperature of proteins. This can be understood in terms of the unfavorable free energy of interaction between the additive and the protein surface, which should be even more unfavorable when the denaturation causes an increase in the surface area.

11.
The objectives of the present study were: (1) to evaluate the importance of the genotype×production environment interaction for the genetic evaluation of birth weight (BW) and weaning weight (WW) in a population of composite beef cattle in Brazil, and (2) to investigate the importance of the sire×contemporary group interaction (S×CG) for modeling G×E and improving the accuracy of prediction in routine genetic evaluations of this population. Analyses were performed with one, two (favorable and unfavorable) or three (favorable, intermediate, unfavorable) definitions of production environments. Thus, BW and WW records of animals in a favorable environment were assigned to trait 1, in an intermediate environment to trait 2, and in an unfavorable environment to trait 3. The (co)variance components were estimated using Gibbs sampling in single-, bi- or three-trait animal models, according to the number of production environments defined. In general, the estimates of genetic parameters for BW and WW were similar between environments. The additive genetic correlations between production environments were close to unity for BW; however, when examining the highest posterior density intervals, the correlation between favorable and unfavorable environments reached a value of only 0.70, a fact that may lead to changes in the ranking of sires across environments. For WW, the posterior mean genetic correlation between direct effects in favorable and unfavorable environments was 0.63. When S×CG was included in two- or three-trait analyses, all direct genetic correlations were close to unity, suggesting that there was no evidence of a genotype×production environment interaction. Furthermore, the model including S×CG helped prevent overestimation of the accuracy of breeding values of sires, and provided a lower error of prediction for both direct and maternal breeding values, lower squared bias, lower residual variance and a lower deviance information criterion than the model omitting S×CG. The model including S×CG can therefore be considered the best model on the basis of these criteria. The genotype×production environment interaction should not be neglected in the genetic evaluation of BW and WW in the present population of beef cattle. The inclusion of S×CG in the model is a feasible and plausible alternative for modeling the effects of G×E in genetic evaluations.

12.

Background

This article describes classical and Bayesian interval estimation of genetic susceptibility based on random samples with pre-specified numbers of unrelated cases and controls.

Results

Frequencies of genotypes in cases and controls can be estimated directly from retrospective case-control data. On the other hand, genetic susceptibility, defined as the expected proportion of cases among individuals with a particular genotype, depends on the population proportion of cases (prevalence). Given this design, prevalence is an external parameter and hence susceptibility cannot be estimated from the observed data alone. Interval estimation of susceptibility that can incorporate uncertainty in prevalence values is explored from both classical and Bayesian perspectives. Similarity between classical and Bayesian interval estimates in terms of frequentist coverage probabilities for this problem allows an appealing interpretation of classical intervals as bounds for genetic susceptibility. In addition, it is observed that both the asymptotic classical and Bayesian interval estimates have comparable average length. These interval estimates serve as a very good approximation to the "exact" (finite-sample) Bayesian interval estimates. Extension from genotypic to allelic susceptibility intervals shows dependency on phenotype-induced deviations from Hardy-Weinberg equilibrium.

Conclusions

The suggested classical and Bayesian interval estimates appear to perform reasonably well. Generally, the exact Bayesian interval estimation method is recommended for genetic susceptibility; however, the asymptotic classical and approximate Bayesian methods are adequate for sample sizes of at least 50 cases and controls.
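The dependence on prevalence can be made explicit with Bayes' theorem (a standard identity, included here for clarity). Writing $\pi$ for the population prevalence and $g$ for a genotype,

$$P(\text{case}\mid g)=\frac{P(g\mid \text{case})\,\pi}{P(g\mid \text{case})\,\pi + P(g\mid \text{control})\,(1-\pi)},$$

so the genotype frequencies estimable from case-control data determine susceptibility only once an external value (or prior distribution) for $\pi$ is supplied, which is exactly the uncertainty the interval estimates above are designed to carry.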

13.

Background

Cause-of-death data for many developing countries are not available. Information on deaths in hospital by cause is available in many low- and middle-income countries but is not a representative sample of deaths in the population. We propose a method to estimate population cause-specific mortality fractions (CSMFs) using data already collected in many middle-income and some low-income developing nations, yet rarely used: in-hospital death records.

Methods and Findings

For a given cause of death, a community's hospital deaths are equal to total community deaths multiplied by the proportion of deaths occurring in hospital. If we can estimate the proportion dying in hospital, we can estimate the proportion dying in the population using deaths in hospital. We propose to estimate the proportion of deaths in an age, sex, and cause group that occur in hospital from the subset of the population where vital registration systems function, or from another population. We evaluated our method using nearly complete vital registration (VR) data from Mexico 1998–2005, which record whether a death occurred in a hospital. In this validation test, we used 45 disease categories. We validated our method in two ways: nationally and between communities. First, we investigated how the method's accuracy changes as we decrease the amount of Mexican VR data used to estimate the proportion of each age, sex, and cause group dying in hospital. Decreasing the VR data used for this first step from 100% to 9% produces only a 12% maximum relative error between estimated and true CSMFs. Even if Mexico collected full VR information only in its capital city, with 9% of its population, our estimation method would produce an average relative error in CSMFs across the 45 causes of just over 10%. Second, we used VR data for the capital zone (Distrito Federal and Estado de Mexico) and estimated CSMFs for the three lowest-development states. Our estimation method gave an average relative error of 20%, 23%, and 31% for Guerrero, Chiapas, and Oaxaca, respectively.

Conclusions

Where accurate International Classification of Diseases (ICD)-coded cause-of-death data are available for deaths in hospital and for VR covering a subset of the population, we demonstrated that population CSMFs can be estimated with low average error. In addition, we showed in the case of Mexico that this method can substantially reduce error from biased hospital data, even when applied to areas with widely different levels of development. For countries with ICD-coded deaths in hospital, this method potentially allows the use of existing data to inform health policy.
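The core estimator is a one-line rescaling; the sketch below uses hypothetical numbers to show the mechanics (scale hospital deaths up by the estimated share of each cause's deaths occurring in hospital, then renormalize to fractions):

```python
# Estimate population cause-specific mortality fractions (CSMFs) from
# hospital death counts and cause-specific in-hospital death proportions.
import numpy as np

hospital_deaths = np.array([120, 300, 80])    # deaths in hospital, by cause
p_in_hospital = np.array([0.8, 0.5, 0.2])     # est. share of each cause's
                                              # deaths that occur in hospital
total_deaths = hospital_deaths / p_in_hospital
csmf = total_deaths / total_deaths.sum()
print(np.round(csmf, 3))                      # [0.13, 0.522, 0.348]
```

The method's validation question is then how much error enters when `p_in_hospital` is estimated from a VR subset (or another population) rather than from the community of interest itself.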

14.
In a clinical trial, statistical reports are typically concerned with the mean difference between two groups. There is now increasing interest in the heterogeneity of the treatment effect, which has important implications for treatment evaluation and selection. The treatment harm rate (THR), defined as the proportion of people who have a worse outcome on the treatment than on the control, was used to characterize this heterogeneity. Since the THR involves the joint distribution of the two potential outcomes, it cannot be identified without further assumptions, even in randomized trials. Only simple bounds can be derived from the observed data, and these are usually too wide. In this paper, we use a secondary outcome that satisfies a monotonicity assumption to tighten the bounds. It is shown that the bounds we derive cannot be wider than the simple bounds. We also construct simulation studies to assess the finite-sample performance of our bounds. The results show that a secondary outcome that is more closely related to the primary outcome leads to narrower bounds. Finally, we illustrate the application of the proposed bounds in a randomized clinical trial of whether intensive glycemic control could reduce the risk of development or progression of diabetic retinopathy.

15.
In randomized trials with noncompliance, causal effects cannot be identified without strong assumptions. Therefore, several authors have considered bounds on the causal effects. Applying an idea of VanderWeele (2008), Chiba (2009) gave bounds on the average causal effect in randomized trials with noncompliance using information on the randomized assignment, the treatment received, and the outcome, under monotonicity assumptions about covariates, but without considering any observed covariates. When observed covariates such as age, gender, and race are available in a trial, we propose new bounds that use the observed covariate information, under monotonicity assumptions similar to those of VanderWeele and Chiba, and we compare the three bounds in a real example.
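The generic reason observed covariates can help is worth recording (this is the usual argument, not a formula taken from the paper): if bounds $[\mathrm{LB}(x), \mathrm{UB}(x)]$ hold within each covariate stratum $x$, then averaging them gives

$$\sum_x P(X=x)\,\mathrm{LB}(x)\ \le\ \text{ACE}\ \le\ \sum_x P(X=x)\,\mathrm{UB}(x),$$

and for bounds of the Fréchet type (maxima and minima of linear expressions in observed probabilities) these covariate-averaged bounds are never wider than the bounds computed while ignoring $X$, by convexity of the max and concavity of the min.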

16.
Li Y, Taylor JM, Little RJ (2011). Biometrics 67(4), 1434–1441.
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice's definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.

17.
Peary caribou Rangifer tarandus pearyi is the northernmost subspecies of Rangifer in North America and endemic to the Canadian High Arctic. Because of severe population declines following years of unfavorable winter weather, with ice coating on the ground or thicker snow cover, density-independent disturbance events are believed to be the primary driver of Peary caribou population dynamics. However, it is unclear to what extent density dependence may affect the population dynamics of this species. Here, we test for different levels of density dependence in a stochastic, single-stage population model, based on available empirical information for the Bathurst Island complex (BIC) population in the Canadian High Arctic. We compare predicted densities with observed densities during 1961–2001 under various assumptions about the strength of density dependence. On the basis of our model, we found that scenarios with no or very low density dependence led to population densities far above those observed. For average observed disturbance regimes, a carrying capacity of 0.1 caribou per km² generated an average caribou density similar to that estimated for the BIC population over the past four decades. With our model we also tested the potential effects of climate change-related increases in the probability and severity of disturbance years, that is, years with unusually poor winter conditions. On the basis of our simulation results, we found that potential increases in disturbance severity (as opposed to disturbance frequency), in particular, may pose a considerable threat to the persistence of this species.
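A toy sketch of the kind of model described above, combining Ricker-type density dependence with random catastrophic winters that impose density-independent mortality. All parameter values are hypothetical, not the study's estimates:

```python
# Density-dependent growth punctuated by random severe winters (icing or
# deep snow), the two processes whose relative strength the study tests.
import numpy as np

rng = np.random.default_rng(2)
K, r = 0.1, 0.15           # carrying capacity (caribou per km^2), growth rate
p_dist, kill = 0.15, 0.5   # disturbance probability; mortality in bad winters

N = 0.05
for year in range(40):
    N *= np.exp(r * (1 - N / K))   # Ricker density-dependent growth
    if rng.random() < p_dist:
        N *= (1 - kill)            # catastrophic winter: sharp decline
print(f"density after 40 years: {N:.3f} caribou per km^2")
```

Raising `p_dist` mimics more frequent disturbance years, while raising `kill` mimics more severe ones; the abstract's conclusion is that persistence is more sensitive to the latter.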

18.
Recent studies indicate that polymorphic genetic markers are potentially helpful in resolving genealogical relationships among individuals in a natural population. Genetic data provide opportunities for paternity exclusion when genotypic incompatibilities are observed among individuals, and the present investigation examines the resolving power of genetic markers in unambiguous positive determination of paternity. Under the assumption that the mother of each offspring in a population is unambiguously known, an analytical expression for the fraction of males excluded from paternity is derived for the case where males and females may be drawn from two different gene pools. This theoretical formulation can also be used to predict the fraction of births for each of which all but one male can be excluded from paternity. We show that even when the average probability of exclusion approaches unity, a substantial fraction of births yield equivocal mother-father-offspring determinations. The number of loci needed to raise the frequency of unambiguous determinations to a high level is beyond the scope of current electrophoretic studies in most species. Application of this theory to electrophoretic data on Chamaelirium luteum (L.) shows that among 2255 offspring derived from 273 males and 70 females, only 57 triplets could be unequivocally determined with eight polymorphic protein loci, even though the average combined exclusionary power of these loci was 73%. The distribution of potentially compatible male parents, based on multilocus genotypes, was reasonably well predicted from the allele frequency data available for these loci. We demonstrate that genetic paternity analysis in natural populations cannot be reliably based on exclusionary principles alone. In order to measure the reproductive contributions of individuals in natural populations, more elaborate likelihood principles must be employed.
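For reference, independent loci combine in the standard way (a textbook identity rather than the paper's two-gene-pool expression): if locus $l$ alone excludes a random non-father with probability $P_l$, the combined exclusion probability over $L$ loci is

$$P_{\text{excl}} = 1 - \prod_{l=1}^{L}\,(1 - P_l).$$

This approaches 1 only slowly; for instance, eight loci each with $P_l = 0.15$ give only $1 - 0.85^8 \approx 0.73$, consistent with the 73% combined exclusionary power reported above and with the large fraction of equivocal determinations.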

19.
Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is commonplace in bioinformatics papers to have a classification rule applied to a small labeled data set and the error of the resulting classifier estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with this lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. This pairing of absent distributional assumptions with an absent measure of error estimation accuracy is assured in small-sample settings because, even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.
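A small simulation sketch of the point being made: on small samples, the cross-validation error estimate of a classifier can deviate widely from its true error, and the RMS of that deviation is the accuracy measure the abstract calls for. The Gaussian model, sample sizes, and classifier below are illustrative assumptions:

```python
# RMS of the cross-validation error estimator for LDA on small Gaussian samples.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, n_test, reps, deviations = 30, 5000, 200, []
for _ in range(reps):
    # two Gaussian classes, 5 features, small balanced labeled sample
    y = rng.permutation(np.repeat([0, 1], n // 2))
    X = rng.normal(size=(n, 5))
    X[y == 1] += 0.8
    clf = LinearDiscriminantAnalysis().fit(X, y)
    cv_err = 1 - cross_val_score(clf, X, y, cv=5).mean()
    # approximate the true error on a large independent test set
    yt = rng.integers(0, 2, n_test)
    Xt = rng.normal(size=(n_test, 5))
    Xt[yt == 1] += 0.8
    true_err = 1 - clf.score(Xt, yt)
    deviations.append(cv_err - true_err)
print(f"RMS of CV error estimate: {np.sqrt(np.mean(np.square(deviations))):.3f}")
```

Note that computing this RMS required knowing the feature-label distribution, which is precisely the abstract's argument: without distributional assumptions, no such accuracy statement accompanies the reported error estimate.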

20.
Causal mutations and their intra- and inter-locus interactions play a critical role in complex trait variation. It is often not easy to detect epistatic quantitative trait loci (QTL), both because linkage analysis studies impose complicated population structure requirements for detecting epistatic effects and because main effects are often hidden by interaction effects. Mapping their positions is even harder when they are closely linked. The data structure requirement may be overcome when information on linkage disequilibrium is used. We present an approach using a mixed linear model nested within an empirical Bayesian framework, which simultaneously takes into account additive, dominance and epistatic effects due to multiple QTL. The covariance structure used in the mixed linear model is based on combined linkage disequilibrium and linkage information. In a simulation study with complex epistatic interactions between QTL, the proposed approach makes it possible to simultaneously map interacting QTL into a small region. The estimated variance components are accurate and less biased with the proposed approach than with traditional models.
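A generic form of the kind of model described (our schematic notation, not the paper's exact parameterization) decomposes the phenotype into fixed effects plus additive, dominance and additive-by-additive QTL effects, each with a covariance structure built from combined linkage disequilibrium and linkage information:

$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{u}_a + \mathbf{u}_d + \mathbf{u}_{aa} + \mathbf{e}, \qquad \mathbf{u}_a \sim N(\mathbf{0},\, G_a\sigma^2_a),\ \ \mathbf{u}_d \sim N(\mathbf{0},\, G_d\sigma^2_d),\ \ \mathbf{u}_{aa} \sim N(\mathbf{0},\, G_{aa}\sigma^2_{aa}),$$

where the $G$ matrices encode identity-by-descent probabilities at and between the putative QTL positions, and the empirical Bayesian layer estimates the variance components $\sigma^2$.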

