Similar literature
20 similar articles found.
1.
Mao Y, Xu S. Genetical Research, 2004, 83(3): 159-168
Many quantitative traits are measured as percentages. As a result, the assumption of a normal distribution for the residual errors of such percentage data is often violated. However, most quantitative trait locus (QTL) mapping procedures assume normality of the residuals. Therefore, proper data transformation is often recommended before statistical analysis is conducted. We propose the probit transformation to convert percentage data into variables with a normal distribution. The advantage of the probit transformation is that it can handle measurement errors with heterogeneous variance and correlation structure in a statistically sound manner. We compared the results of this data transformation with other transformations and found that this method can substantially increase the statistical power of QTL detection. We develop the QTL mapping procedure based on the maximum likelihood methodology implemented via the expectation-maximization algorithm. The efficacy of the new method is demonstrated using Monte Carlo simulation.
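The abstract above describes applying a probit transformation to percentage phenotypes before QTL mapping. Below is a minimal sketch of just the transformation step, not the authors' EM-based mapping procedure; the clipping constant eps is an illustrative choice.

```python
import numpy as np
from scipy.stats import norm

def probit_transform(percentages, eps=1e-6):
    """Convert percentage phenotypes (0-100) to probit-scale values.

    Proportions are clipped away from 0 and 1 so the inverse standard-normal
    CDF is finite; eps is an illustrative choice, not a value from the paper.
    """
    p = np.clip(np.asarray(percentages, dtype=float) / 100.0, eps, 1.0 - eps)
    return norm.ppf(p)

if __name__ == "__main__":
    y_pct = np.array([2.5, 10.0, 47.0, 88.0, 99.5])
    print(probit_transform(y_pct))
```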

2.
With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become commonly used to infer causality between a pair of traits, an exposure and an outcome. It depends on using genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may lead to biased inference. On the other hand, Egger regression is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from loss of power. We propose a two-component mixture of regressions to combine and thus take advantage of both IVW and Egger regression; it is often both more efficient (i.e. higher powered) and more robust to pleiotropy (i.e. controlling type I error) than either IVW or Egger regression alone by accounting for both valid and invalid IVs respectively. We propose a model averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference for finite samples. Through extensive simulations and applications to the GWAS summary data of 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we showcase that our proposed methods could often control type I error better while achieving much higher power than IVW and Egger regression (and sometimes than several other new/popular MR methods). We expect that our proposed methods will be a useful addition to the toolbox of Mendelian randomization for causal inference.
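As a rough illustration of the two components the proposed mixture combines, the sketch below computes a fixed-effect IVW estimate and an MR-Egger fit from per-SNP summary statistics. It is not the authors' mixture or model-averaging method, and the simulated effect sizes are made up.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted estimate (regression through the origin)."""
    w = 1.0 / se_y**2
    est = np.sum(w * beta_x * beta_y) / np.sum(w * beta_x**2)
    se = np.sqrt(1.0 / np.sum(w * beta_x**2))
    return est, se

def egger_estimate(beta_x, beta_y, se_y):
    """MR-Egger: weighted regression of beta_y on beta_x with an intercept (pleiotropy term)."""
    w = 1.0 / se_y**2
    X = np.column_stack([np.ones_like(beta_x), beta_x])
    coef = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * beta_y))
    intercept, slope = coef
    return slope, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bx = rng.normal(0.1, 0.03, 50)                       # simulated SNP-exposure effects
    by = 0.5 * bx + rng.normal(0, 0.01, 50)              # true causal effect 0.5, no pleiotropy
    sy = np.full(50, 0.01)
    print("IVW:", ivw_estimate(bx, by, sy))
    print("Egger (slope, intercept):", egger_estimate(bx, by, sy))
```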

3.
Numerous statistical methods have been developed for analyzing high-dimensional data. These methods often focus on variable selection and are of limited use for testing with high-dimensional data; they also typically require explicit likelihood functions. In this article, we propose a "hybrid omnibus test" for high-dimensional data testing with much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework in which a likelihood function is no longer necessary. Our test is a version of a frequentist-Bayesian hybrid score-type test for a generalized partially linear single-index model, whose link function depends on a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test from the local tests. We compare our approach with an empirical-likelihood ratio test and with Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others in terms of type I error, power, and computational cost in both the low- and high-dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.

4.
The variance-components model is the method of choice for mapping quantitative trait loci in general human pedigrees. This model assumes normally distributed trait values and includes a major gene effect, random polygenic and environmental effects, and covariate effects. Violation of the normality assumption has detrimental effects on the type I error and power. One possible way of achieving normality is to transform trait values. The true transformation is unknown in practice, and different transformations may yield conflicting results. In addition, the commonly used transformations are ineffective in dealing with outlying trait values. We propose a novel extension of the variance-components model that allows the true transformation function to be completely unspecified. We present efficient likelihood-based procedures to estimate variance components and to test for genetic linkage. Simulation studies demonstrated that the new method is as powerful as the existing variance-components methods when the normality assumption holds; when the normality assumption fails, the new method still provides accurate control of type I error and is substantially more powerful than the existing methods. We performed a genomewide scan of monoamine oxidase B for the Collaborative Study on the Genetics of Alcoholism. In that study, the results that are based on the existing variance-components method changed dramatically when three outlying trait values were excluded from the analysis, whereas our method yielded essentially the same answers with or without those three outliers. The computer program that implements the new method is freely available.

5.
Wu C, Li G, Zhu J, Cui Y. PLoS ONE, 2011, 6(9): e24902
Functional mapping has been a powerful tool in mapping quantitative trait loci (QTL) underlying dynamic traits of agricultural or biomedical interest. In functional mapping, multivariate normality is often assumed for the underlying data distribution, partially due to the ease of parameter estimation. The normality assumption, however, can easily be violated in real applications for various reasons, such as heavy tails or extreme observations. Departure from normality has a negative effect on testing power and inference for QTL identification. In this work, we relax the normality assumption and propose a robust multivariate t-distribution mapping framework for QTL identification in functional mapping. Simulation studies show increased mapping power and precision with the t distribution compared with the normal distribution. The utility of the method is demonstrated through a real data analysis.

6.
The nested case-control (NCC) design is used frequently in epidemiological studies as a cost-effective subcohort sampling strategy for biomarker research. This sampling strategy, on the other hand, creates challenges for data analysis because of outcome-dependent missingness in biomarker measurements. In this paper, we propose inverse probability weighted (IPW) methods for making inference about the prognostic accuracy of a novel biomarker for predicting future events with data from NCC studies. The consistency and asymptotic normality of these estimators are derived using empirical process theory and convergence theorems for sequences of weakly dependent random variables. Simulation and analysis using Framingham Offspring Study data suggest that the proposed methods perform well in finite samples.
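A hedged sketch of the inverse probability weighting idea for nested case-control data is given below: sampled controls are weighted by the inverse of a Samuelsen-type inclusion probability for a design that draws m controls per case from the risk set. This illustrates IPW in this setting, not the estimators proposed in the paper; the simulated data are made up.

```python
import numpy as np

def ncc_inclusion_probs(event_times, is_case, m_controls):
    """Probability each subject is ever sampled (cases always; controls via risk-set sampling)."""
    event_times = np.asarray(event_times, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    p_not_sampled = np.ones(len(event_times))
    for t in event_times[is_case]:
        at_risk = event_times >= t                  # risk set at this case's event time
        n_risk = at_risk.sum()
        if n_risk > 1:
            # chance of NOT being drawn as one of the m controls at this event time
            p_not_sampled[at_risk & ~is_case] *= 1.0 - min(m_controls / (n_risk - 1), 1.0)
    probs = 1.0 - p_not_sampled
    probs[is_case] = 1.0
    return probs

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    times = rng.exponential(10, 500)
    cases = rng.random(500) < 0.1
    probs = ncc_inclusion_probs(times, cases, m_controls=2)
    samplable = ~cases & (probs > 0)
    weights = 1.0 / probs[samplable]                # IPW weights for potentially sampled controls
    print("control weight range:", round(weights.min(), 2), "-", round(weights.max(), 2))
```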

7.
1. Although the home range is a fundamental ecological concept, there is considerable debate over how it is best measured. There is a substantial literature concerning the precision and accuracy of all commonly used home range estimation methods; however, there has been considerably less work concerning how estimates vary with sampling regime, and how this affects statistical inferences. 2. We propose a new procedure, based on a variance components analysis using generalized mixed effects models, to examine how estimates vary with sampling regime. 3. To demonstrate the method we analyse data from one study of 32 individually marked roe deer and another study of 21 individually marked kestrels. We subsampled these data to simulate increasingly less intense sampling regimes, and compared the performance of two kernel density estimation (KDE) methods, of the minimum convex polygon (MCP) and of the bivariate ellipse methods. 4. Variation between individuals and study areas contributed most to the total variance in home range size. Contrary to recent concerns over reliability, both KDE methods were remarkably efficient, robust and unbiased: 10 fixes per month, if collected over a standardized number of days, were sufficient for accurate estimates of home range size. However, the commonly used 95% isopleth should be avoided; we recommend using isopleths between 90 and 50%. 5. Using the same number of fixes does not guarantee unbiased home range estimates: statistical inferences differ with the number of days sampled, even if using KDE methods. 6. The MCP method was highly inefficient and results were subject to considerable and unpredictable biases. The bivariate ellipse was not the most reliable method at low sample sizes. 7. We conclude that effort should be directed at marking more individuals monitored over long periods at the expense of the sampling rate per individual. Statistical results are reliable only if the whole sampling regime is standardized. We derive practical guidelines for field studies and data analysis.
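For concreteness, the sketch below computes the two home-range quantities compared above, a minimum convex polygon area and a KDE isopleth area on a grid, for simulated fixes. It is illustrative only; the grid resolution, padding and simulated data are arbitrary choices, not part of the study.

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import gaussian_kde

def mcp_area(xy):
    """Area of the minimum convex polygon around all fixes (2-D)."""
    return ConvexHull(xy).volume        # in 2-D, .volume is the polygon area

def kde_isopleth_area(xy, isopleth=0.90, grid_n=200):
    """Area of the smallest region containing `isopleth` of the KDE probability mass."""
    kde = gaussian_kde(xy.T)
    pad = 0.2 * (xy.max(axis=0) - xy.min(axis=0))
    xs = np.linspace(xy[:, 0].min() - pad[0], xy[:, 0].max() + pad[0], grid_n)
    ys = np.linspace(xy[:, 1].min() - pad[1], xy[:, 1].max() + pad[1], grid_n)
    gx, gy = np.meshgrid(xs, ys)
    dens = kde(np.vstack([gx.ravel(), gy.ravel()]))
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
    order = np.argsort(dens)[::-1]                  # highest-density cells first
    cum_mass = np.cumsum(dens[order]) * cell
    n_cells = np.searchsorted(cum_mass, isopleth) + 1
    return n_cells * cell

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fixes = rng.normal(0, 100, size=(120, 2))       # simulated animal fixes (metres)
    print("MCP area:", round(mcp_area(fixes), 1))
    print("90% KDE isopleth area:", round(kde_isopleth_area(fixes), 1))
```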

8.
Identification of novel biomarkers for risk assessment is important for both effective disease prevention and optimal treatment recommendation. Discovery relies on the precious yet limited resource of stored biological samples from large prospective cohort studies. Case-cohort sampling design provides a cost-effective tool in the context of biomarker evaluation, especially when the clinical condition of interest is rare. Existing statistical methods focus on making efficient inference on relative hazard parameters from the Cox regression model. Drawing on recent theoretical development on the weighted likelihood for semiparametric models under two-phase studies (Breslow and Wellner, 2007), we propose statistical methods to evaluate accuracy and predictiveness of a risk prediction biomarker, with censored time-to-event outcome under stratified case-cohort sampling. We consider nonparametric methods and a semiparametric method. We derive large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate the new procedures using data from the Framingham Offspring Study to evaluate the accuracy of a recently developed risk score incorporating biomarker information for predicting cardiovascular disease.

9.
The problem of exact conditional inference for discrete multivariate case-control data has two forms. The first is grouped case-control data, where Monte Carlo computations can be done using the importance sampling method of Booth and Butler (1999, Biometrika 86, 321-332), or a proposed alternative sequential importance sampling method. The second form is matched case-control data. For this analysis we propose a new exact sampling method based on the conditional-Poisson distribution for conditional testing with one binary and one integral ordered covariate. This method makes computations on data sets with large numbers of matched sets fast and accurate. We provide detailed derivation of the constraints and conditional distributions for conditional inference on grouped and matched data. The methods are illustrated on several new and old data sets.

10.
Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
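The target-decoy FDR calculation described above can be sketched as follows: at a score threshold, the FDR among accepted target matches is estimated by the number of decoy matches at or above that threshold divided by the number of target matches, and q-values are made monotone. This is a generic illustration of the simple FDR estimate, not the authors' exact procedure; the simulated scores are made up.

```python
import numpy as np

def decoy_fdr_qvalues(target_scores, decoy_scores):
    """Per-target q-values from target and decoy PSM scores (higher = better)."""
    target_scores = np.asarray(target_scores, dtype=float)
    decoy_scores = np.sort(np.asarray(decoy_scores, dtype=float))
    order = np.argsort(target_scores)[::-1]                       # best targets first
    sorted_t = target_scores[order]
    # decoys scoring at or above each target's score
    n_decoy = len(decoy_scores) - np.searchsorted(decoy_scores, sorted_t, side="left")
    n_target = np.arange(1, len(sorted_t) + 1)
    fdr = n_decoy / n_target
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]             # enforce monotonicity
    qvals = np.empty_like(q_sorted)
    qvals[order] = q_sorted
    return qvals

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    targets = np.concatenate([rng.normal(3, 1, 200), rng.normal(0, 1, 300)])  # mix of correct/incorrect
    decoys = rng.normal(0, 1, 500)
    q = decoy_fdr_qvalues(targets, decoys)
    print("PSMs accepted at q <= 0.01:", int(np.sum(q <= 0.01)))
```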

11.
As biological studies become more expensive to conduct, statistical methods that take advantage of existing auxiliary information about an expensive exposure variable are desirable in practice. Such methods should improve the study efficiency and increase the statistical power for a given number of assays. In this article, we consider an inference procedure for multivariate failure time with auxiliary covariate information. We propose an estimated pseudopartial likelihood estimator under the marginal hazard model framework and develop the asymptotic properties for the proposed estimator. We conduct simulation studies to evaluate the performance of the proposed method in practical situations and demonstrate the proposed method with a data set from the studies of left ventricular dysfunction (SOLVD Investigators, 1991, New England Journal of Medicine 325, 293-302).

12.
Approximate Bayesian computation in population genetics
Beaumont MA, Zhang W, Balding DJ. Genetics, 2002, 162(4): 2025-2035
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
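A compact sketch of the regression-adjustment step is given below: parameters are simulated from the prior, the simulations closest to the observed summary statistics (by standardized Euclidean distance) are retained with Epanechnikov weights, and the accepted parameter values are adjusted by a weighted local-linear regression on the summary-statistic discrepancies. The toy model, tolerance and tuning constants are illustrative, not taken from the paper.

```python
import numpy as np

def abc_regression(observed_s, simulate, prior_sample, n_sims=20000, accept_frac=0.01, seed=0):
    """Return regression-adjusted approximate posterior draws of a scalar parameter."""
    rng = np.random.default_rng(seed)
    theta = np.array([prior_sample(rng) for _ in range(n_sims)])
    stats = np.array([simulate(t, rng) for t in theta])
    scale = stats.std(axis=0)
    diff = (stats - observed_s) / scale                 # standardized discrepancies
    d = np.sqrt((diff ** 2).sum(axis=1))
    eps = np.quantile(d, accept_frac)                   # acceptance tolerance
    keep = d <= eps
    w = 1.0 - (d[keep] / eps) ** 2                      # Epanechnikov kernel weights
    X = np.column_stack([np.ones(keep.sum()), diff[keep]])
    y = theta[keep]
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)            # weighted local-linear fit
    return y - diff[keep] @ beta[1:]                    # adjusted parameter draws

if __name__ == "__main__":
    def simulate(mu, rng):
        x = rng.normal(mu, 1.0, 30)
        return np.array([x.mean(), x.std()])            # toy summary statistics
    obs = np.array([1.5, 1.0])                          # made-up "observed" summaries
    draws = abc_regression(obs, simulate, lambda rng: rng.uniform(-5, 5))
    print("approximate posterior mean:", round(draws.mean(), 3))
```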

13.
McGuire G, Prentice MJ, Wright F. Biometrics, 1999, 55(4): 1064-1070
The genetic distance between two DNA sequences may be measured by the average number of nucleotide substitutions per position that have occurred since the two sequences diverged from a common ancestor. Estimates of this quantity can be derived from Markov models for the substitution process, while the variances are estimated using the delta method and confidence intervals are calculated assuming normality. However, when the sampling distribution of the estimator deviates from normality, such intervals will not be accurate. For simple one-parameter models of nucleotide substitution, we propose a transformation of normal confidence intervals, which yields an almost exact approximation to the true confidence intervals of the distance estimators. To calculate confidence intervals for more complicated models, we propose the saddlepoint approximation. A simulation study shows that the saddlepoint-derived confidence intervals are a real improvement over existing methods.
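The general idea of transforming a normal confidence interval can be illustrated for the Jukes-Cantor one-parameter model: compute a normal interval on the scale of the observed mismatch proportion and map its endpoints through the distance formula. This sketch shows that idea only; it is not the paper's exact transformation, nor the saddlepoint method, and the example counts are made up.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_interval(n_diff, n_sites, z=1.96):
    """Distance estimate with a CI obtained by transforming a normal interval for p."""
    p = n_diff / n_sites
    se_p = math.sqrt(p * (1.0 - p) / n_sites)
    lo_p = max(p - z * se_p, 0.0)
    hi_p = min(p + z * se_p, 0.7499)        # keep below the 0.75 singularity
    return jc69_distance(p), (jc69_distance(lo_p), jc69_distance(hi_p))

if __name__ == "__main__":
    d, (lo, hi) = jc69_interval(n_diff=90, n_sites=500)
    print(f"d = {d:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```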

14.
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (DLR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.
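A minimal sketch of the genotype-likelihood computation underlying assignment tests is shown below: under Hardy-Weinberg equilibrium, the log-likelihood of a multilocus genotype in a candidate population is summed over loci from that population's allele frequencies, and the resulting log-likelihood ratio is the kind of statistic whose critical values are obtained by Monte Carlo resampling. The allele frequencies and the small default used for unseen alleles are illustrative assumptions, not values from the study.

```python
import numpy as np

def genotype_loglik(genotype, allele_freqs):
    """genotype: list of (a1, a2) per locus; allele_freqs: list of dicts allele -> frequency."""
    ll = 0.0
    for (a1, a2), freqs in zip(genotype, allele_freqs):
        p1 = freqs.get(a1, 1e-4)                      # small value for alleles unseen in the sample
        p2 = freqs.get(a2, 1e-4)
        ll += np.log(p1 * p2 * (1.0 if a1 == a2 else 2.0))   # HWE genotype probability
    return ll

if __name__ == "__main__":
    pop_A = [{"A": 0.8, "a": 0.2}, {"B": 0.6, "b": 0.4}]     # toy allele frequencies, 2 loci
    pop_B = [{"A": 0.2, "a": 0.8}, {"B": 0.3, "b": 0.7}]
    individual = [("A", "A"), ("B", "b")]
    lr = genotype_loglik(individual, pop_A) - genotype_loglik(individual, pop_B)
    print("log-likelihood ratio (A vs B):", round(lr, 3))
```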

15.
Z Li, J Möttönen, M J Sillanpää. Heredity, 2015, 115(6): 556-564
Linear regression-based quantitative trait loci/association mapping methods such as least squares commonly assume normality of residuals. In genetic studies of plants or animals, some quantitative traits may not follow a normal distribution because the data include outlying observations or are collected from multiple sources, and in such cases normal regression methods may lose statistical power to detect quantitative trait loci. In this work, we propose a robust multiple-locus regression approach for analyzing multiple quantitative traits without the normality assumption. In our method, the objective function is least absolute deviation (LAD), which corresponds to the assumption of multivariate Laplace distributed residual errors. This distribution has heavier tails than the normal distribution. In addition, we adopt a group LASSO penalty to produce shrinkage estimation of the marker effects and to describe the genetic correlation among phenotypes. Our LAD-LASSO approach is less sensitive to outliers and is more appropriate for the analysis of data with skewed phenotype distributions. Another application of our robust approach is to the missing-phenotype problem in multiple-trait analysis, where the missing phenotype items can simply be filled with some extreme values and treated as outliers. The efficiency of the LAD-LASSO approach is illustrated on both simulated and real data sets.
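A rough sketch of the LAD plus group-LASSO objective is given below, with each marker's effects across traits forming one group. A generic derivative-free optimiser stands in for the authors' estimation algorithm, and the penalty value and simulated data are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def lad_group_lasso(X, Y, lam):
    """Estimate the marker-effect matrix B (markers x traits) by minimising
    sum |Y - X B| + lam * sum_j ||B[j, :]||_2 (group-LASSO over markers)."""
    n, m = X.shape
    q = Y.shape[1]

    def objective(b_flat):
        B = b_flat.reshape(m, q)
        lad = np.abs(Y - X @ B).sum()                      # least absolute deviation loss
        group_pen = np.sqrt((B ** 2).sum(axis=1)).sum()    # L2 norm of each marker's effect row
        return lad + lam * group_pen

    res = minimize(objective, np.zeros(m * q), method="Powell")
    return res.x.reshape(m, q)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.integers(0, 3, size=(100, 5)).astype(float)    # toy marker genotypes coded 0/1/2
    B_true = np.zeros((5, 2))
    B_true[1] = [1.0, 0.8]                                 # one causal marker, two traits
    Y = X @ B_true + rng.standard_t(df=3, size=(100, 2))   # heavy-tailed residuals
    print(np.round(lad_group_lasso(X, Y, lam=5.0), 2))
```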

16.
L. Xue, L. Wang, A. Qu. Biometrics, 2010, 66(2): 393-404
We propose a new estimation method for multivariate failure time data using the quadratic inference function (QIF) approach. The proposed method efficiently incorporates within-cluster correlations. Therefore, it is more efficient than those that ignore within-cluster correlation. Furthermore, the proposed method is easy to implement. Unlike the weighted estimating equations in Cai and Prentice (1995, Biometrika 82, 151-164), it is not necessary to explicitly estimate the correlation parameters. This simplification is particularly useful in analyzing data with large cluster size, where it is difficult to estimate intracluster correlation. Under certain regularity conditions, we show the consistency and asymptotic normality of the proposed QIF estimators. A chi-squared test is also developed for hypothesis testing. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed methods. We also illustrate the proposed methods by analyzing primary biliary cirrhosis (PBC) data.

17.
We propose a semiparametric mean residual life mixture cure model for right-censored survival data with a cured fraction. The model employs the proportional mean residual life model to describe the effects of covariates on the mean residual time of uncured subjects and the logistic regression model to describe the effects of covariates on the cure rate. We develop estimating equations to estimate the proposed cure model for right-censored data with and without length-biased sampling; the latter is often found in prevalent cohort studies. In particular, we propose two estimating equations to estimate the effects of covariates on the cure rate and a method to combine them to improve the estimation efficiency. The consistency and asymptotic normality of the proposed estimates are established. The finite sample performance of the estimates is confirmed with simulations. The proposed estimation methods are applied to a clinical trial study on melanoma and a prevalent cohort study on early-onset type 2 diabetes mellitus.

18.
19.
Much modern work in phylogenetics depends on statistical sampling approaches to phylogeny construction to estimate probability distributions of possible trees for any given input data set. Our theoretical understanding of sampling approaches to phylogenetics remains far less developed than that for optimization approaches, however, particularly with regard to the number of sampling steps needed to produce accurate samples of tree partition functions. Despite the many advantages in principle of being able to sample trees from sophisticated probabilistic models, we have little theoretical basis for concluding that the prevailing sampling approaches do in fact yield accurate samples from those models within realistic numbers of steps. We propose a novel approach to phylogenetic sampling intended to be both efficient in practice and more amenable to theoretical analysis than the prevailing methods. The method depends on replacing the standard tree rearrangement moves with an alternative Markov model in which one solves a theoretically hard but practically tractable optimization problem on each step of sampling. The resulting method can be applied to a broad range of standard probability models, yielding practical algorithms for efficient sampling and rigorous proofs of accurate sampling for heated versions of some important special cases. We demonstrate the efficiency and versatility of the method by an analysis of uncertainty in tree inference over varying input sizes. In addition to providing a new practical method for phylogenetic sampling, the technique is likely to prove applicable to many similar problems involving sampling over combinatorial objects weighted by a likelihood model.

20.
The outcome-dependent sampling (ODS) design, which allows the observation of the exposure variable to depend on the outcome, has been shown to be cost efficient. In this article, we propose a new statistical inference method, an estimated penalized likelihood method, for a partial linear model in the setting of a two-stage ODS design with a continuous outcome. We develop the asymptotic properties and conduct simulation studies to demonstrate the performance of the proposed estimator. A real environmental study data set is used to illustrate the proposed method.
