Similar literature
1.
Daye ZJ  Chen J  Li H 《Biometrics》2012,68(1):316-326
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.

2.
Qihuang Zhang  Grace Y. Yi 《Biometrics》2023,79(2):1089-1102
Zero-inflated count data arise frequently from genomics studies. Analysis of such data is often based on a mixture model which accommodates excess zeros in combination with a Poisson distribution, and various inference methods have been proposed under such a model. Those analysis procedures, however, are challenged by the presence of measurement error in count responses. In this article, we propose a new measurement error model to describe error-contaminated count data. We show that ignoring the measurement error effects in the analysis may generally lead to invalid inference results, and meanwhile, we identify situations where ignoring measurement error can still yield consistent estimators. Furthermore, we propose a Bayesian method to address the effects of measurement error under the zero-inflated Poisson model and discuss the identifiability issues. We develop a data-augmentation algorithm that is easy to implement. Simulation studies are conducted to evaluate the performance of the proposed method. We apply our method to analyze the data arising from a prostate adenocarcinoma genomic study.
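
For readers unfamiliar with the mixture model mentioned in this abstract, the following is a minimal sketch of the standard zero-inflated Poisson probability mass function. It does not reproduce the authors' measurement error model or their data-augmentation algorithm; the function name `zip_pmf` and the parameter names `pi` (probability of a structural zero) and `lam` (Poisson mean) are illustrative assumptions.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Probability of observing count k under a zero-inflated Poisson model.

    pi  : probability that the observation is a structural (excess) zero
    lam : mean of the Poisson component
    Both names are hypothetical and chosen only for this illustration.
    """
    if k < 0:
        return 0.0
    # Ordinary Poisson probability of k.
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # With probability pi the count is forced to zero; otherwise it is Poisson.
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson
```

Under this parameterization the probability of a zero is pi + (1 - pi) * exp(-lam), which is larger than the plain Poisson zero probability whenever pi > 0.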

3.
The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications.

4.
The expression for the rth cumulant of the ANOVA estimator of the group variance component is derived in the one-way unbalanced random model under heteroscedasticity. The expression is used to study numerically the effect of unbalancedness and heteroscedasticity on the mean and variance of the estimator. The computed results reveal that unbalancedness and heteroscedasticity have a combined effect on the mean and variance of the estimator. For certain situations of unequal group sizes and error variances, the mean and variance of the estimator are increased, and for certain other situations the values are decreased.

5.
PathMiner: predicting metabolic pathways by heuristic search
MOTIVATION: Automated methods for biochemical pathway inference are becoming increasingly important for understanding biological processes in living and synthetic systems. With the availability of data on complete genomes and increasing information about enzyme-catalyzed biochemistry it is becoming feasible to approach this problem computationally. In this paper we present PathMiner, a system for automatic metabolic pathway inference. PathMiner predicts metabolic routes by reasoning over transformations using chemical and biological information. RESULTS: We build a biochemical state-space using data from known enzyme-catalyzed transformations in Ligand, including 2917 unique transformations between 3890 different compounds. To predict metabolic pathways we explore this state-space by developing an informed search algorithm. For this purpose we develop a chemically motivated heuristic to guide the search. Since the algorithm does not depend on predefined pathways, it can efficiently identify plausible routes using known biochemical transformations.

6.
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous‐scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous‐scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker‐specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject‐specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.
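
As background for the ROC framework this abstract builds on, here is a minimal sketch of the empirical area under the ROC curve (the Mann-Whitney form) applied to a linear combination of markers. It is not the authors' semiparametric transformation model or their NPMLE procedure; the function names `combine` and `empirical_auc` and the fixed weights are assumptions made purely for illustration.

```python
def combine(markers, weights):
    """Score one subject as a weighted sum of biomarker values (weights are hypothetical)."""
    return sum(w * m for w, m in zip(weights, markers))


def empirical_auc(case_scores, control_scores):
    """Empirical AUC: the fraction of (case, control) pairs in which the case
    scores higher than the control, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data: two biomarkers per subject, combined with illustrative weights.
weights = [1.0, 0.5]
cases = [combine(m, weights) for m in [(2.0, 1.0), (3.0, 0.0)]]
controls = [combine(m, weights) for m in [(1.0, 1.0), (0.5, 2.0)]]
auc = empirical_auc(cases, controls)
```

Because this statistic depends only on the ordering of the scores, it is unchanged by any strictly increasing transformation of the combined score.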

7.
P. Dutilleul  C. Potvin 《Genetics》1995,139(4):1815-1829
The impact of among-environment heteroscedasticity and genetic autocorrelation on the analysis of phenotypic plasticity is examined. Among-environment heteroscedasticity occurs when genotypic variances differ among environments. Genetic autocorrelation arises whenever the responses of a genotype to different environments are more or less similar than expected for observations randomly associated. In a multivariate analysis-of-variance model, three transformations of genotypic profiles (reaction norms), which apply to the residuals of the model while preserving the mean responses within environments, are derived. The transformations remove either among-environment heteroscedasticity, genetic autocorrelation or both. When both nuisances are not removed, statistical tests are corrected in a modified univariate approach using the sample covariance matrix of the genotypic profiles. Methods are illustrated on a Chlamydomonas reinhardtii data set. When heteroscedasticity was removed, the variance component associated with the genotype-by-environment interaction increased proportionally to the genotype variance component. As a result, the genetic correlation r(g) was altered. Genetic autocorrelation was responsible for statistical significance of the genotype-by-environment interaction and genotype main effects on raw data. When autocorrelation was removed, the ranking of genotypes according to their stability index dramatically changed. Evolutionary implications of our methods and results are discussed.

8.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to adjust for multiple comparisons when studying brain‐phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF)‐based tools can have inflated family‐wise error rates (FWERs). This has led to substantial controversy as to which processing choices are necessary to control the FWER using GRF‐based SEI. The failure of GRF‐based methods is due to unrealistic assumptions about the spatial covariance function of the imaging data. A permutation procedure is the most robust SEI tool because it estimates the spatial covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi‐) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the spatial covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use the methods to study the association between performance and executive functioning in a working memory functional magnetic resonance imaging study. The sPBJ has similar or greater power to the PBJ and permutation procedures while maintaining the nominal type 1 error rate in reasonable sample sizes. We provide an R package to perform inference using the PBJ and sPBJ procedures.

9.
Liya Fu  You‐Gan Wang 《Biometrics》2012,68(4):1074-1082
Rank‐based inference is widely used because of its robustness. This article provides optimal rank‐based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.

10.
M. Sundaram  I. Greenwald 《Genetics》1993,135(3):755-763
The lin-12 gene of Caenorhabditis elegans is thought to encode a receptor for intercellular signals that specify certain cell fates during development. We describe several alleles of lin-12 that reduce but do not eliminate lin-12 activity (hypomorphic alleles). These alleles cause a novel egg-laying defective (Egl) phenotype in hermaphrodites as well as incompletely penetrant cell fate transformations seen with high penetrance in lin-12 null mutants. Characterization of the Egl phenotype revealed additional roles of lin-12 in the development of the egg-laying system that were not apparent from studying lin-12 null mutants: lin-12 activity is required for proper early vulval morphogenesis as well as for some unknown later aspect of egg-laying system development. Reversion of the Egl phenotype caused by one lin-12 hypomorphic allele was used to identify potential interacting genes as described in the accompanying paper.

11.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.

12.
The problem of error in the phylogenetic reconstruction of ancestral character states is explored by developing the model of Frumhoff and Reeve (1994). Information about the evolutionary rate of change within a character is inferred from the distribution of its character states on a known phylogeny, and this information is used to impose confidence limits on the error associated with ancestral state inference. Ancestral state inference is found to be remarkably robust under the model assumptions for a wide range of parameter values; however, the probability of error increases when the number of species within a clade is small and/or state-transition probabilities are strongly skewed in favor of the non-ancestral state. The rationale for expecting such a skew, a hypothesis of parallelism, is shown to rely on assumptions of low rates of change in at least two phylogenetically inherited characters: the tendency to occupy a particular ecological niche and the tendency to respond in a particular way to selection. A means for judging the relative likelihoods of parallelism vs. straightforward homology as explanations for a given character-state distribution is suggested. General problems with the model are discussed, as are methods for making it more realistic.

13.
Jones MC  Pewsey A 《Biometrics》2012,68(1):183-193
We provide four-parameter families of distributions on the circle which are unimodal and display the widest ranges of both skewness and peakedness yet available. Our approach is to transform the scale of a generating distribution, such as the von Mises, using various nontrivial extensions of an approach first used in Batschelet's (1981, Circular Statistics in Biology) book. The key is to employ inverses of Batschelet-type transformations in certain ways; these exhibit considerable advantages over direct Batschelet transformations. The skewness transformation is especially appealing as it has no effect on the normalizing constant. As well as a variety of interesting theoretical properties, when likelihood inference is explored these distributions display orthogonality between elements of a pairing of parameters into (location, skewness) and (concentration, peakedness). Further, the location parameter can sometimes be made approximately orthogonal to all the other parameters. Profile likelihoods come to the fore in practice. Two illustrative applications, one concerning the locomotion of a Drosophila fly larva, the other analyzing a large set of sudden infant death syndrome data, are investigated.

14.
Cells and bacteria growing in culture are subject to mutation, and as this mutation is the ultimate substrate for selection and evolution, the factors controlling the mutation rate are of some interest. The mutational event is not observed directly, but is inferred from the phenotype of the original mutant or of its descendants; the rate of mutation is inferred from the number of such mutant phenotypes. Such inference presumes a knowledge of the probability distribution for the size of a clone arising from a single mutation. We develop a mathematical formulation that assists in the design and analysis of experiments which investigate mutation rates and mutant clone size distribution, and we use it to analyse data for which the classical Luria-Delbrück clone-size distribution must be rejected.

15.
Distribution models should take into account the different limiting factors that simultaneously influence species ranges. Species distribution models built with different explanatory variables can be combined into more comprehensive ones, but the resulting models should maximize complementarity and avoid redundancy. Our aim was to compare the different methods available for combining species distribution models. We modelled 19 threatened vertebrate species in mainland Spain, producing models according to three individual explanatory factors: spatial constraints, topography and climate, and human influence. We used five approaches for model combination: Bayesian inference, Akaike weight averaging, stepwise variable selection, updating, and fuzzy logic. We compared the performance of these approaches by assessing different aspects of their classification and discrimination capacity. We demonstrated that different approaches to model combination give rise to disparities in the model outputs. Bayesian integration was systematically affected by an error in the equations that are habitually used in distribution modelling. Akaike weights produced models that were driven by the best single factor and therefore failed at combining the models effectively. The updating and the stepwise approaches shared recalibration as the basic concept for model combination, were very similar in their performance, and showed the highest sensitivity and discrimination capacity. The fuzzy‐logic approach yielded models with the highest classification capacity according to Cohen's kappa. In conclusion: 1) Bayesian integration, employing the currently used equation, and the Akaike weight procedure should be avoided; 2) the updating and stepwise approaches can be considered minor variants of the same recalibrating approach; and 3) there is a trade‐off between this recalibrating approach, which has the highest sensitivity, and fuzzy logic, which has the highest overall classification capacity. Recalibration is better if unfavourable conditions in one environmental factor may be counterbalanced with favourable conditions in a different factor; otherwise fuzzy logic is better.
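
The observation that Akaike weights let the best single factor dominate can be illustrated with the standard Akaike weight formula, w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2), where Δ_i is the AIC difference from the best model. The sketch below is only that textbook definition; whether the study computed its weights in exactly this way is an assumption, and the AIC values are made up for illustration.

```python
import math


def akaike_weights(aic_values):
    """Return Akaike weights for a list of AIC scores (lower AIC is better)."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best one.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC scores for three single-factor models.
weights = akaike_weights([100.0, 110.0, 120.0])
```

With these made-up scores, an AIC gap of only 10 units already gives the best model more than 99% of the total weight, so a weighted average of the three models is nearly identical to the best model alone.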

16.
This paper is concerned with the generalized model E(φ(Y) | X) = φ(X) involving transformations of both the predictor vector X and the response variable Y. For this purpose, Taylor expansions and canonical analysis are applied. For optimizing the expansions, a simulation study shows that prediction error, the combination of model error and noise error, is an important index, but that the distribution of the residuals and the t-values of the coefficients must also be considered. Furthermore, the results of penicillin titration show that practical circumstances often need to be considered in selecting an appropriate model for a real-life problem.

17.
Measurement error in exposure variables is a serious impediment in epidemiological studies that relate exposures to health outcomes. In nutritional studies, interest could be in the association between long‐term dietary intake and disease occurrence. Long‐term intake is usually assessed with a food frequency questionnaire (FFQ), which is prone to recall bias. Measurement error in FFQ‐reported intakes leads to bias in the parameter estimate that quantifies the association. To adjust for bias in the association, a calibration study is required to obtain unbiased intake measurements using a short‐term instrument such as the 24‐hour recall (24HR). The 24HR intakes are used as the response in regression calibration to adjust for bias in the association. For foods not consumed daily, 24HR‐reported intakes are usually characterized by excess zeroes, right skewness, and heteroscedasticity, posing a serious challenge in regression calibration modeling. We proposed a zero‐augmented calibration model to adjust for measurement error in reported intake, while handling excess zeroes, skewness, and heteroscedasticity simultaneously without transforming 24HR intake values. We compared the proposed calibration method with the standard method and with methods that ignore measurement error by estimating long‐term intake with 24HR and FFQ‐reported intakes. The comparison was done in real and simulated datasets. With the 24HR, the mean increase in mercury level per ounce fish intake was about 0.4; with the FFQ intake, the increase was about 1.2. With both calibration methods, the mean increase was about 2.0. A similar trend was observed in the simulation study. In conclusion, the proposed calibration method performs at least as well as the standard method.

18.
In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted a two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements by a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with the generalized additive modeling (GAM) and the empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation on Cancer and Nutrition (EPIC) study. In the EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjusting for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to the response distribution, nonlinearity, and covariate inclusion in specifying the calibration model.

19.
Bivariate line-fitting methods for allometry
Fitting a line to a bivariate dataset can be a deceptively complex problem, and there has been much debate on this issue in the literature. In this review, we describe for the practitioner the essential features of line-fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions. A particularly important point for line-fitting in allometry is that usually, two sources of error are present (which we call measurement and equation error), and these have quite different implications for choice of line-fitting method. As a consequence, the approach in this review and the methods presented have subtle but important differences from previous reviews in the biology literature. Linear regression, major axis and standardised major axis are alternative methods that can be appropriate when there is no measurement error. When there is measurement error, this often needs to be estimated and used to adjust the variance terms in formulae for line-fitting. We also review line-fitting methods for phylogenetic analyses. Methods of inference are described for the line-fitting techniques discussed in this paper. The types of inference considered here are testing if the slope or elevation equals a given value, constructing confidence intervals for the slope or elevation, comparing several slopes or elevations, and testing for shift along the axis amongst several groups. In some cases several methods have been proposed in the literature. These are discussed and compared. In other cases there is little or no previous guidance available in the literature. Simulations were conducted to check whether the methods of inference proposed have the intended coverage probability or Type I error. We identified the methods of inference that perform well and recommend the techniques that should be adopted in future work.
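
The three line-fitting methods named in this abstract differ only in how the slope is computed from the sample sums of squares and cross-products. The sketch below uses the usual textbook slope formulas for ordinary least squares (OLS), major axis (MA) and standardised major axis (SMA); it ignores the measurement error corrections and inference procedures the review discusses, and the function name `line_slopes` is an assumption.

```python
import math


def line_slopes(x, y):
    """Return OLS, MA and SMA slope estimates for paired samples x and y.

    Assumes x is not constant and the sample covariance is non-zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    ols = sxy / sxx                               # regression of y on x
    sma = math.copysign(math.sqrt(syy / sxx), sxy)  # ratio of standard deviations, signed
    ma = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return {"ols": ols, "ma": ma, "sma": sma}


slopes = line_slopes([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
```

When the points lie exactly on a line all three slopes coincide; with scatter, the OLS slope is smaller in magnitude than the SMA slope because it equals the SMA slope multiplied by the absolute correlation.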

20.
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements but only specify the marginal or even association structures. However, inference results obtained from these methods often incur serious bias when variables are subject to error. In this paper, we tackle the problem that misclassification exists in both response and categorical covariate variables. We develop a marginal method for misclassification adjustment, which utilizes second‐order estimating functions and a functional modeling approach, and can yield consistent estimates and valid inference for mean and association parameters. We propose a two‐stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is phrased for data with a longitudinal design, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is applied to analyze a dataset from the Framingham Heart Study as an illustration.
