Similar Articles
A total of 20 similar articles were found.
1.
2.

Background  

In the analysis of microarray data one generally produces a vector of p-values that, for each gene, gives the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion of unchanged genes and the false discovery rate (FDR), and how to make inferences based on these concepts. Six published methods for estimating the proportion of unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value are illustrated with examples. Five published estimates of the FDR and one new one are presented and tested. Implementations in R code are available.
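The π0/q-value machinery described above is short to implement. Below is a minimal sketch (not the article's own R implementation) of a Storey-style estimator of the proportion of unchanged genes and the corresponding q-values; the tuning constant `lam` and the simulated p-value mixture are illustrative assumptions.

```python
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Estimate the proportion of unchanged genes: p-values above `lam`
    are assumed to come mostly from the uniform (null) component."""
    pvals = np.asarray(pvals)
    return min(1.0, float(np.mean(pvals > lam)) / (1.0 - lam))

def qvalues(pvals, pi0=None):
    """Turn p-values into q-values (monotone FDR estimates)."""
    p = np.asarray(pvals)
    m = p.size
    if pi0 is None:
        pi0 = estimate_pi0(p)
    order = np.argsort(p)
    q = np.empty(m)
    # FDR estimate at the i-th smallest p-value: pi0 * m * p / rank
    q[order] = pi0 * m * p[order] / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downwards
    q[order] = np.minimum.accumulate(q[order][::-1])[::-1]
    return np.clip(q, 0.0, 1.0)

rng = np.random.default_rng(0)
p = np.concatenate([rng.beta(0.2, 5.0, 200),      # "changed" genes
                    rng.uniform(0.0, 1.0, 800)])  # "unchanged" genes
print(estimate_pi0(p))            # should land near the true value 0.8
print((qvalues(p) < 0.05).sum())  # genes declared changed at FDR 5%
```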

3.
The evolutionary history of 19 populations of Littorina saxatilis (Olivi) was estimated by four different approaches. Three of these operate upon a population-by-population matrix of genetic distances: average linkage clustering, and two versions of the Fitch-Margoliash method. The fourth method was a maximum likelihood estimate based on differences in allele frequencies between populations. The study aims to assess how well each method estimates the phylogeny by including seven populations of the closely related species L. arcana Hannaford Ellis. The rationale behind this is that a good estimation technique should be able to separate these two monophyletic taxa. The results show that, by our criteria, the maximum likelihood method yields the best estimate and the unconstrained Fitch-Margoliash technique gives reasonable estimates. Both average-linkage clustering and the Fitch-Margoliash method with evolutionary clock perform less well. We argue that this is expected, since both these techniques are based on probably unrealistic assumptions, such as the overall rate of evolutionary divergence being homogeneous over phyletic lines.
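For illustration, here is a minimal sketch of the weakest-performing approach above, average-linkage (UPGMA) clustering on a population-by-population distance matrix. The population names and distances are made up, standing in for the L. saxatilis / L. arcana data; the test is whether the two taxa fall into two clean clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

labels = ["sax1", "sax2", "sax3", "arc1", "arc2"]  # hypothetical populations
D = np.array([
    [0.00, 0.05, 0.06, 0.30, 0.32],
    [0.05, 0.00, 0.04, 0.31, 0.33],
    [0.06, 0.04, 0.00, 0.29, 0.30],
    [0.30, 0.31, 0.29, 0.00, 0.03],
    [0.32, 0.33, 0.30, 0.03, 0.00],
])

# average linkage on the condensed form of the distance matrix
Z = linkage(squareform(D), method="average")
# cut the tree into two clusters; ideally each corresponds to one species
print(dict(zip(labels, fcluster(Z, t=2, criterion="maxclust"))))
```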

4.
Many molecular ecology analyses assume the genotyped individuals are sampled at random from a population and thus are representative of the population. Realistically, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from fecund species. Our knowledge is limited about how ECR affect the routinely conducted elementary genetics analyses, and how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on some popular population genetics analyses of marker data, including the estimation of allele frequencies, F‐statistics, expected heterozygosity (He), effective and observed numbers of alleles, and the tests of Hardy–Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR to mitigate their impact and to yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low‐precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g., He) by accounting for ECR and using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR in estimating allele frequencies and thus target parameters relying on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data for understanding population properties and marker properties.
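Two of the elementary analyses mentioned above are easy to state concretely. The sketch below, with made-up genotype counts for one biallelic locus, computes the naïve allele frequency estimate, the expected heterozygosity, and a chi-square HWE test; with excessive close relatives in the sample, exactly this kind of naïve test rejects HWE too often.

```python
import numpy as np
from scipy.stats import chi2

n_AA, n_Aa, n_aa = 60, 30, 10          # observed genotype counts (illustrative)
n = n_AA + n_Aa + n_aa
p = (2 * n_AA + n_Aa) / (2 * n)        # naive allele frequency estimate
He = 2 * p * (1 - p)                   # expected heterozygosity

observed = np.array([n_AA, n_Aa, n_aa])
expected = n * np.array([p**2, 2 * p * (1 - p), (1 - p)**2])
X2 = ((observed - expected) ** 2 / expected).sum()
pval = chi2.sf(X2, df=1)               # 3 classes - 1 - 1 estimated parameter
print(f"He = {He:.3f}, HWE chi-square p = {pval:.3f}")
```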

5.
Growing interest in adaptive evolution in natural populations has spurred efforts to infer genetic components of variance and covariance of quantitative characters. Here, I review difficulties inherent in the usual least-squares methods of estimation. A useful alternative approach is that of maximum likelihood (ML). Its particular advantage over least squares is that estimation and testing procedures are well defined, regardless of the design of the data. A modified version of ML, REML, eliminates the bias of ML estimates of variance components. Expressions for the expected bias and variance of estimates obtained from balanced, fully hierarchical designs are presented for ML and REML. Analyses of data simulated from balanced, hierarchical designs reveal differences in the properties of ML, REML, and F-ratio tests of significance. A second simulation study compares properties of REML estimates obtained from a balanced, fully hierarchical design (within-generation analysis) with those from a sampling design including phenotypic data on parents and multiple progeny. It also illustrates the effects of imposing nonnegativity constraints on the estimates. Finally, it reveals that predictions of the behavior of significance tests based on asymptotic theory are not accurate when sample size is small and that constraining the estimates seriously affects properties of the tests. Because of their great flexibility, likelihood methods can serve as a useful tool for estimation of quantitative-genetic parameters in natural populations. Difficulties involved in hypothesis testing remain to be solved.
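As a concrete anchor for the within-generation analysis discussed above: for a balanced one-way hierarchical design, the ANOVA estimator of the variance components coincides with REML, and the nonnegativity constraint appears as a simple truncation. The sketch below simulates such a design; the design sizes and variance components are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
s, n = 50, 10                        # families, offspring per family
sigma2_a, sigma2_e = 0.3, 1.0        # true between/within components
a = rng.normal(0.0, np.sqrt(sigma2_a), s)
y = a[:, None] + rng.normal(0.0, np.sqrt(sigma2_e), (s, n))

group_means = y.mean(axis=1)
msb = n * np.var(group_means, ddof=1)                 # between-family mean square
msw = ((y - group_means[:, None]) ** 2).sum() / (s * (n - 1))
sigma2_e_hat = msw                                    # E[MSW] = sigma2_e
sigma2_a_hat = max(0.0, (msb - msw) / n)              # nonnegativity constraint
print(sigma2_a_hat, sigma2_e_hat)
```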

6.
Landscape genetics lacks explicit methods for dealing with the uncertainty in landscape resistance estimation, which is particularly problematic when sample sizes of individuals are small. Unless uncertainty can be quantified, valuable but small data sets may be rendered unusable for conservation purposes. We offer a method to quantify uncertainty in landscape resistance estimates using multimodel inference as an improvement over single model‐based inference. We illustrate the approach empirically using co‐occurring, woodland‐preferring Australian marsupials within a common study area: two arboreal gliders (Petaurus breviceps and Petaurus norfolcensis) and one ground‐dwelling antechinus (Antechinus flavipes). First, we use maximum‐likelihood and a bootstrap procedure to identify the best‐supported isolation‐by‐resistance model out of 56 models defined by linear and non‐linear resistance functions. We then quantify uncertainty in resistance estimates by examining parameter selection probabilities from the bootstrapped data. The selection probabilities provide estimates of uncertainty in the parameters that drive the relationships between landscape features and resistance. We then validate our method for quantifying uncertainty using simulated genetic and landscape data, showing that for most parameter combinations it provides sensible estimates of uncertainty. We conclude that small data sets can be informative in landscape genetic analyses provided uncertainty can be explicitly quantified. Being explicit about uncertainty in landscape genetic models will make results more interpretable and useful for conservation decision‐making, where dealing with uncertainty is critical.
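The bootstrap-based uncertainty quantification can be sketched generically: refit all candidate models to bootstrap resamples and record how often each is selected. The two polynomial "resistance functions" and AIC-based selection below are illustrative stand-ins for the paper's 56 resistance models and maximum-likelihood selection.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 40)
y = 2.0 * x + rng.normal(0.0, 0.5, 40)   # small data set, as in the abstract

def aic(rss, k, n):
    return n * np.log(rss / n) + 2 * k

def best_model(x, y):
    """Candidate 'resistance functions': linear vs quadratic (illustrative)."""
    scores = []
    for deg in (1, 2):
        coef = np.polyfit(x, y, deg)
        rss = ((np.polyval(coef, x) - y) ** 2).sum()
        scores.append(aic(rss, deg + 1, len(y)))
    return int(np.argmin(scores))        # index of best-supported model

B = 500
wins = np.zeros(2)
for _ in range(B):
    idx = rng.integers(0, len(x), len(x))       # bootstrap resample
    wins[best_model(x[idx], y[idx])] += 1
print("model selection probabilities:", wins / B)
```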

7.
Gao Meng. Acta Ecologica Sinica (生态学报), 2016, 36(14): 4406–4414
Nearest-neighbor methods are an effective class of techniques for analyzing the spatial distribution patterns of plants; among them, the probability distribution model of neighbor distances, which describes the statistical characteristics of those distances, is one of the most commonly used. For aggregated distribution patterns, however, the probability distribution model of (individual-to-individual) neighbor distances has a complicated expression and its parameter estimation is computationally demanding. Based on the properties of the model's expectation and variance, a simplified parameter estimation method is proposed, with a genetic algorithm used to carry out the parameter optimization; the results show that the genetic algorithm can effectively estimate the model's two parameters. The model was then fitted to spatial distribution data of three cold-temperate tree species from southern Vancouver Island, Canada. The results show that the probability distribution model fits the neighbor-distance distributions of Douglas fir (P. menziesii) and western hemlock (T. heterophylla) well, but fits western redcedar (T. plicata) poorly because of its highly aggregated, clumped distribution. Douglas fir is approximately randomly distributed in the plot and its spatial aggregation parameter depends only weakly on spatial scale, whereas the aggregation parameters of western redcedar and western hemlock are scale-dependent, increasing with the order of the neighbor distance. Finally, the advantages and limitations of the model and of the parameter estimation method are discussed.
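A genetic algorithm for a two-parameter distribution fit can be sketched compactly. The code below is not the paper's model: a Weibull distribution stands in for the neighbor-distance model, fitted to synthetic distances by minimizing the negative log-likelihood with truncation selection, uniform crossover, and Gaussian mutation.

```python
import numpy as np

rng = np.random.default_rng(3)
d = rng.weibull(1.5, 500) * 2.0          # synthetic neighbor distances

def nll(params, x):
    """Negative log-likelihood of a Weibull(shape k, scale lam)."""
    k, lam = params
    if k <= 0 or lam <= 0:
        return np.inf
    return -np.sum(np.log(k / lam) + (k - 1) * np.log(x / lam) - (x / lam) ** k)

pop = rng.uniform(0.1, 5.0, (40, 2))     # initial population of (k, lambda)
for _ in range(100):
    fitness = np.array([nll(p, d) for p in pop])
    parents = pop[np.argsort(fitness)[:20]]                   # truncation selection
    children = parents[rng.integers(0, 20, (20, 2)), [0, 1]]  # uniform crossover
    children += rng.normal(0.0, 0.05, children.shape)         # Gaussian mutation
    pop = np.vstack([parents, children])

best = pop[np.argmin([nll(p, d) for p in pop])]
print("GA estimates (shape, scale):", best)   # true values: 1.5, 2.0
```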

8.
In this work we address the problem of the robust identification of unknown parameters of a cell population dynamics model from experimental data on the kinetics of cells labelled with a fluorescence marker defining the division age of the cell. The model is formulated by a first order hyperbolic PDE for the distribution of cells with respect to the structure variable x (or z) being the intensity level (or the log10-transformed intensity level) of the marker. The parameters of the model are the rate functions of cell division, death, label decay and the label dilution factor. We develop a computational approach to the identification of the model parameters with a particular focus on the cell birth rate α(z) as a function of the marker intensity, assuming the other model parameters are scalars to be estimated. To solve the inverse problem numerically, we parameterize α(z) and apply a maximum likelihood approach. The parametrization is based on cubic Hermite splines defined on a coarse mesh with either equally spaced a priori fixed nodes or nodes to be determined in the parameter estimation procedure. Ill-posedness of the inverse problem is indicated by multiple minima. To treat the ill-posed problem, we apply Tikhonov regularization with the regularization parameter determined by the discrepancy principle. We show that the solution of the regularized parameter estimation problem is consistent with the data set with an accuracy within the noise level in the measurements.
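The regularization step can be illustrated on a generic linear ill-posed problem. In the sketch below, a smoothing kernel stands in for the PDE-based forward model; Tikhonov solutions are computed over a grid of regularization parameters, and the parameter is chosen by the discrepancy principle, i.e., the smallest value whose residual reaches the noise level.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
t = np.linspace(0.0, 1.0, n)
A = np.exp(-80.0 * (t[:, None] - t[None, :]) ** 2)   # severely smoothing kernel
x_true = np.sin(2 * np.pi * t)
noise_level = 0.01
y = A @ x_true + rng.normal(0.0, noise_level, n)

def tikhonov(A, y, lam):
    """Minimizer of ||A x - y||^2 + lam ||x||^2 via the normal equations."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# discrepancy principle: smallest lam whose residual matches the noise level
target = noise_level * np.sqrt(n)
for lam in np.logspace(-12, 0, 60):
    x_hat = tikhonov(A, y, lam)
    if np.linalg.norm(A @ x_hat - y) >= target:
        break
print("chosen lambda:", lam, "residual:", np.linalg.norm(A @ x_hat - y))
```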

9.
Phylogenetic analysis using parsimony and likelihood methods
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17:368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.
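The MP criterion itself is computed per site with Fitch's algorithm. The sketch below scores one nucleotide site on a fixed, made-up tree; the parsimony score is the minimum number of substitutions the site requires on that tree.

```python
def fitch(tree, states):
    """Fitch's algorithm. tree: nested 2-tuples with leaf-name strings;
    states: leaf name -> nucleotide. Returns (root state set, cost)."""
    if isinstance(tree, str):                       # leaf node
        return {states[tree]}, 0
    left, lcost = fitch(tree[0], states)
    right, rcost = fitch(tree[1], states)
    inter = left & right
    if inter:                                       # agreement: no extra change
        return inter, lcost + rcost
    return left | right, lcost + rcost + 1          # conflict: one substitution

tree = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
site = {"human": "A", "chimp": "A", "gorilla": "G", "mouse": "T", "rat": "T"}
print("parsimony score:", fitch(tree, site)[1])     # minimum substitutions: 2
```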

10.
In spite of the usefulness of codominant markers in population genetics, the existence of null alleles raises challenging estimation issues in natural populations that are characterized by positive inbreeding coefficients (F > 0). Disregarding the possibility of F > 0 in a population will generally lead to overestimates of null allele frequencies. Conversely, estimates of inbreeding coefficients (F) may be strongly biased upwards (excess homozygotes) in the presence of nontrivial frequencies of null alleles. An algorithm has been presented for the estimation of null allele frequencies in inbred populations (the van Oosterhout method), using external estimates of the F‐statistics. The goal of this study is to introduce a modification of this method and to provide a formal comparison with an alternative likelihood‐based method (Chybicki‐Burczyk). Using simulated data, we illustrate the strengths and limitations of these competing methods. Under most circumstances, the likelihood method is preferable, but for highly inbred organisms, a modified van Oosterhout method offers some advantages.

11.
Background: Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher's intuition or as estimated by the maximum likelihood principle. Several methods exist by which to evaluate the maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele. Findings: Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be precisely computed. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into maximum likelihood estimates, thus enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model's convergence. The functions are collectively available as the MLEP R package. Conclusions: Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to the analysis using intuitive penetrance values for diseases with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at http://cran.r-project.org/web/packages/MLEP/index.html.

12.
Computer simulation was used to compare minimum variance quadratic estimation (MIVQUE), minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood (REML), maximum likelihood (ML), and Henderson's Method 3 (HM3) on the basis of variance among estimates, mean square error (MSE), bias, and probability of nearness, for estimation of both individual variance components and three ratios of variance components. The investigation also compared three procedures for dealing with negative estimates and included the use of both individual observations and plot means as the experimental unit of the analysis. The structure of data simulated (field design, mating designs, genetic architecture and imbalance) represented typical analysis problems in quantitative forest genetics. Results of comparing the estimation techniques demonstrated that: estimates of probability of nearness did not discriminate among techniques; bias was discriminatory among procedures for dealing with negative estimates but not among estimation techniques (except ML); sampling variance among estimates was discriminatory among procedures for dealing with negative estimates, estimation techniques and unit of observation; and MSE provided no additional information beyond the variance of the estimates. HM3 and REML were the closest competitors under these criteria; however, REML demonstrated greater robustness to imbalance. Of the three negative-estimate procedures, two are of practical significance and guidelines for their application are presented. Estimates from individual observations were always preferable to those from plot means over the experimental levels of this study.

13.
For the analysis of enzyme kinetics, a variety of programs exists. These programs apply either algebraic or dynamic parameter estimation, requiring different approaches for data fitting. The choice of approach and computer program is usually subjective, and it is generally assumed that this choice has no influence on the obtained parameter estimates. However, this assumption has not yet been verified comprehensively. Therefore, in this study, five computer programs for progress curve analysis were compared with respect to accuracy and minimum data amount required to obtain accurate parameter estimates. While two of these five computer programs (MS‐Excel, Origin) use algebraic parameter estimation, three computer programs (Encora, ModelMaker, gPROMS) are able to perform dynamic parameter estimation. For this comparison, the industrially important enzyme penicillin amidase (EC 3.5.1.11) was studied, and both experimental and in silico data were used. It was shown that significant differences in the estimated parameter values arise by using different computer programs, especially if the number of data points is low. Therefore, deviations between parameter values reported in the literature could simply be caused by the use of different computer programs.
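Dynamic parameter estimation here means fitting the rate equation to the progress curve by numerical integration rather than via an algebraic (closed-form) solution. The sketch below does this for simulated Michaelis-Menten data; it is a generic illustration, not one of the five programs compared.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def progress(t, s0, vmax, km):
    """Substrate concentration over time from dS/dt = -vmax*S/(km+S)."""
    sol = solve_ivp(lambda _, s: -vmax * s / (km + s), (0.0, t[-1]),
                    [s0], t_eval=t)
    return sol.y[0]

rng = np.random.default_rng(5)
t = np.linspace(0.0, 30.0, 25)
s_obs = progress(t, 10.0, vmax=1.0, km=2.0) + rng.normal(0.0, 0.05, t.size)

# dynamic estimation: integrate the ODE inside the residual function
fit = least_squares(lambda p: progress(t, 10.0, *p) - s_obs,
                    x0=[0.5, 1.0], bounds=([1e-6, 1e-6], [10.0, 50.0]))
print("vmax, Km estimates:", fit.x)   # true values: 1.0, 2.0
```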

14.
Estimation for an island model where mutation maintains a k-allele neutral polymorphism at a single locus on each island is considered. The likelihood of an observed sample type configuration is obtained by applying a computational algorithm analogous to Griffiths and Tavaré (Theor. Popul. Biol. 46 (1994), 131–159). This allows the computation of sampling distributions in an island model and investigation of their properties. Given a sample type configuration, the maximum likelihood estimate of the migration parameter is obtained by simulating independently the likelihood at a grid of points and, also, by using a surface simulation method. The latter method generates the whole likelihood trajectory in a single application of the simulation program. An estimate of the variance of the estimate of the migration parameter is obtained using the likelihood trajectory. A comparison of the maximum likelihood estimates of the gene flow between subpopulations is made with those obtained by using Wright's FST statistic.

15.
Algorithmic details to obtain maximum likelihood estimates of parameters on a large phylogeny are discussed. On a large tree, an efficient approach is to optimize branch lengths one at a time while updating parameters in the substitution model simultaneously. Codon substitution models that allow for variable nonsynonymous/synonymous rate ratios (ω = dN/dS) among sites are used to analyze a data set of human influenza virus type A hemagglutinin (HA) genes. The data set has 349 sequences. Methods for obtaining approximate estimates of branch lengths for codon models are explored, and the estimates are used to test for positive selection and to identify sites under selection. Compared with results obtained from the exact method estimating all parameters by maximum likelihood, the approximate methods produced reliable results. The analysis identified a number of sites in the viral gene under diversifying Darwinian selection and demonstrated the importance of including many sequences in the data in detecting positive selection at individual sites.
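The one-at-a-time branch length optimization is a coordinate-descent strategy. The sketch below shows the bare mechanics with a simple quadratic stand-in for the codon-model log-likelihood: each pass optimizes one parameter by a bounded one-dimensional search while holding the others fixed.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def negloglik(params):
    """Stand-in objective; a real use would evaluate the tree likelihood."""
    target = np.array([0.1, 0.4, 0.25, 0.7])
    return np.sum((params - target) ** 2)

params = np.full(4, 0.5)                 # initial "branch lengths"
for sweep in range(5):
    for i in range(params.size):
        def f(v, i=i):                   # vary one coordinate, fix the rest
            trial = params.copy()
            trial[i] = v
            return negloglik(trial)
        params[i] = minimize_scalar(f, bounds=(0.0, 10.0), method="bounded").x
print(params)                            # converges to the target values
```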

16.
Multi-trait (co)variance estimation is an important topic in plant and animal breeding. In this study we compare estimates obtained with restricted maximum likelihood (REML) and Bayesian Gibbs sampling, on simulated data and on three traits (diameter, height and branch angle) from a 26-year-old partial diallel progeny test of Scots pine (Pinus sylvestris L.). Based on the results from the simulated data, we can conclude that the REML estimates are accurate, but that the mode of the posterior distribution from Gibbs sampling can overestimate the true value, depending on the level of heritability. The mean and median of the posteriors were considerably higher than the expected values of the heritabilities. The confidence intervals calculated with the delta method were biased downward. The highest posterior density (HPD) interval provides a better interval estimate, but could be slightly biased at the lower level. Similar differences between REML and Gibbs sampling estimates were found for the Scots pine data. We conclude that further simulation studies are needed in order to evaluate the effect of different priors on (co)variance components in the genetic individual model.

17.
The theory of photon count histogram (PCH) analysis describes the distribution of fluorescence fluctuation amplitudes due to populations of fluorophores diffusing through a focused laser beam and provides a rigorous framework through which the brightnesses and concentrations of the fluorophores can be determined. In practice, however, the brightnesses and concentrations of only a few components can be identified. Brightnesses and concentrations are determined by a nonlinear least-squares fit of a theoretical model to the experimental PCH derived from a record of fluorescence intensity fluctuations. The χ2 hypersurface in the neighborhood of the optimum parameter set can have varying degrees of curvature, due to the intrinsic curvature of the model, the specific parameter values of the system under study, and the relative noise in the data. Because of this varying curvature, parameters estimated from the least-squares analysis have varying degrees of uncertainty associated with them. There are several methods for assigning confidence intervals to the parameters, but these methods have different efficacies for PCH data. Here, we evaluate several approaches to confidence interval estimation for PCH data, including asymptotic standard error, likelihood joint-confidence region, likelihood confidence intervals, bias-corrected and accelerated bootstrap (BCa), and Monte Carlo residual resampling methods. We study these with a model two-dimensional membrane system for simplicity, but the principles are applicable as well to fluorophores diffusing in three-dimensional solution. Using simulated fluorescence fluctuation data, we find the BCa method to be particularly well-suited for estimating confidence intervals in PCH analysis, and several other methods to be less so. Using the BCa method and additional simulated fluctuation data, we find that confidence intervals can be reduced dramatically for a specific non-Gaussian beam profile.
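The BCa interval recommended above is short to implement. The sketch below applies it to the mean of skewed data rather than to PCH brightness parameters; z0 is the bias correction from the bootstrap distribution and a the jackknife-based acceleration.

```python
import numpy as np
from scipy.stats import norm

def bca_interval(x, stat, B=2000, alpha=0.05, rng=None):
    """Bias-corrected and accelerated bootstrap confidence interval."""
    rng = rng or np.random.default_rng()
    n = len(x)
    theta_hat = stat(x)
    boot = np.array([stat(x[rng.integers(0, n, n)]) for _ in range(B)])
    # bias correction: fraction of bootstrap values below the point estimate
    z0 = norm.ppf((boot < theta_hat).mean())
    # acceleration: skewness of the jackknife influence values
    jack = np.array([stat(np.delete(x, i)) for i in range(n)])
    diff = jack.mean() - jack
    a = (diff ** 3).sum() / (6.0 * ((diff ** 2).sum()) ** 1.5)
    z = norm.ppf([alpha / 2, 1 - alpha / 2])
    adj = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))   # adjusted quantiles
    return np.quantile(boot, adj)

x = np.random.default_rng(6).exponential(1.0, 40)        # skewed sample
print(bca_interval(x, np.mean))
```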

18.
We report new body mass estimates for the North American Eocene primate Omomys carteri. These estimates are based on postcranial measurements and a variety of analytical methods, including bivariate regression, multiple regression, and principal components analysis (PCA). All body mass estimation equations show high coefficients of determination (R2), and some equations exhibit low prediction errors in accuracy tests involving extant species of body size similar to O. carteri. Equations derived from PCA-summarized data and multiple regression generally perform better than those based on single variables. The consensus of estimates and their statistics suggests a body mass range of 170–290 g. This range is similar to previous estimates for this species based on first molar area (Gingerich, J Hum Evol 10:345–374, 1981; Conroy, Int J Primatol 8:115–137, 1987).
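The bivariate-regression approach amounts to a log-log regression on extant taxa followed by a back-transformed prediction for the fossil. All numbers in the sketch below are placeholders, not the paper's measurements.

```python
import numpy as np

mass = np.array([120.0, 250.0, 400.0, 800.0, 1500.0])  # extant species, grams
measure = np.array([21.0, 27.0, 33.0, 42.0, 55.0])     # a postcranial dimension

# allometric scaling: fit log(mass) = intercept + slope * log(measure)
slope, intercept = np.polyfit(np.log(measure), np.log(mass), 1)

fossil_measure = 26.0                                  # hypothetical fossil value
pred = np.exp(intercept + slope * np.log(fossil_measure))
print(f"predicted body mass: {pred:.0f} g")
```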

19.
Heinze G, Schemper M. Biometrics, 2001, 57(1): 114–119
The phenomenon of monotone likelihood is observed in the fitting process of a Cox model if the likelihood converges to a finite value while at least one parameter estimate diverges to +/- infinity. Monotone likelihood primarily occurs in small samples with substantial censoring of survival times and several highly predictive covariates. Previous options to deal with monotone likelihood have been unsatisfactory. The solution we suggest is an adaptation of a procedure by Firth (1993, Biometrika 80, 27-38) originally developed to reduce the bias of maximum likelihood estimates. This procedure produces finite parameter estimates by means of penalized maximum likelihood estimation. Corresponding Wald-type tests and confidence intervals are available, but it is shown that penalized likelihood ratio tests and profile penalized likelihood confidence intervals are often preferable. An empirical study of the suggested procedures confirms satisfactory performance of both estimation and inference. The advantage of the procedure over previous options of analysis is finally exemplified in the analysis of a breast cancer study.
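Firth's penalty adds half the log-determinant of the Fisher information to the log-likelihood. The sketch below applies it to logistic regression under complete separation, where ordinary ML diverges; the paper adapts the same idea to the Cox model, and the data here are synthetic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(7)
x = np.sort(rng.normal(0.0, 1.0, 30))
y = (x > 0).astype(float)                  # completely separated outcome
X = np.column_stack([np.ones_like(x), x])  # intercept + covariate

def neg_penalized_loglik(beta):
    eta = X @ beta
    mu = expit(eta)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    W = mu * (1.0 - mu)
    info = X.T @ (W[:, None] * X)          # Fisher information matrix
    # Firth / Jeffreys-prior penalty: + 0.5 * log|I(beta)|
    return -(loglik + 0.5 * np.linalg.slogdet(info)[1])

fit = minimize(neg_penalized_loglik, x0=np.zeros(2), method="BFGS")
print("finite Firth estimates:", fit.x)    # plain ML would diverge here
```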

20.
Often a binary variable is generated by dichotomizing an underlying continuous variable, measured at a specific time point, according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use the repeated-measures model to impute missing data on responder status as a result of subject dropout and apply the logistic regression model on the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys), which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances, are found to be conservative. The frequentist multiple imputation approach, which fixes the parameters for the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113–124), is shown to be more efficient. We propose to apply the Kenward and Roger (1997, Biometrics 53, 983–997) degrees of freedom to account for the uncertainty associated with variance-covariance parameter estimates for the repeated-measures model.
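Whichever way the imputations are drawn, the per-imputation analyses are pooled with Rubin's rules; the Bayesian and frequentist variants discussed above differ in how the variance components entering these rules behave. The sketch below shows only the standard combination step, with made-up estimates.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool m point estimates and their within-imputation variances."""
    estimates, variances = map(np.asarray, (estimates, variances))
    m = len(estimates)
    qbar = estimates.mean()                    # pooled point estimate
    ubar = variances.mean()                    # within-imputation variance
    b = estimates.var(ddof=1)                  # between-imputation variance
    t = ubar + (1 + 1 / m) * b                 # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2   # Rubin's df
    return qbar, t, df

# e.g. log-odds of responder status estimated on each of 5 imputed data sets
est = [0.42, 0.45, 0.39, 0.48, 0.41]
var = [0.021, 0.020, 0.022, 0.019, 0.021]
print(rubin_combine(est, var))
```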
