期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The likelihood ratio test for the two-component normal mixture problem: power and sample size analysis 总被引：2，自引：0，他引：2

N R Mendell H C Thode S J Finch 《Biometrics》1991,47(3):1143-1148

We find, through simulation and modeling, an approximation to the alternative distribution of the likelihood ratio test for two-component mixtures in which the components have different means but equal variances. We consider the range of mixing proportions from 0.5 through .95. Our simulation results indicate a dependence of power on the mixing proportion when pi less than .2 and pi greater than .80. Our model results indicated that the alternative distribution is approximately noncentral chi-square, possibly with 2 degrees of freedom. Using this model, we estimate a sample of 40 is needed to have 50% power to detect a difference between means equal to 3.0 for mixing proportions between .2 and .8. The sample size increases to 50 when the mixing proportion is .90 (or .1) and 82 when the mixing proportion is .95 (or .05). This paper contains a complete table of sample sizes needed for 50%, 80%, and 90% power. 相似文献

2.

Mixture models, robustness, and the weighted likelihood methodology

Markatou M 《Biometrics》2000,56(2):483-486

Problems associated with the analysis of data from a mixture of distributions include the presence of outliers in the sample, the fact that a component may not be well represented in the data, and the problem of biases that occur when the model is slightly misspecified. We study the performance of weighted likelihood in this context. The method produces estimates with low bias and mean squared error, and it is useful in that it unearths data substructures in the form of multiple roots. This in turn indicates multiple potential mixture model fits due to the presence of more components than originally specified in the model. To compute the weighted likelihood estimates, we use as starting values the method of moment estimates computed on bootstrap subsamples drawn from the data. We address a number of important practical issues involving bootstrap sample size selection, the role of starting values, and the behavior of the roots. The algorithm used to compute the weighted likelihood estimates is competitive with EM, and it is similar to EM when the components are not well separated. Moreover, we propose a new statistical stopping rule for the termination of the algorithm. An example and a small simulation study illustrate the above points. 相似文献

3.

Sampling considerations for garden symphylans (Order: Cephalostigmata) in western Oregon

Umble JR Fisher JR 《Journal of economic entomology》2003,96(3):969-974

Sampling recommendations were developed for a potato bait sampling method used to estimate garden symphylan (Scutigerella immaculata Newport) densities in western Oregon. Sample size requirements were developed using Taylor's power law to describe the relationship between sample means and variances. Developed sampling recommendations performed well at sample sizes of 30 and greater, when validated by resampling a cohort of 40 independent data sets. Sample size requirements for the bait sampling method were 1.5 times greater than the requirements for the soil sampling method over densities from 1 to 20 S. immaculata per sample unit. As S. immaculata densities increased from April to May, sample size requirements decreased by 36% for fixed precision levels. For sampling in April, decreasing the damage threshold from 20, to 10 and five S. immaculata per sample unit, required a 1.6 and 2.5 times greater sample size requirement, respectively, for a fixed precision level (c) appropriate for pest management (c = 0.25). The bait sampling method provides an efficient reliable alternative to the standard soil sampling method used to monitor garden symphylan populations. 相似文献

4.

Detection of two-component mixtures of lognormal distributions in grouped, doubly truncated data: analysis of red blood cell volume distributions

C E McLaren M Wagstaff G M Brittenham A Jacobs 《Biometrics》1991,47(2):607-622

We have examined the statistical requirements for the detection of mixtures of two lognormal distributions in doubly truncated data when the sample size is large. The expectation-maximization algorithm was used for parameter estimation. A bootstrap approach was used to test for a mixture of distributions using the likelihood ratio statistic. Analysis of computer simulated mixtures showed that as the ratio of the difference between the means to the minimum standard deviation increases, the power for detection also increases and the accuracy of parameter estimates improves. These procedures were used to examine the distribution of red blood cell volume in blood samples. Each distribution was doubly truncated to eliminate artifactual frequency counts and tested for best fit to a single lognormal distribution or a mixture of two lognormal distributions. A single population was found in samples obtained from 60 healthy individuals. Two subpopulations of cells were detected in 25 of 27 mixtures of blood prepared in vitro. Analyses of mixtures of blood from 40 patients treated for iron-deficiency anemia showed that subpopulations could be detected in all by 6 weeks after onset of treatment. To determine if two-component mixtures could be detected, distributions were examined from untransfused patients with refractory anemia. In two patients with inherited sideroblastic anemia a mixture of microcytic and normocytic cells was found, while in the third patient a single population of microcytic cells was identified. In two family members previously identified as carriers of inherited sideroblastic anemia, mixtures of microcytic and normocytic subpopulations were found. Twenty-five patients with acquired myelodysplastic anemia were examined. A good fit to a mixture of subpopulations containing abnormal microcytic or macrocytic cells was found in two. We have demonstrated that with large sample sizes, mixtures of distributions can be detected even when distributions appear to be unimodal. These statistical techniques provide a means to characterize and quantify alterations in erythrocyte subpopulations in anemia but could also be applied to any set of grouped, doubly truncated data to test for the presence of a mixture of two lognormal distributions. 相似文献

5.

Some implications of a first-order model of inter-plant competition for the means and variances of complex mixtures

A. J. Wright 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1982,64(1):91-96

Summary Factorial models commonly used in the analysis of overall and component yields of binary mixtures of genotypes are generalised to include mixtures of any number of components (size, m) and the form of an analysis of variance for fitting such a model to tertiary mixtures is outlined. Such a model contains main effects and interactions up to the mth order, and is specific to the size of mixture so that no equivalence necessarily exists between similar parameter sets for different sized mixtures. Monocultures can be regarded as a special case of the general model.A simple model of intra-and inter-component competition is defined which assumes that plants do not interact in their competitive effects on others, a condition which is equivalent to an absence of second and higher order interactions in statistical analyses of mixtures of any size. Simple scaling tests involving the yields of components or whole mixtures of different sizes can also be used to test the adequacy of the model. This competition model least to a linear relationship between the mean yield of a mixture and the reciprocal of the number of components it contains, and thus allows the prediction of means and other statistical parameters for mixtures of one size from those of others. 相似文献

6.

A measure of sexual dimorphism in populations which are univariate normal mixtures

Ipiña SL Durand AI 《Bulletin of mathematical biology》2000,62(5):925-941

Measures of sexual dimorphism have been used extensively to predict the social organization and ecology of animal and human populations. There is, however, no universally accepted measure of phenotypic differences between the sexes. Most indices of sexual dimorphism fail to incorporate all of the information contained in a random data set. In an attempt to have a better alternative, an index is proposed to measure sexual dimorphism in populations that are distributed according to a probabilistic mixture model with two normal components. The index calculates the overlap between two functions that represent the contribution of each sex in the mixture. In order to assess such an index, sample means, variances and sizes of each sex are needed. As a consequence, the sample information used is greater than that used by other indices that take intrasexual variability into account. By evaluating some examples, our proposed index appears to be a more realistic measure of sexual dimorphism than other measures currently used. 相似文献

7.

Kullback-Leibler Markov chain Monte Carlo--a new algorithm for finite mixture analysis and its application to gene expression data

Tatarinova T Bouck J Schumitzky A 《Journal of bioinformatics and computational biology》2008,6(4):727-746

In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance. 相似文献

8.

Likelihood calculations to evaluate experimental designs to estimate genetic variances

Meyer K 《Heredity》2008,101(3):212-221

Mixed model analyses via restricted maximum likelihood, fitting the so-called animal model, have become standard methodology for the estimation of genetic variances. Models involving multiple genetic variance components, due to different modes of gene action, are readily fitted. It is shown that likelihood-based calculations may provide insight into the quality of the resulting parameter estimates, and are directly applicable to the validation of experimental designs. This is illustrated for the example of a design suggested recently to estimate X-linked genetic variances. In particular, large sample variances and sampling correlations are demonstrated to provide an indication of 'problem' scenarios. Using simulation, it is shown that the profile likelihood function provides more appropriate estimates of confidence intervals than large sample variances. Examination of the likelihood function and its derivatives are recommended as part of the design stage of quantitative genetic experiments. 相似文献

9.

Analysis of covariance with pre-treatment measurements in randomized trials: comparison of equal and unequal slopes

Funatogawa I Funatogawa T 《Biometrical journal. Biometrische Zeitschrift》2011,53(5):810-821

In randomized trials, an analysis of covariance (ANCOVA) is often used to analyze post-treatment measurements with pre-treatment measurements as a covariate to compare two treatment groups. Random allocation guarantees only equal variances of pre-treatment measurements. We hence consider data with unequal covariances and variances of post-treatment measurements without assuming normality. Recently, we showed that the actual type I error rate of the usual ANCOVA assuming equal slopes and equal residual variances is asymptotically at a nominal level under equal sample sizes, and that of the ANCOVA with unequal variances is asymptotically at a nominal level, even under unequal sample sizes. In this paper, we investigated the asymptotic properties of the ANCOVA with unequal slopes for such data. The estimators of the treatment effect at the observed mean are identical between equal and unequal variance assumptions, and these are asymptotically normal estimators for the treatment effect at the true mean. However, the variances of these estimators based on standard formulas are biased, and the actual type I error rates are not at a nominal level, irrespective of variance assumptions. In equal sample sizes, the efficiency of the usual ANCOVA assuming equal slopes and equal variances is asymptotically the same as those of the ANCOVA with unequal slopes and higher than that of the ANCOVA with equal slopes and unequal variances. Therefore, the use of the usual ANCOVA is appropriate in equal sample sizes. 相似文献

10.

Mixture-model classification in DNA content analysis.

Huixia Wang Shuguang Huang 《Cytometry. Part A》2007,71(9):716-723

DNA abundance provides important information about cell physiology and proliferation activity. In a typical in vitro cellular assay, the distribution of the DNA content within a sample is comprised of cell debris, G0/G1-, S-, and G2/M-phase cells. In some circumstances, there may be a collection of cells that contain more than two copies of DNA. The primary focus of DNA content analysis is to deconvolute the overlapping mixtures of the cellular components, and subsequently to investigate whether a given treatment has perturbed the mixing proportions of the sample components. We propose a restricted mixture model that is parameterized to incorporate the available biological information. A likelihood ratio (LR) test is developed to test for changes in the mixing proportions between two cell populations. The proposed mixture model is applied to both simulated and real experimental data. The model fitting is compared with unrestricted models; the statistical inference on proportion change is compared between the proposed LR test and the Kolmogorov-Smirnov test, which is frequently used to test for differences in DNA content distribution. The proposed mixture model outperforms the existing approaches in the estimation of the mixing proportions and gives biologically interpretable results; the proposed LR test demonstrates improved sensitivity and specificity for detecting changes in the mixing proportions. 相似文献

11.

Detecting Specific Populations in Mixtures

Joel Howard Reynolds William David Templin 《Environmental Biology of Fishes》2004,69(1-4):233-243

Mixed stock analysis (MSA) estimates the relative contributions of distinct populations in a mixture of organisms. Increasingly, MSA is used to judge the presence or absence of specific populations in specific mixture samples. This is commonly done by inspecting the bootstrap confidence interval of the contribution of interest. This method has a number of statistical deficiencies, including almost zero power to detect small contributions even if the population has perfect identifiability. We introduce a more powerful method based on the likelihood ratio test and compare both methods in a simulation demonstration using a 17 population baseline of sockeye salmon, Oncorhynchus nerka, from the Kenai River, Alaska, watershed. Power to detect a nonzero contribution will vary with the population(s) identifiability relative to the rest of the baseline, the contribution size, mixture sample size, and analysis method. The demonstration shows that the likelihood ratio method is always more powerful than the bootstrap method, the two methods only being equal when both display 100% power. Power declines for both methods as contribution declines, but it declines faster and goes to zero for the bootstrap method. Power declines quickly for both methods as population identifiability declines, though the likelihood ratio test is able to capitalize on the presence of 'perfect identification' characteristics, such as private alleles. Given the baseline-specific nature of detection power, researchers are encouraged to conduct a priori power analyses similar to the current demonstration when planning their applications. 相似文献

12.

Minimum Hellinger distance estimation for finite mixtures of Poisson regression models and its applications 总被引：1，自引：0，他引：1

Lu Z Hui YV Lee AH 《Biometrics》2003,59(4):1016-1026

Minimum Hellinger distance estimation (MHDE) has been shown to discount anomalous data points in a smooth manner with first-order efficiency for a correctly specified model. An estimation approach is proposed for finite mixtures of Poisson regression models based on MHDE. Evidence from Monte Carlo simulations suggests that MHDE is a viable alternative to the maximum likelihood estimator when the mixture components are not well separated or the model parameters are near zero. Biometrical applications also illustrate the practical usefulness of the MHDE method. 相似文献

13.

Sample Size Determination in Comparing Two Population Variances with Paired‐data: Application to Bilirubin Tests

W. Bogle Y. S. Hsu 《Biometrical journal. Biometrische Zeitschrift》2002,44(5):594-602

Sample size needed by using paired‐data to compare two population variances is much smaller than that of using usual independent samples. In this paper, we proposed three parametric methods to compare two population variances with paired‐data. After comparing the powers of all three methods, we furnish the tables to indicate the least sample sizes required in various parameter values for the method with the largest power. The bilirubin example is used to illustrate the usefulness of the tables. 相似文献

14.

A mixture model-based approach to the clustering of microarray expression data 总被引：13，自引：0，他引：13

McLachlan GJ Bean RW Peel D 《Bioinformatics (Oxford, England)》2002,18(3):413-422

MOTIVATION: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. RESULTS: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. AVAILABILITY: EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/ 相似文献

15.

Simulated percentage points for the null distribution of the likelihood ratio test for a mixture of two normals 总被引：1，自引：0，他引：1

H C Thode S J Finch N R Mendell 《Biometrics》1988,44(4):1195-1201

We find the percentage points of the likelihood ratio test of the null hypothesis that a sample of n observations is from a normal distribution with unknown mean and variance against the alternative that the sample is from a mixture of two distinct normal distributions, each with unknown mean and unknown (but equal) variance. The mixing proportion pi is also unknown under the alternative hypothesis. For 2,500 samples of sizes n = 15, 20, 25, 40, 50, 70, 75, 80, 100, 150, 250, 500, and 1,000, we calculated the likelihood ratio statistic, and from these values estimated the percentage points of the null distributions. Our algorithm for the calculation of the maximum likelihood estimates of the unknown parameters included precautions against convergence of the maximization algorithm to a local rather than global maximum. Investigations for convergence to an asymptotic distribution indicated that convergence was very slow and that stability was not apparent for samples as large as 1,000. Comparisons of the percentage points to the commonly assumed chi-squared distribution with 2 degrees of freedom indicated that this assumption is too liberal; i.e., one's P-value is greater than that indicated by chi 2(2). We conclude then that one would need what is usually an unfeasibly large sample size (n greater than 1,000) for the use of large-sample approximations to be justified. 相似文献

16.

Computation of the Full Likelihood Function for Estimating Variance at a Quantitative Trait Locus 总被引：1，自引：1，他引：0

下载免费PDF全文

S. Xu 《Genetics》1996,144(4):1951-1960

The proportion of alleles identical by descent (IBD) determines the genetic covariance between relatives, and thus is crucial in estimating genetic variances of quantitative trait loci (QTL). However, IBD proportions at QTL are unobservable and must be inferred from marker information. The conventional method of QTL variance analysis maximizes the likelihood function by replacing the missing IBDs by their conditional expectations (the expectation method), while in fact the full likelihood function should take into account the conditional distribution of IBDs (the distribution method). The distribution method for families of more than two sibs has not been obvious because there are n(n - 1)/2 IBD variables in a family of size n, forming an n X n symmetrical matrix. In this paper, I use four binary variables, where each indicates the event that an allele from one of the four grandparents has passed to the individual. The IBD proportion between any two sibs is then expressed as a function of the indicators. Subsequently, the joint distribution of the IBD matrix is derived from the distribution of the indicator variables. Given the joint distribution of the unknown IBDs, a method to compute the full likelihood function is developed for families of arbitrary sizes. 相似文献

17.

Assessment of agreement under nonstandard conditions using regression models for mean and variance

Choudhary PK Tony Ng HK 《Biometrics》2006,62(1):288-296

The total deviation index of Lin and Lin et al. is an intuitive approach for the assessment of agreement between two methods of measurement. It assumes that the differences of the paired measurements are a random sample from a normal distribution and works essentially by constructing a probability content tolerance interval for this distribution. We generalize this approach to the case when differences may not have identical distributions -- a common scenario in applications. In particular, we use the regression approach to model the mean and the variance of differences as functions of observed values of the average of the paired measurements, and describe two methods based on asymptotic theory of maximum likelihood estimators for constructing a simultaneous probability content tolerance band. The first method uses bootstrap to approximate the critical point and the second method is an analytical approximation. Simulation shows that the first method works well for sample sizes as small as 30 and the second method is preferable for large sample sizes. We also extend the methodology for the case when the mean function is modeled using penalized splines via a mixed model representation. Two real data applications are presented. 相似文献

18.

Robust Clustering Using Exponential Power Mixtures

Jian Zhang Faming Liang 《Biometrics》2010,66(4):1078-1086

Summary Clustering is a widely used method in extracting useful information from gene expression data, where unknown correlation structures in genes are believed to persist even after normalization. Such correlation structures pose a great challenge on the conventional clustering methods, such as the Gaussian mixture (GM) model, k‐means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation–conditional maximization algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion is then employed to determine the numbers of components of the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non‐Gaussian components in the data. 相似文献

19.

Parentage analysis with few contributing breeders: validation and improvement

Duchesne P Meldgaard T Berrebi P 《The Journal of heredity》2008,99(3):323-334

Validation of parental allocation using PAPA software (Duchesne P, Godbout MH, Bernatchez L. 2002. PAPA (package for the analysis of parental allocation): a computer program for simulated and real parental allocation. Mol Ecol Notes. 2:191-193.) was investigated under the assumption that only a small proportion of potential breeders contributed to the offspring sample. Inbreeding levels proved to have a large impact on allocation error rate. Consequently, simulations from artificial, unrelated parents may strongly underestimate allocation error, and so, whenever possible, simulations based on the actual parental genotypes should be run. An unexpected and interesting finding was that ambiguity (the highest likelihood is shared by several parental pairs) rates below 10% stood very close to exact allocation error rates (true proportions of wrong allocations). Hence, the ambiguity rate statistic may be viewed as a ready-made indicator of the resolution power of a specific parental allocation run and, if not exceeding 10%, used as an estimate of allocation error rate. It was found that the PAPA simulator, even with few contributing breeders, can be trusted to output reasonably accurate estimates of allocation error as long as those estimates do not exceed 15%. Indeed, most discrepancies between exact and estimated error then stood below 3%. Reproductive success variance had little impact on error estimate discrepancies within the same range. Finally, a (focal set) method was described to correct the estimated family sizes computed directly from parental allocations. Essentially, this method makes use of the detailed structure of the allocation probabilities associated with each parental pair with at least 1 allocated offspring. The allocation probabilities are expressed in matrix form, and the subsequent calculations are run based on standard matrix algebra. On average, this method provided better estimates of family sizes for each investigated combination of parameter values. As the size of offspring samples increased, the corrections improved until a plateau was finally reached. Typically, samples comprising 250, 500, and 1000 offspring would bring corrections in the order of 10-20%, 20-30%, and 30-40%, respectively. 相似文献

20.

Small-sample inference for the comparison of means of log-normal distributions

Gill PS 《Biometrics》2004,60(2):525-527

We propose a likelihood-based test for comparing the means of two or more log-normal distributions, with possibly unequal variances. A modification to the likelihood ratio test is needed when sample sizes are small. The performance of the proposed procedures is compared with the F-ratio test using Monte Carlo simulations. 相似文献