Similar literature
Found 20 similar articles; search took 31 ms.
1.
Gianola D, Heringstad B, Odegaard J. Genetics, 2006, 173(4):2247-2255
Finite mixture models are helpful for uncovering heterogeneity due to hidden structure. Quantitative genetics issues of continuous characters having a finite mixture of Gaussian components as statistical distribution are explored in this article. The partition of variance in a mixture, the covariance between relatives under the supposition of an additive genetic model, and the offspring-parent regression are derived. Formulas for assessing the effect of mass selection operating on a mixture are given. Expressions for the genetic and phenotypic correlations between mixture and Gaussian traits and between two mixture traits are presented. It is found that, if there is heterogeneity in a population at the genetic or environmental level, then genetic parameters based on theory treating distributions as homogeneous can lead to misleading interpretations. Some peculiarities of mixture characters are: heritability depends on the mean values of the component distributions, the offspring-parent regression is nonlinear, and genetic or phenotypic correlations cannot be interpreted devoid of the mixture proportions and of the parameters of the distributions mixed.
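As a rough illustration of the variance partition mentioned in this abstract (a sketch based on the standard law of total variance, not the authors' own derivation), the variance of a finite mixture splits into a within-component part and a between-component part. The function name and the example values are hypothetical.

```python
def mixture_variance(weights, means, variances):
    """Return (total, within, between) variance of a finite mixture.

    Uses the law of total variance; all inputs are illustrative assumptions.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    overall_mean = sum(w * m for w, m in zip(weights, means))
    # Within-component part: weighted average of the component variances.
    within = sum(w * v for w, v in zip(weights, variances))
    # Between-component part: spread of component means around the overall mean.
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return within + between, within, between


# Two hypothetical components with equal variance but different means.
total, within, between = mixture_variance([0.7, 0.3], [0.0, 2.0], [1.0, 1.0])
```

The between-component term depends on the component means, which is consistent with the abstract's remark that heritability in a mixture depends on the mean values of the component distributions.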

2.
Probits of mixtures (total citations: 2; self-citations: 0; citations by others: 2)
T Lwin, P J Martin. Biometrics, 1989, 45(3):721-732
The tolerances of individuals (insects, parasites) in a population have a frequency or probability distribution called a tolerance distribution. Many tolerance distributions in bioassay studies can be the result of a rather heterogeneous population of individuals and can often be modelled as a mixture of a number of standard unimodal distributions. A probit analysis can be generalized to the case where the tolerance distribution is a mixture of location and scale parameter distributions. In this article, the existence and determination of the maximum likelihood estimates are investigated. An expectation-maximization (EM) algorithm for probits of mixtures is developed and it is shown that by application of the EM algorithm, the problem of probits of mixtures can be separated into a series of probits of individual component tolerance distributions.

3.
The stochastic nature of high-throughput screening (HTS) data indicates that information may be gleaned by applying statistical methods to HTS data. A foundation of parametric statistics is the study and elucidation of population distributions, which can be modeled using modern spreadsheet software. The methods and results described here use fundamental concepts of statistical population distributions analyzed using a spreadsheet to provide tools in a developing armamentarium for extracting information from HTS data. Specific examples using two HTS kinase assays are analyzed. The analyses use normal and gamma distributions, which combine to form mixture distributions. HTS data were found to be described well using such mixture distributions, and deconvolution of the mixtures to the constituent gamma and normal parts provided insight into how the assays performed. In particular, the proportion of hits confirmed was predicted from the original HTS data and used to assess screening assay performance. The analyses also provide a method for determining how hit thresholds (values used to separate active from inactive compounds) affect the proportion of compounds verified as active and how the threshold can be chosen to optimize the selection process.

4.
We have examined the statistical requirements for the detection of mixtures of two lognormal distributions in doubly truncated data when the sample size is large. The expectation-maximization algorithm was used for parameter estimation. A bootstrap approach was used to test for a mixture of distributions using the likelihood ratio statistic. Analysis of computer simulated mixtures showed that as the ratio of the difference between the means to the minimum standard deviation increases, the power for detection also increases and the accuracy of parameter estimates improves. These procedures were used to examine the distribution of red blood cell volume in blood samples. Each distribution was doubly truncated to eliminate artifactual frequency counts and tested for best fit to a single lognormal distribution or a mixture of two lognormal distributions. A single population was found in samples obtained from 60 healthy individuals. Two subpopulations of cells were detected in 25 of 27 mixtures of blood prepared in vitro. Analyses of mixtures of blood from 40 patients treated for iron-deficiency anemia showed that subpopulations could be detected in all by 6 weeks after onset of treatment. To determine if two-component mixtures could be detected, distributions were examined from untransfused patients with refractory anemia. In two patients with inherited sideroblastic anemia a mixture of microcytic and normocytic cells was found, while in the third patient a single population of microcytic cells was identified. In two family members previously identified as carriers of inherited sideroblastic anemia, mixtures of microcytic and normocytic subpopulations were found. Twenty-five patients with acquired myelodysplastic anemia were examined. A good fit to a mixture of subpopulations containing abnormal microcytic or macrocytic cells was found in two. 
We have demonstrated that with large sample sizes, mixtures of distributions can be detected even when distributions appear to be unimodal. These statistical techniques provide a means to characterize and quantify alterations in erythrocyte subpopulations in anemia but could also be applied to any set of grouped, doubly truncated data to test for the presence of a mixture of two lognormal distributions.
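The EM step described in this abstract can be sketched as follows. This is a minimal illustration under a simplifying assumption: it fits a two-component lognormal mixture to raw, untruncated values by running EM on the log scale, and it does not handle the grouping or double truncation that the abstract addresses. The function names, starting values, and simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_lognormal(values, n_iter=200):
    """Fit a two-component lognormal mixture by EM on log(values).

    Simplifying assumption: no truncation or grouping is handled here.
    Returns (p, mu1, sigma1, mu2, sigma2) on the log scale.
    """
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    half = n // 2
    # Crude starting values: split the sorted data in half.
    mu1 = sum(logs[:half]) / half
    mu2 = sum(logs[half:]) / (n - half)
    overall = sum(logs) / n
    sigma1 = sigma2 = math.sqrt(sum((x - overall) ** 2 for x in logs) / n)
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from component 1.
        resp = []
        for x in logs:
            a = p * normal_pdf(x, mu1, sigma1)
            b = (1.0 - p) * normal_pdf(x, mu2, sigma2)
            resp.append(a / (a + b))
        # M-step: weighted means, standard deviations, and mixing proportion.
        w1 = sum(resp)
        w2 = n - w1
        mu1 = sum(r * x for r, x in zip(resp, logs)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, logs)) / w2
        s1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, logs)) / w1
        s2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, logs)) / w2
        # A small floor guards against a component collapsing onto one point.
        sigma1 = max(math.sqrt(s1), 1e-6)
        sigma2 = max(math.sqrt(s2), 1e-6)
        p = w1 / n
    return p, mu1, sigma1, mu2, sigma2


# Simulated volumes from two hypothetical subpopulations (60% and 40%).
rng = random.Random(0)
sample = [rng.lognormvariate(4.0, 0.1) for _ in range(600)]
sample += [rng.lognormvariate(4.6, 0.1) for _ in range(400)]
p, mu1, sigma1, mu2, sigma2 = em_two_lognormal(sample)
```

The simulated components are well separated, so this plain EM recovers them easily; the abstract's point is that detection power falls as the ratio of the mean difference to the smaller standard deviation shrinks.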

5.
We present a model that describes the distribution of recurring times of a disease in the presence of covariate effects. After a first occurrence of the disease in an individual, the time intervals between successive cases are supposed to be independent and to be a mixture of two distributions according to the outcome of the previous treatment. Both sub‐distributions of the model and the mixture proportion are allowed to involve covariates. Parametric inference is considered and we illustrate the methods with data of a recurrent disease and with simulations, using piecewise constant baseline hazard functions.

6.
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004, Nature 431: 980-984) and Chang (1996, Math. Biosci. 134: 189-216), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible - despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) nonidentifiability - two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests - there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.

7.
This investigation tested whether distributions of certain aspects of eating behavior were consistent with the notion of a “mixture model;” that is, two or more distinct commingled component distributions, consistent with the possibility of major gene action. Undergraduates (n=901) completed self-report trait measures of hunger, disinhibition, and dietary restraint. Variables were residualized for gender and age and transformed to remove skewness. Residualized transformed distributions were tested for departure from unimodality with Hartigan's (14) dip statistic. The distributions of all three aspects of eating behavior were significantly non-unimodal. Next, component multivariate normal distributions were estimated via maximum likelihood. Likelihood ratio tests were employed to compare nested models. A mixture of four distributions with unequal variance-covariance matrices fit significantly better than any more parsimonious model. In sum, these data strongly suggest that the distributions of several measures of eating behavior are composed of four component distributions. This finding is consistent with the possibility of major gene effects for eating behavior.

8.
It is proposed that the orientation of elongate objects, such as bones, may be used to identify the flow direction of ancient river deposits. If true, elongate objects could be of great value when ancient bedforms such as ripples and dunes are not visible. Two sandstone quarries were investigated wherein the paleoflow direction was determined from both bedforms and elongate dinosaur bones. A mixture of two von Mises distributions captures the observation that elongate bones transported under unidirectional flow conditions will align both parallel and perpendicular to the flow direction. Likelihood ratio tests for a mixture of two von Mises distributions are given. The power of these tests is investigated by simulation since the direction of dinosaur bones agrees with the primary bedforms if the hypothesis test comparing the dominant mean direction of the bones to the paleoflow direction fails to reject. The likelihood ratio test on the dominant mean direction has reasonable power. If the two mean directions in the mixture distribution are pi apart, a more powerful likelihood ratio test can be used. The likelihood ratio test on the hypothesis that the two mean directions are exactly pi apart is useful in determining if the assumptions of the more powerful test are satisfied.
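For readers unfamiliar with the density involved, a two-component von Mises mixture can be sketched as below. This is only an illustration of the general form: the function names are hypothetical, the Bessel function is computed from its power series to avoid external libraries, and the example parameter values (including the angular offset between the two means) are assumptions rather than values from the study.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(angle, mu, kappa):
    """Von Mises density on the circle with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(angle - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def von_mises_mixture_pdf(angle, p, mu1, kappa1, mu2, kappa2):
    """Mixture of two von Mises densities with mixing proportion p on component 1."""
    return (p * von_mises_pdf(angle, mu1, kappa1)
            + (1.0 - p) * von_mises_pdf(angle, mu2, kappa2))


# Hypothetical example: a dominant direction at 0 and a secondary one at pi/2.
n_grid = 2000
step = 2.0 * math.pi / n_grid
area = sum(von_mises_mixture_pdf(i * step, 0.7, 0.0, 4.0, math.pi / 2.0, 2.0)
           for i in range(n_grid)) * step
```

The numerical integral `area` over the full circle should be close to 1, which is a quick sanity check on the normalizing constant.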

9.
Near-natural multilayered Abies alba Mill.–Fagus sylvatica L. forests form structural mosaics and consist of patches in different developmental stages and phases. Knowledge of the diversity of patch structure and adequate methods to describe the diameter structure is essential for modeling forest dynamics. The hypotheses tested in the study are that near-natural multilayered stands are structurally heterogeneous (i.e., tree diameter (DBH) distributions of these stands are heterogeneous) and that in these forests the finite-mixture models are suitable for modeling the empirical DBH distributions. Diversity of patch structure was studied based on data collected from 33 sample plots. In multilayered stands, four groups of empirical DBH distributions were distinguished using hierarchical cluster analysis (HCA) and correspondence analysis (CA). Stands investigated are structurally heterogeneous; 27% of multilayered stands showed the slightly rotated sigmoid (SRS) type of empirical DBH distribution, 34% the distinctly rotated sigmoid (DRS) type, 18% the bimodal M-shaped (BMS) type, and 21% the unimodal highly skewed (UHS) type. The gamma distribution, the two-component mixture gamma model, and the two-component mixture Weibull model were more flexible for the SRS type of DBH distributions. The average p-values (Chi-square test) for these theoretical distributions were 0.4712, 0.4718, and 0.4660, respectively. The two-component mixture gamma model and the two-component mixture Weibull model were a good choice for modeling the DRS, BMS, and UHS types of DBH distributions. The average p-values (Chi-square test) for these models ranged from 0.2684 to 0.4854. In near-natural multilayered Abies–Fagus forests, patches of different DBH distributions occur together. The empirical DBH distributions in these stands are characterized by irregular and complicated shapes and therefore are best approximated by finite-mixture models.

10.
Site occupancy models with heterogeneous detection probabilities (total citations: 1; self-citations: 0; citations by others: 1)
Royle JA. Biometrics, 2006, 62(1):97-102
Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these "site occupancy" models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs.
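The "zero-inflated binomial mixture" form of the likelihood can be made concrete with a small sketch. It assumes a finite (discrete) mixing distribution for the detection probability p, which is only one of the possible classes the abstract alludes to; the function names and the numerical values are hypothetical.

```python
import math


def binom_pmf(y, n, p):
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def site_likelihood(y, n_visits, psi, probs, weights):
    """Likelihood of y detections in n_visits at a single site.

    psi: occupancy probability. probs/weights: support points and weights of a
    finite mixing distribution for the detection probability (an assumption).
    """
    # Integrated (here: summed) binomial likelihood over the mixing distribution.
    detect = sum(w * binom_pmf(y, n_visits, p) for p, w in zip(probs, weights))
    # Unoccupied sites can only yield zero detections: the zero-inflation term.
    return psi * detect + (1.0 - psi) * (1.0 if y == 0 else 0.0)


# The likelihood over all possible detection counts should sum to 1.
total = sum(site_likelihood(y, 5, 0.6, [0.2, 0.7], [0.5, 0.5]) for y in range(6))
```

With a continuous mixing distribution (for example a beta distribution), the sum over support points would be replaced by an integral over p.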

11.
Matrix population models are a standard tool for studying stage‐structured populations, but they are not flexible in describing stage duration distributions. This study describes a method for modeling various such distributions in matrix models. The method uses a mixture of two negative binomial distributions (parametrized using a maximum likelihood method) to approximate a target (true) distribution. To examine the performance of the method, populations consisting of two life stages (juvenile and adult) were considered. The juvenile duration distribution followed a gamma distribution, lognormal distribution, or zero‐truncated (over‐dispersed) Poisson distribution, each of which represents a target distribution to be approximated by a mixture distribution. The true population growth rate based on a target distribution was obtained using an individual‐based model, and the extent to which matrix models can approximate the target dynamics was examined. The results show that the method generally works well for the examined target distributions, but is prone to biased predictions under some conditions. In addition, the method works uniformly better than an existing method whose performance was also examined for comparison. Other details regarding parameter estimation and model development are also discussed.

12.
Statistical analysis of data on the longest living humans leaves room for speculation whether the human force of mortality is actually leveling off. Based on this uncertainty, we study a mixture failure model, introduced by Finkelstein and Esaulova (2006), that generalizes, among others, the proportional hazards and accelerated failure time models. In this paper, we first extend the Abelian theorem of these authors to mixing distributions whose densities are functions of regular variation. In addition, taking into account the asymptotic behavior of the mixture hazard rate prescribed by this Abelian theorem, we prove three Tauberian-type theorems that describe the class of admissible mixing distributions. We illustrate our findings with examples of popular mixing distributions that are used to model unobserved heterogeneity.

13.
Finite mixtures of Gaussian distributions are known to provide an accurate approximation to any unknown density. Motivated by DNA repair studies in which data are collected for samples of cells from different individuals, we propose a class of hierarchically weighted finite mixture models. The modeling framework incorporates a collection of k Gaussian basis distributions, with the individual-specific response densities expressed as mixtures of these bases. To allow heterogeneity among individuals and predictor effects, we model the mixture weights, while treating the basis distributions as unknown but common to all distributions. This results in a flexible hierarchical model for samples of distributions. We consider analysis of variance-type structures and a parsimonious latent factor representation, which leads to simplified inferences on non-Gaussian covariance structures. Methods for posterior computation are developed, and the model is used to select genetic predictors of baseline DNA damage, susceptibility to induced damage, and rate of repair.

14.
We prove that the generalized Poisson distribution GP(θ, η) (η ≥ 0) is a mixture of Poisson distributions; this is a new property for a distribution which is the topic of the book by Consul (1989). Because we find that the fits to count data of the generalized Poisson and negative binomial distributions are often similar, to understand their differences, we compare the probability mass functions and skewnesses of the generalized Poisson and negative binomial distributions with the first two moments fixed. They have slight differences in many situations, but their zero-inflated distributions, with masses at zero, means and variances fixed, can differ more. These probabilistic comparisons are helpful in selecting a better fitting distribution for modelling count data with long right tails. Through a real example of count data with large zero fraction, we illustrate how the generalized Poisson and negative binomial distributions as well as their zero-inflated distributions can be discriminated.
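The moment-matched comparison described in this abstract can be sketched numerically. The snippet assumes Consul's usual parametrization of the generalized Poisson distribution, with mean θ/(1 − η) and variance θ/(1 − η)³ for 0 ≤ η < 1, and a negative binomial parametrized by a size r and a success probability; the function names and the values θ = 2, η = 0.3 are illustrative choices, not taken from the paper.

```python
import math


def gp_pmf(x, theta, eta):
    """Generalized Poisson pmf (Consul's form), assuming theta > 0, 0 <= eta < 1."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + eta * x)
             - theta - eta * x - math.lgamma(x + 1))
    return math.exp(log_p)


def nb_pmf(x, r, prob):
    """Negative binomial pmf with size r and success probability prob."""
    log_p = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(prob) + x * math.log(1.0 - prob))
    return math.exp(log_p)


theta, eta = 2.0, 0.3
mean = theta / (1.0 - eta)          # generalized Poisson mean
var = theta / (1.0 - eta) ** 3      # generalized Poisson variance
r = mean ** 2 / (var - mean)        # negative binomial size matching both moments
prob = mean / var                   # negative binomial success probability

# Side-by-side probabilities for small counts.
for x in range(6):
    print(x, round(gp_pmf(x, theta, eta), 4), round(nb_pmf(x, r, prob), 4))
```

Printing the two columns for a wider range of counts, or computing skewness from the same functions, gives a feel for how closely the two distributions track each other once the first two moments are fixed.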

15.
Fitting mixture models to grouped and truncated data via the EM algorithm (total citations: 3; self-citations: 0; citations by others: 3)
The fitting of finite mixture models via the EM algorithm is considered for data which are available only in grouped form and which may also be truncated. A practical example is presented where a mixture of two doubly truncated log-normal distributions is adopted to model the distribution of the volume of red blood cells in cows during recovery from anemia.

16.
When multiple strategies can be used to solve a type of problem, the observed response time distributions are often mixtures of multiple underlying base distributions each representing one of these strategies. For the case of two possible strategies, the observed response time distributions obey the fixed-point property. That is, there exists one reaction time that has the same probability of being observed irrespective of the actual mixture proportion of each strategy. In this paper we discuss how to compute this fixed-point, and how to statistically assess the probability that indeed the observed response times are generated by two competing strategies. Accompanying this paper is a free R package that can be used to compute and test the presence or absence of the fixed-point property in response time data, allowing for easy-to-use tests of strategic behavior.
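The fixed-point property follows directly from the mixture form: if the two base densities are equal at some time t*, then p·f1(t*) + (1 − p)·f2(t*) takes the same value for every mixture proportion p. The sketch below checks this numerically. It is not the R package mentioned in the abstract; the normal base densities, their parameters, and the function names are hypothetical stand-ins chosen only for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def crossing_point(f1, f2, lo, hi, tol=1e-10):
    """Bisection for x in [lo, hi] with f1(x) == f2(x); assumes one sign change."""
    g_lo = f1(lo) - f2(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f1(mid) - f2(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical base densities for a fast and a slow strategy (times in seconds).
def fast(t):
    return normal_pdf(t, 0.5, 0.10)


def slow(t):
    return normal_pdf(t, 0.9, 0.20)


t_star = crossing_point(fast, slow, 0.5, 0.9)
# Mixture density at t_star for several mixture proportions.
heights = [p * fast(t_star) + (1.0 - p) * slow(t_star) for p in (0.1, 0.5, 0.9)]
```

All entries of `heights` agree up to numerical tolerance, whereas the mixture density at any other time changes with p.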

17.
Summary: Fetal growth restriction is a leading cause of perinatal morbidity and mortality that could be reduced if high‐risk infants are identified early in pregnancy. We propose a Bayesian model for aggregating 18 longitudinal ultrasound measurements of fetal size and blood flow into three underlying, continuous latent factors. Our procedure is more flexible than typical latent variable methods in that we relax the normality assumptions by allowing the latent factors to follow finite mixture distributions. Using mixture distributions also permits us to cluster individuals with similar observed characteristics and identify latent classes of subjects who are more likely to be growth or blood flow restricted during pregnancy. We also use our latent variable mixture distribution model to identify a clinically meaningful latent class of subjects with low birth weight and early gestational age. We then examine the association of latent classes of intrauterine growth restriction with latent classes of birth outcomes as well as observed maternal covariates including fetal gender and maternal race, parity, body mass index, and height. Our methods identified a latent class of subjects who have increased blood flow restriction and below average intrauterine size during pregnancy. These subjects were more likely to be growth restricted at birth than a class of individuals with typical size and blood flow.

18.
An elevated level of erythrocyte sodium-lithium (Na-Li) countertransport has been suggested as a predictor of predisposition to essential hypertension. In order to evaluate whether a single genetic or environmental factor with large effects explains the mixture of distributions in Na-Li countertransport in the general population, complex segregation analyses were conducted by using 1,273 individuals more than age 20 years from 276 pedigrees selected without respect to disease risk factors or health status. Either a single genetic locus or a single environmental factor with large gender-specific effects explained the mixture of distributions for Na-Li countertransport in this sample equally well. In the subsample of pedigrees supporting a single-locus etiology, the single genetic locus explained 29.0% of the variability in adjusted Na-Li countertransport in males and 16.6% of that in females. In a subsample of pedigrees supporting an environmental factor etiology, the environmental factor explained 35.2% of the adjusted Na-Li countertransport in males and 20.5% of that in females. These results suggest that there are at least two different explanations for the mixture of distributions in Na-Li countertransport in the general population. Attempts to relate genetic variation in Na-Li countertransport to risk of essential hypertension must consider that the factor with large phenotypic effects on this trait is gender specific and may not be a single major locus in all pedigrees.

19.
Tail moments in the single cell gel electrophoresis (comet) assay usually do not follow a normal distribution, making the statistical analysis complicated. Researchers have used a wide variety of statistical techniques in an attempt to overcome this problem. In many cases, the tail moments follow a bimodal distribution that can be modeled with a mixture of gamma distributions. This bimodality may be due to cells being in two different stages of the cell cycle at the time of treatment. Maximum likelihood, modified to accommodate censored data, can be used to estimate the five parameters of the gamma mixture distribution for each slide. A weighted analysis of variance on the parameter estimates for the gamma mixtures can be performed to determine differences in DNA damage between treatments. These methods were applied to an experiment on the effect of thymidine kinase in DNA damage and repair. Analysis based on the mixture of gamma distributions was found to be more statistically valid, more powerful, and more informative than analysis based on log-transformed tail moments.
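A log-likelihood of the kind maximized in this abstract can be sketched as follows. The sketch assumes that the five parameters are one mixing proportion plus a shape and a scale for each of two gamma components, which the abstract does not spell out, and it ignores the censoring adjustment the authors describe. All names are hypothetical.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution with the given shape and scale."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def gamma_mixture_loglik(data, p, shape1, scale1, shape2, scale2):
    """Log-likelihood of a two-component gamma mixture (no censoring handled)."""
    total = 0.0
    for x in data:
        a = math.log(p) + gamma_logpdf(x, shape1, scale1)
        b = math.log(1.0 - p) + gamma_logpdf(x, shape2, scale2)
        # Log-sum-exp keeps the sum stable when one component dominates.
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


# Sanity check: two identical exponential components give sum(-x) over the data.
check = gamma_mixture_loglik([1.0, 2.0], 0.3, 1.0, 1.0, 1.0, 1.0)
```

In practice this function would be handed to a numerical optimizer (with suitable constraints on p, the shapes, and the scales) to obtain per-slide estimates.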

20.
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear‐mixed models that are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, an EM algorithm‐based approach is considered that uses a penalized normal mixture as random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter a new model choice criterion is proposed.
