Similar Articles
20 similar articles found.
1.
Our focus is the ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines, as opposed to those obtained in the presence of population growth. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general suitability of the SFS to distinguish between the different histories.
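The SFS and its normalized version can be computed directly from derived-allele counts. The sketch below is a minimal illustration, assuming an unfolded spectrum for a sample of size n; function names and the toy data are ours, not from the article:

```python
def sfs(derived_counts, n):
    """Unfolded site-frequency spectrum: spectrum[i-1] = number of
    segregating sites where the derived allele appears i times (1 <= i <= n-1)."""
    spectrum = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:          # skip sites fixed or absent in the sample
            spectrum[c - 1] += 1
    return spectrum

def normalized_sfs(spectrum):
    """nSFS: each frequency class divided by the total number of segregating
    sites, making the statistic insensitive to the overall mutation rate."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total else spectrum

counts = [1, 1, 1, 2, 3, 1, 5, 1]   # derived-allele counts at 8 sites, n = 6
spec = sfs(counts, 6)
nspec = normalized_sfs(spec)
```

Under both growth and multiple mergers the first entry (singletons) is inflated; the article's point is that the right tail of the spectrum behaves differently under the two model classes.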

2.
Current methods for detecting fluctuating selection require time-series data on genotype frequencies. Here, we propose an alternative approach that makes use of DNA polymorphism data from a sample of individuals collected at a single point in time. Our method uses classical diffusion approximations to model temporal fluctuations in the selection coefficients and finds the expected distribution of mutation frequencies in the population. Using the Poisson random-field setting, we derive the site-frequency spectrum (SFS) for three different models of fluctuating selection. We find that the general effect of fluctuating selection is to produce a more "U"-shaped site-frequency spectrum, with an excess of high-frequency derived mutations at the expense of middle-frequency variants. We present likelihood-ratio tests comparing the fluctuating selection models to the neutral model using SFS data, and use Monte Carlo simulations to assess their power. We find sufficient power to reject a neutral hypothesis using samples on the order of a few hundred SNPs with a sample size of approximately 20, and power to distinguish between time-varying and constant selection for a sample of size 20. We also find that fluctuating selection increases the probability of fixation of selected sites even if, on average, there is no difference in selection among a pair of alleles segregating at the locus. Fluctuating selection will, therefore, lead to an increase in the ratio of divergence to polymorphism similar to that observed under positive directional selection.

3.
IRBM, 2008, 29(1):13-19
Raman spectroscopy is a useful tool for investigating the molecular composition of biological samples. Source separation methods can be used to efficiently separate the dense information recorded in Raman spectra. Distorting effects such as fluorescence background, peak misalignment, or peak width heterogeneity break the linear generative model required by source separation methods. Preprocessing steps are needed to compensate for these distortions and make recorded Raman spectra fit the linear generative model of source separation methods. We show in this paper how deeply the efficiency of source separation methods depends on the preprocessing steps applied to the raw dataset. The resulting improvements are illustrated through the study of the numerical dewaxing of the Raman signal of a human skin biopsy. The applied source separation methods are a classical independent component analysis (ICA) algorithm named joint approximate diagonalization of eigenmatrices (JADE), and two positive source separation methods called non-negative matrix factorization (NMF) and maximum likelihood positive source separation (MLPSS).
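As a toy illustration of the positive source separation step, here is a minimal NMF with Lee-Seung multiplicative updates in pure Python. The article uses dedicated NMF/MLPSS implementations on real spectra; the matrix sizes and "spectra" below are invented:

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, r, iters=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) as W (m x r) times H (r x n)
    using Lee-Seung multiplicative updates, which preserve non-negativity."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(r)]
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(m)]
    return W, H

# Invented "spectra": three mixtures of two non-negative sources.
V = [[1.0, 0.5, 0.0, 0.2],
     [0.2, 0.3, 1.0, 0.8],
     [1.2, 0.8, 1.0, 1.0]]   # row 3 = row 1 + row 2, so V has exact rank 2
W, H = nmf(V, r=2)
WH = matmul(W, H)
err = sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))
```

The abstract's point is that such a factorization only makes sense after preprocessing has restored the linear generative model (e.g., baseline removal and peak alignment).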

4.
Whole-genome regression methods are being increasingly used for the analysis and prediction of complex traits and diseases. In human genetics, these methods are commonly used for inferences about genetic parameters, such as the amount of genetic variance among individuals or the proportion of phenotypic variance that can be explained by regression on molecular markers. This is so even though some of the assumptions commonly adopted for data analysis are at odds with important quantitative genetic concepts. In this article we develop theory that leads to a precise definition of parameters arising in high-dimensional genomic regressions; we focus on the so-called genomic heritability: the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. We propose a definition of this parameter that is framed within classical quantitative genetics theory and show that the genomic heritability and the trait heritability parameters are equal only when all causal variants are typed. Further, we discuss how the genomic variance and genomic heritability, defined as quantitative genetic parameters, relate to parameters of statistical models commonly used for inference, and indicate potential inferential problems that are assessed further using simulations. When a large proportion of the markers used in the analysis are in linkage equilibrium (LE) with the QTL, the likelihood function can be misspecified. This can induce a sizable finite-sample bias and, possibly, lack of consistency of likelihood (or Bayesian) estimates. This situation can be encountered if the individuals in the sample are distantly related and linkage disequilibrium spans only short regions. This bias does not negate the use of whole-genome regression models as predictive machines; however, our results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.

5.
As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work of Thomas Jukes, they have had to relax their initial assumption that rates of evolution were equal at all sites. Distance matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it. Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary much more widely. Assuming a prior distribution of rates such as a gamma distribution or lognormal distribution has deservedly been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov model (HMM) methods which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre quadrature can be used to improve the selection of rates and their probabilities so as to more nearly approach the desired gamma distribution. A model based on population genetics is presented predicting how the rates of evolution might vary from locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the "covarion" model, and allowing them to have correlations that reflect three-dimensional structure, rather than position in the coding sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models. Received: 8 February 2001 / Accepted: 20 May 2001
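The discrete-rate approximation can be sketched as follows. Instead of Laguerre quadrature, this uses a crude Monte Carlo version of the common equal-probability, mean-of-bin discretization of a mean-1 gamma distribution; this substitute method and all names are ours, chosen only so the idea is runnable in the standard library:

```python
import random

def discrete_gamma_rates(shape, k, n_draws=100_000, seed=1):
    """Approximate a mean-1 gamma rate distribution by k equiprobable rate
    categories, each represented by the mean of its quantile bin
    (a Monte Carlo stand-in for the quadrature discussed in the text)."""
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(shape, 1.0 / shape) for _ in range(n_draws))
    size = n_draws // k
    rates = [sum(draws[i * size:(i + 1) * size]) / size for i in range(k)]
    mean = sum(rates) / k
    return [r / mean for r in rates]   # renormalize so the mean rate is exactly 1

# Four rate categories for a strongly skewed gamma (shape 0.5): most sites
# evolve very slowly, a few very fast.
rates = discrete_gamma_rates(shape=0.5, k=4)
```

In an HMM likelihood, each site's conditional likelihood is then averaged over these k rates with weight 1/k each.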

6.
Vonesh EF, Chinchilli VM, Pu K. Biometrics, 1996, 52(2):572-587
In recent years, generalized linear and nonlinear mixed-effects models have proved to be powerful tools for the analysis of unbalanced longitudinal data. To date, much of the work has focused on various methods for estimating and comparing the parameters of mixed-effects models. Very little work has been done in the area of model selection and goodness-of-fit, particularly with respect to the assumed variance-covariance structure. In this paper, we present a goodness-of-fit statistic which can be used in a manner similar to the R2 criterion in linear regression for assessing the adequacy of an assumed mean and variance-covariance structure. In addition, we introduce an approximate pseudo-likelihood ratio test for testing the adequacy of the hypothesized covariance structure. These methods are illustrated and compared to the usual normal theory likelihood methods (Akaike's information criterion and the likelihood ratio test) using three examples. Simulation results indicate the pseudo-likelihood ratio test compares favorably with the standard normal theory likelihood ratio test, but both procedures are sensitive to departures from normality.

7.
Assessing influence in regression analysis with censored data.
Escobar LA, Meeker WQ. Biometrics, 1992, 48(2):507-528
In this paper we show how to evaluate the effect that perturbations to the model, data, or case weights have on maximum likelihood estimates from censored survival data. The ideas and methods also apply to other nonlinear estimation problems. We review the ideas behind using log-likelihood displacement and local influence methods. We describe new interpretations for some local influence statistics and show how these statistics extend and complement traditional case deletion influence statistics for linear least squares. These statistics identify individual cases, and combinations of cases, that have important influence on estimates of parameters and functions of these parameters. We illustrate the methods by reanalyzing the Stanford Heart Transplant data with a parametric regression model.

8.
Statistical methods to map quantitative trait loci (QTL) in outbred populations are reviewed, extensions and applications to human and plant genetic data are indicated, and areas for further research are identified. Simple and computationally inexpensive methods include (multiple) linear regression of phenotype on marker genotypes and regression of squared phenotypic differences among relative pairs on estimated proportions of identity-by-descent at a locus. These methods are less suited for genetic parameter estimation in outbred populations but allow the determination of test statistic distributions via simulation or data permutation; however, further inferences including confidence intervals of QTL location require the use of Monte Carlo or bootstrap sampling techniques. A method which is intermediate in computational requirements is residual maximum likelihood (REML) with a covariance matrix of random QTL effects conditional on information from multiple linked markers. Testing for the number of QTLs on a chromosome is difficult in a classical framework. The computationally most demanding methods are maximum likelihood and Bayesian analysis, which take account of the distribution of multilocus marker-QTL genotypes on a pedigree and permit investigators to fit different models of variation at the QTL. The Bayesian analysis includes the number of QTLs on a chromosome as an unknown.
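The "regression of squared phenotypic differences among relative pairs on estimated proportions of identity-by-descent" (Haseman-Elston style) can be sketched with a stylized simulation; the generative model and all names below are invented for illustration only:

```python
import random

def he_slope(pairs):
    """Slope from regressing squared phenotypic difference on IBD sharing;
    a significantly negative slope suggests a QTL linked to the marker."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

# Stylized sib-pair simulation: pairs sharing more alleles IBD at the QTL
# have more similar phenotypes, so their squared difference is smaller.
rng = random.Random(2)
pairs = []
for _ in range(2000):
    ibd = rng.choice([0.0, 0.5, 1.0])            # IBD sharing at the marker
    shared = rng.gauss(0, 1)                     # genetic effect shared IBD
    y1 = ibd * shared + (1 - ibd) * rng.gauss(0, 1) + rng.gauss(0, 1)
    y2 = ibd * shared + (1 - ibd) * rng.gauss(0, 1) + rng.gauss(0, 1)
    pairs.append((ibd, (y1 - y2) ** 2))

slope = he_slope(pairs)   # expected to be negative under linkage
```

Permuting the IBD column over pairs would give the null distribution of the slope, which is the simulation/permutation testing route the abstract mentions.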

9.
In observational studies of a population with a dichotomous outcome, researchers usually report the treatment effect alone, although both the baseline risk and the treatment effect are needed to evaluate the significance of the treatment effect for the population. In this article, we study point and interval estimates, including a confidence region, of the baseline risk and the treatment effect based on a logistic model, where the baseline risk is the risk of the outcome under the control treatment and the treatment effect is measured by the risk difference between outcomes under active versus control treatments. Using the approximate normal distribution of the maximum-likelihood (ML) estimate of the model parameters, we obtain an approximate joint distribution of the ML estimates of the baseline risk and the treatment effect. Using this approximate joint distribution, we obtain a point estimate and confidence region for the baseline risk and the treatment effect, as well as a point estimate and confidence interval for the treatment effect when the ML estimate of the baseline risk falls into a specified range. These interval estimates reflect the nonnormality of the joint distribution of the ML estimates of the baseline risk and the treatment effect. The method can be easily implemented using any software that generates the normal distribution. The method can also be used to obtain point and interval estimates of the baseline risk and any other measure of treatment effect, such as the risk ratio and the number needed to treat. The method can also be extended from the logistic model to other models, such as the log-linear model.
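A sketch of the core idea, assuming hypothetical ML estimates and covariance for a logistic model with a single binary treatment (all numbers are invented): sample from the approximate normal distribution of the parameter estimates and read off percentile intervals for the risk difference.

```python
import math, random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_mvn2(mean, cov, rng):
    """Draw one sample from a bivariate normal via a hand-rolled Cholesky."""
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return mean[0] + a * z1, mean[1] + b * z1 + c * z2

# Hypothetical ML estimates: logit(risk) = b0 + b1 * treatment,
# with an estimated covariance matrix for (b0, b1).
b_hat = (-1.0, 0.5)
cov_hat = [[0.04, -0.01], [-0.01, 0.09]]

rng = random.Random(3)
diffs = []
for _ in range(20000):
    b0, b1 = sample_mvn2(b_hat, cov_hat, rng)
    p0 = expit(b0)                        # baseline risk
    diffs.append(expit(b0 + b1) - p0)     # treatment effect (risk difference)

diffs.sort()
ci = (diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))])
```

Because the risk difference is a nonlinear function of (b0, b1), the percentile interval is generally asymmetric, which is the nonnormality the abstract emphasizes.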

10.
We present a conditional likelihood approach for testing linkage disequilibrium in nuclear families having multiple affected offspring. The likelihood, conditioned on the identity-by-descent (IBD) structure of the sibling genotypes, is unaffected by familial correlation in disease status that arises from linkage between a marker locus and the unobserved trait locus. Two such conditional likelihoods are compared: one that conditions on IBD and phase of the transmitted alleles and a second which conditions only on IBD of the transmitted alleles. Under the log-additive model, the first likelihood is equivalent to the allele-counting methods proposed in the literature. The second likelihood is valid under the added assumption of equal male and female recombination fractions. In a simulation study, we demonstrated that in sibships having two or three affected siblings, the score test from each likelihood had the correct test size for testing disequilibrium. They also led to equivalent power to detect linkage disequilibrium at the 5% significance level.

11.
Mapping quantitative trait loci using molecular marker linkage maps
High-density restriction fragment length polymorphism (RFLP) and allozyme linkage maps have been developed in several plant species. These maps make it technically feasible to map quantitative trait loci (QTL) using methods based on flanking-marker genetic models. In this paper, we describe flanking-marker models for doubled haploid (DH), recombinant inbred (RI), backcross (BC), F1 testcross (F1TC), DH testcross (DHTC), recombinant inbred testcross (RITC), F2, and F3 progeny. These models are functions of the means of quantitative trait locus genotypes and of the recombination frequencies between marker and quantitative trait loci. In addition to the genetic models, we describe maximum likelihood methods for estimating these parameters using linear, nonlinear, and univariate or multivariate normal-distribution mixture models. We define recombination frequency estimators for backcross and F2 progeny group genetic models using the parameters of linear models. In addition, we found a genetically unbiased estimator of the QTL heterozygote mean using a linear function of marker means. In nonlinear models, recombination frequencies are estimated less efficiently than the means of quantitative trait locus genotypes. Recombination frequency estimation efficiency decreases as the distance between markers decreases, because the number of progeny in recombinant marker classes decreases. Mean estimation efficiency is nearly equal across these methods.

12.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods that, in the present article, write the likelihood as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data are collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models currently used for the analysis of genetic data. Assume data are collected from a series of consecutive regions of equal size. It is then shown that the observed data form a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g., mutation rates, demographic parameters, and the recombination rate) is consistent as the number of regions tends to infinity.
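The product-over-regions construction is easy to sketch. Below, each region's likelihood is a deliberately crude Poisson model for its number of segregating sites (an assumption for illustration; the article's per-region models are coalescent-based), and the composite estimator is obtained by maximizing the summed log-likelihoods:

```python
import math, random

def region_loglik(seg_sites, theta):
    """Poisson log-likelihood for the segregating-site count in one region
    (a deliberately crude per-region model, used only for illustration)."""
    return seg_sites * math.log(theta) - theta - math.lgamma(seg_sites + 1)

def composite_loglik(data, theta):
    """Composite likelihood: treat the regions as independent and multiply
    their likelihoods, i.e. sum the per-region log-likelihoods."""
    return sum(region_loglik(s, theta) for s in data)

def poisson(lam, rng):
    """Knuth's Poisson sampler, fine for small lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# Simulate counts for 500 consecutive regions under theta = 4, then
# maximize the composite likelihood over a grid of theta values.
rng = random.Random(4)
data = [poisson(4.0, rng) for _ in range(500)]
grid = [0.5 + 0.05 * i for i in range(200)]
theta_hat = max(grid, key=lambda t: composite_loglik(data, t))
```

The estimate tightens as the number of regions grows, mirroring the article's consistency result as the number of regions tends to infinity.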

13.
We examine two issues of importance in nutritional epidemiology: the relationship between dietary fat intake and breast cancer, and the comparison of different dietary assessment instruments, in our case the food frequency questionnaire (FFQ) and the multiple-day food record (FR). The data we use come from women participants in the control group of the Dietary Modification component of the Women's Health Initiative (WHI) Clinical Trial. The difficulty with the analysis of this important data set is that it comes from a truncated sample, namely those women for whom fat intake as measured by the FFQ amounted to 32% or more of total calories. We describe methods that allow estimation of logistic regression parameters in such samples, and also allow comparison of different dietary instruments. Because likelihood approaches that specify the full multivariate distribution can be difficult to implement, we develop approximate methods for both of our main problems that are simple to compute and have high efficiency. Application of these approximate methods to the WHI study reveals a statistically significant relationship between fat intake and breast cancer when the FR is the instrument used, and demonstrates a marginally significant advantage of the FR over the FFQ in the local power to detect such relationships.

14.
Genetic diversity was studied in six subpopulations (a total of 60 individuals) of wild barley, Hordeum spontaneum, the progenitor of cultivated barley, sampled from six stations located along a transect of 300 m across the two opposing slopes of 'Evolution Canyon', a Mediterranean microsite at Lower Nahal Oren, Mt Carmel. The two opposing slopes are separated by between 100 and 400 m and designated SFS (South-Facing Slope) and NFS (North-Facing Slope), each having three equidistant test stations. The SFS, which receives up to 300% more solar radiation, is drier, ecologically more heterogeneous, fluctuating, and more stressful than the NFS. Analysis of 12 RAPD primers, representing a total of 51 putative loci, revealed significant inter- and intraslope variation in RAPD band polymorphism. A significantly higher proportion of polymorphic RAPD loci was found amongst the subpopulations on the SFS (mean P = 0.909) than on the NFS (mean P = 0.682), on the basis of the presence and absence of 22 strong bands. Polymorphism generally increased upwards from the bottom to the top of the SFS (0.636, 0.773, 0.955) and NFS (0.409, 0.500, 0.545), respectively. Gametic phase disequilibria estimates, D, revealed SFS and NFS unique predominant combinations which sharply differentiated the two slopes and indicated that there is differential interslope selection favouring slope-specific multilocus combinations of alleles, or blocks of genes, over tens to hundreds of meters. This suggests that selection overrides migration. RAPD polymorphism appears to parallel allozyme diversity, which is climatically adaptive and driven by natural selection in the same subpopulations at the microsite.

15.
One of the central problems in mathematical genetics is the inference of evolutionary parameters of a population (such as the mutation rate) based on the observed genetic types in a finite DNA sample. If the population model under consideration is in the domain of attraction of the classical Fleming-Viot process, such as the Wright-Fisher or the Moran model, then the standard means to describe its genealogy is Kingman's coalescent. For this coalescent process, powerful inference methods are well established. An important feature of the above class of models is, roughly speaking, that the number of offspring of each individual is small when compared to the total population size, and hence all ancestral collisions are binary only. Recently, more general population models have been studied, in particular in the domain of attraction of so-called generalised Lambda-Fleming-Viot processes, as well as their (dual) genealogies, given by the so-called Lambda-coalescents, which allow multiple collisions. Moreover, Eldon and Wakeley (Genetics 172:2621-2633, 2006) provide evidence that such more general coalescents might actually be more adequate to describe real populations with extreme reproductive behaviour, in particular many marine species. In this paper, we extend methods of Ethier and Griffiths (Ann Probab 15(2):515-545, 1987) and Griffiths and Tavaré (Theor Pop Biol 46:131-159, 1994a; Stat Sci 9:307-319, 1994b; Philos Trans Roy Soc Lond Ser B 344:403-410, 1994c; Math Biosci 12:77-98, 1995) to obtain a likelihood-based inference method for general Lambda-coalescents. In particular, we obtain a method to compute (approximate) likelihood surfaces for the observed type probabilities of a given sample. We argue that within the (vast) family of Lambda-coalescents, the parametrisable subfamily of Beta(2 - alpha, alpha)-coalescents, where alpha is in (1, 2], is of particular relevance. We illustrate our method using simulated datasets, thus obtaining maximum-likelihood estimators of mutation and demographic parameters.
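For the Beta(2 - alpha, alpha) subfamily, the rate at which any given k of b lineages merge has the standard closed form lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha) for 2 <= k <= b and 1 < alpha < 2 (a well-known Lambda-coalescent formula, shown here for illustration and easy to evaluate with the standard library):

```python
import math

def beta_fn(x, y):
    """Euler beta function via log-gamma, numerically stable for small args."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def merger_rate(b, k, alpha):
    """Rate at which a given k of b lineages merge under a
    Beta(2 - alpha, alpha)-coalescent, for 1 < alpha < 2:
    lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)."""
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)

# With b = 4 lineages at alpha = 1.5: the pairwise rate works out to
# exactly 0.625 and the triple-merger rate to exactly 0.125.
pairwise = merger_rate(4, 2, 1.5)
triple = merger_rate(4, 3, 1.5)
# As alpha -> 2 the pairwise rates approach 1 and the multiple-merger
# rates vanish, recovering Kingman's coalescent.
```

Smaller alpha puts more mass on large reproductive events, i.e. heavier multiple mergers, which is the regime Eldon and Wakeley argue is relevant for many marine species.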

16.
Phylogenetic comparative methods (PCMs) have been used to test evolutionary hypotheses at phenotypic levels. The evolutionary modes commonly included in PCMs are Brownian motion (genetic drift) and the Ornstein-Uhlenbeck process (stabilizing selection), whose likelihood functions are mathematically tractable. More complicated models of evolutionary modes, such as branch-specific directional selection, have not been used because calculations of likelihood and parameter estimates in the maximum-likelihood framework are not straightforward. To solve this problem, we introduced a population genetics framework into a PCM, and here, we present a flexible and comprehensive framework for estimating evolutionary parameters through simulation-based likelihood computations. The method does not require analytic likelihood computations, and evolutionary models can be used as long as simulation is possible. Our approach has many advantages: it incorporates different evolutionary modes for phenotypes into phylogeny, it takes intraspecific variation into account, it evaluates full likelihood instead of using summary statistics, and it can be used to estimate ancestral traits. We present a successful application of the method to the evolution of brain size in primates. Our method can be easily implemented in more computationally effective frameworks such as approximate Bayesian computation (ABC), which will enhance the use of computationally intensive methods in the study of phenotypic evolution.
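The simulation-based idea is closely related to the rejection-ABC framework the authors mention. A minimal rejection-ABC sketch on an invented toy model (a normal mean with a uniform prior; all names and numbers are ours, not the article's):

```python
import random

def abc_rejection(observed_summary, prior_draw, simulate, summary,
                  n_sims=20000, eps=0.05, seed=5):
    """Rejection ABC: draw a parameter from the prior, simulate data under
    it, and keep the parameter when the simulated summary statistic lands
    within eps of the observed one. Accepted draws approximate the posterior."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        if abs(summary(simulate(theta, rng)) - observed_summary) < eps:
            accepted.append(theta)
    return accepted

# Toy model: the data are 50 draws from N(theta, 1); summary is the mean.
def simulate(theta, rng):
    return [rng.gauss(theta, 1) for _ in range(50)]

def summary(data):
    return sum(data) / len(data)

obs = summary(simulate(1.0, random.Random(0)))   # pretend this is observed
post = abc_rejection(obs, lambda r: r.uniform(-3, 3), simulate, summary)
post_mean = sum(post) / len(post)
```

Note the contrast drawn in the abstract: their method evaluates the full likelihood by simulation, whereas ABC, as here, conditions only on summary statistics.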

17.
Comparative studies of major histocompatibility complex (MHC) genes across vertebrate species can reveal the evolutionary processes that shape the structure and function of immune regulatory proteins. In this study, we characterized MHC class I sequences from six frog species representing three anuran families (Hylidae, Centrolenidae and Ranidae). Using cDNA from our focal species, we amplified a total of 79 unique sequences spanning exons 2-4 that encode the extracellular domains of the functional alpha chain protein. We compared intra- and interspecific nucleotide and amino-acid divergence, tested for recombination, and identified codon sites under selection by estimating the rate of non-synonymous to synonymous substitutions with multiple codon-based maximum likelihood methods. We determined that positive (diversifying) selection was acting on specific amino-acid sites located within the domains that bind pathogen-derived peptides. We also found significant signals of recombination across the physical distance of the genes. Finally, we determined that all six species expressed two or three putative classical class I loci, in contrast to the single-locus condition of Xenopus laevis. Our results suggest that MHC evolution in anurans is a dynamic process and that variation in numbers of loci and genetic diversity can exist among taxa. Thus, the accumulation of genetic data for more species will be useful in further characterizing the relative importance of processes such as selection, recombination and gene duplication in shaping MHC loci among amphibian lineages.

18.
We revisit the classical population genetics model of a population evolving under multiplicative selection, mutation, and drift. The number of beneficial alleles in a multilocus system can be considered a trait under exponential selection. Equations of motion are derived for the cumulants of the trait distribution in the diffusion limit and under the assumption of linkage equilibrium. Because of the additive nature of cumulants, this reduces to the problem of determining equations of motion for the expected allele distribution cumulants at each locus. The cumulant equations form an infinite-dimensional linear system, and in an appendix, Adam Prügel-Bennett provides a closed-form expression for these equations. We derive approximate solutions which are shown to describe the dynamics well for a broad range of parameters. In particular, we introduce two approximate analytical solutions: (1) perturbation theory is used to solve the dynamics for weak selection and arbitrary mutation rate; the resulting expansion for the system's eigenvalues reduces to the known diffusion theory results for the limiting cases with either mutation or selection absent. (2) For low mutation rates we observe a separation of time scales between the slowest mode and the rest, which allows us to develop an approximate analytical solution for the dominant slow mode. The solution is consistent with the perturbation theory result and provides a good approximation for much stronger selection intensities.

19.
Clinical trials with Poisson distributed count data as the primary outcome are common in various medical areas such as relapse counts in multiple sclerosis trials or the number of attacks in trials for the treatment of migraine. In this article, we present approximate sample size formulae for testing noninferiority using asymptotic tests which are based on restricted or unrestricted maximum likelihood estimators of the Poisson rates. The Poisson outcomes are allowed to be observed for unequal follow-up schemes, and both the situations that the noninferiority margin is expressed in terms of the difference and the ratio are considered. The exact type I error rates and powers of these tests are evaluated and the accuracy of the approximate sample size formulae is examined. The test statistic using the restricted maximum likelihood estimators (for the difference test problem) and the test statistic that is based on the logarithmic transformation and employs the maximum likelihood estimators (for the ratio test problem) show favorable type I error control and can be recommended for practical application. The approximate sample size formulae show high accuracy even for small sample sizes and provide power values identical or close to the aspired ones. The methods are illustrated by a clinical trial example from anesthesia.
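For orientation, a textbook normal-approximation sample size for the rate-difference case can be sketched as below. This is a generic Wald-type formula under equal allocation and unit follow-up, not the article's exact restricted-ML formulae, and the trial numbers are invented:

```python
import math
from statistics import NormalDist

def n_per_group(lam_exp, lam_ctrl, margin, alpha=0.025, power=0.8):
    """Approximate per-group sample size for a Wald-type noninferiority test
    of a Poisson rate difference, H0: lam_exp - lam_ctrl <= -margin, with
    1:1 allocation and unit follow-up time (a generic normal approximation;
    the article's restricted-ML formulae differ in detail)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    distance = lam_exp - lam_ctrl + margin   # distance from the H0 boundary
    return math.ceil((z_a + z_b) ** 2 * (lam_exp + lam_ctrl) / distance ** 2)

# Equal true rates of 1.2 events per patient, noninferiority margin 0.35:
n = n_per_group(1.2, 1.2, 0.35)
```

Shrinking the margin moves the alternative closer to the H0 boundary and inflates the required sample size, which is the basic trade-off such formulae quantify.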

20.
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
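The effect of the transforms is easy to demonstrate on a one-parameter toy example: a quadratic Taylor approximation of a binomial log-likelihood is noticeably more accurate on the arcsine scale (the variance-stabilizing transform for a proportion) than on the raw scale. This is an invented stand-in for the divergence-time likelihoods in the article:

```python
import math

def loglik_p(p, k, n):
    """Binomial log-likelihood (up to an additive constant)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def quadratic_approx(f, x_hat, h=1e-4):
    """Second-order Taylor approximation of f around its maximum x_hat,
    with curvature taken from a numerical second derivative."""
    f0 = f(x_hat)
    d2 = (f(x_hat + h) - 2 * f0 + f(x_hat - h)) / h ** 2
    return lambda x: f0 + 0.5 * d2 * (x - x_hat) ** 2

k, n = 30, 100
p_hat = k / n

# Raw scale.
approx_raw = quadratic_approx(lambda q: loglik_p(q, k, n), p_hat)
# Arcsine scale: t = arcsin(sqrt(p)), inverted by p = sin(t)^2.
to_t = lambda q: math.asin(math.sqrt(q))
ll_t = lambda t: loglik_p(math.sin(t) ** 2, k, n)
approx_t = quadratic_approx(ll_t, to_t(p_hat))

p = 0.40   # evaluate well away from the likelihood peak
err_raw = abs(approx_raw(p) - loglik_p(p, k, n))
err_arcsine = abs(approx_t(to_t(p)) - loglik_p(p, k, n))
```

In an MCMC run, evaluating such a cached quadratic is far cheaper than recomputing the full likelihood at every proposal, which is the speedup the article exploits.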


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号