Similar articles
1.
The composite-likelihood estimator (CLE) of the population recombination rate considers only sites with exactly two alleles under a finite-sites mutation model (McVean, G. A. T., P. Awadalla, and P. Fearnhead. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241). While in such a model the identity of alleles is not considered, the CLE has been shown to be robust to minor misspecification of the underlying mutational model. However, there are many situations where the putative mutation and demographic history can be quite complex. One good example is rapidly evolving pathogens, like HIV-1. First, we evaluated the performance of the CLE and the likelihood permutation test (LPT) under more complex, realistic models, including a general time reversible (GTR) substitution model, rate heterogeneity among sites (Gamma), positive selection, population growth, population structure, and noncontemporaneous sampling. Second, we relaxed some of the assumptions of the CLE, allowing for a four-allele, GTR + Gamma model in an attempt to use the data more efficiently. Through simulations and the analysis of real data, we concluded that the CLE is robust to severe misspecifications of the substitution model, but underestimates the recombination rate in the presence of exponential growth, population mixture, selection, or noncontemporaneous sampling. In such cases, the use of more complex models slightly increases performance on some occasions, especially in the case of the LPT. Thus, our results provide for a more robust application of the estimation of recombination rates.

2.
Heller G  Qin J 《Biometrics》2001,57(3):813-817
We consider the problem of estimation and inference on the mixture parameter in the two-sample problem when sample data from the two distributions as well as from a third population consisting of a mixture of the two are used. Under a general nonparametric model, where the relationship between the two populations is unspecified, we develop a pairwise rank-based likelihood. Simultaneous inference on the mixture proportion and a parameter representing the probability that an observation from one population is greater than an observation from the other population is based on this likelihood. Under some regularity conditions, it is shown that the maximum pairwise rank likelihood estimator is consistent and has an asymptotically normal distribution. Simulation results indicate that the performance of this statistic is satisfactory. The methodology is demonstrated on a prostate cancer data set.

3.
Xue  Liugen; Zhu  Lixing 《Biometrika》2007,94(4):921-937
A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals.

4.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data is collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data is collected from a series of consecutive regions of equal size. Then it is shown that the observed data forms a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.
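The product form described in this abstract can be written out as follows. This is only a notational sketch: the symbols $D_i$ (data from region $i$), $L_i$ (the likelihood for that region), $n$ (number of regions) and $\theta_0$ (true parameter value) are assumptions introduced here, not notation taken from the article.

```latex
\[
\mathrm{CL}(\theta) = \prod_{i=1}^{n} L_i(\theta; D_i), \qquad
\log \mathrm{CL}(\theta) = \sum_{i=1}^{n} \log L_i(\theta; D_i), \qquad
\hat{\theta}_n = \arg\max_{\theta} \, \log \mathrm{CL}(\theta).
\]
```

In this notation, the consistency result stated in the abstract says that, under the article's conditions, $\hat{\theta}_n \to \theta_0$ as $n \to \infty$.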

5.
Understanding the transmission dynamics of infectious diseases is important for both biological research and public health applications. It has been widely demonstrated that statistical modeling provides a firm basis for inferring relevant epidemiological quantities from incidence and molecular data. However, the complexity of transmission dynamic models presents two challenges: (1) the likelihood function of the models is generally not computable, and computationally intensive simulation-based inference methods need to be employed, and (2) the model may not be fully identifiable from the available data. While the first difficulty can be tackled by computational and algorithmic advances, the second obstacle is more fundamental. Identifiability issues may lead to inferences that are driven more by prior assumptions than by the data themselves. We consider a popular and relatively simple yet analytically intractable model for the spread of tuberculosis based on classical IS6110 fingerprinting data. We report on the identifiability of the model, also presenting some methodological advances regarding the inference. Using likelihood approximations, we show that the reproductive value cannot be identified from the data available and that the posterior distributions obtained in previous work have likely been substantially dominated by the assumed prior distribution. Further, we show that the inferences are influenced by the assumed infectious population size, which generally has been kept fixed in previous work. We demonstrate that the infectious population size can be inferred if the remaining epidemiological parameters are already known with sufficient precision.

6.
MOTIVATION: We hypothesized that recombination rates might be increased at genetic loci that are subject to more intense selection. Here, we test this hypothesis by using a recently published set of accelerated conserved regions and fine-scale recombination rate estimates provided by the HapMap project. RESULTS: We observed that fine-scale recombination rates are increased around conserved noncoding regions that show accelerated evolution in human or chimp, as compared to noncoding regions showing accelerated evolution in mouse and those being conserved between human and fugu. Recombination rates around hominid accelerated conserved regions (ACRs) are furthermore increased as compared to exonic regions. On the other hand, GC-content is reduced around ACRs, excluding a major confounding influence of GC-content on the observed variation in recombination rate. CONCLUSION: Our observations indicate that selection intensity could be an important determinant of local recombination rate variation and that continued positive selection might act at many ACR loci. Alternatively, a confounding factor needs to be found that causes a congruent signal in recombination rate estimates based on human polymorphism data and in the comparative genomic data. Researchers who consider the explanation involving selection as more likely may expect more common functional sequence variants at ACRs in genetic association studies.

7.
We introduce a new method for detection of recombination hotspots from population genetic data. This method is based on (a) defining an (approximate) penalized likelihood for how recombination rate varies with physical position and (b) maximizing this penalized likelihood over possible sets of recombination hotspots. Simulation results suggest that this is a more powerful method for detection of hotspots than existing methods. We apply the method to data from 89 genes sequenced in African American and European American populations. We find many genes with multiple hotspots, and some hotspots show evidence of being population-specific. Our results suggest that hotspots are randomly positioned within genes and could be as frequent as one per 30 kb.

8.
This is part 2 of a pair of papers on antimicrobial assays conducted to estimate the log reduction (LR), in the density of viable microbes, attributable to the germicide. Two alternative definitions of LR were given in part 1: one is based on the mean of the log-transformed densities; the other is based on the logarithm of the mean of densities. In this paper, we evaluate statistical methods for estimating LR from an antimicrobial assay in which the responses are presence/absence observations at each dilution in a series of dilutions. We provide a model for the presence/absence data, and, for each definition of LR, we derive the maximum likelihood estimator (mle). Using computer simulation methods, we compare the mle to several alternative estimators, including an estimator based on averaging the log-transformed most probable number (mpn) values. Standard error formulas for the estimators are also derived and evaluated using computer simulations. This investigation results in the following recommendations. If the parameter of interest is based on the mean of log-transformed densities, then the results favor use of the log-transformed mpn method. If, however, the parameter of interest is based on the logarithm of the mean of densities, then the results show that the mle should be used.
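To make the idea of a most probable number (mpn) estimate from presence/absence data concrete, here is a minimal sketch. It assumes the textbook Poisson dilution model, in which a tube inoculated with volume v is positive with probability 1 - exp(-density * v); the abstract does not state the paper's model, so this may differ from it. The function name `mpn_mle` and its arguments are hypothetical and chosen only for illustration.

```python
import math

def mpn_mle(volumes, n_tubes, n_positive, iters=200):
    """Maximum likelihood estimate of microbial density from a dilution series.

    Assumption (not from the paper): a tube inoculated with volume v is
    positive with probability 1 - exp(-density * v).
    """
    if sum(n_positive) == 0:
        return 0.0  # no growth anywhere: the likelihood is maximized at zero
    if all(p == n for p, n in zip(n_positive, n_tubes)):
        raise ValueError("all tubes positive: the MLE is unbounded")

    def score(lam):
        # Derivative of the log-likelihood with respect to the density.
        total = 0.0
        for v, n, p in zip(volumes, n_tubes, n_positive):
            x = lam * v
            # Guard against overflow in expm1 for very large x.
            pos = v / math.expm1(x) if x < 700.0 else 0.0
            total += p * pos - (n - p) * v
        return total

    # The score decreases in lam, so bracket the root and then bisect.
    lo, hi = 0.0, 1.0 / max(volumes)
    while score(hi) > 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a single dilution the estimate has the closed form -ln(1 - p/n)/v, so `mpn_mle([1.0], [10], [5])` should return a value close to ln 2. An mpn-based log reduction would then compare the log10 of such estimates for control and treated carriers.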

9.
Recombination is well known as a complicating factor in the interpretation of molecular phylogenies. Here we describe a maximum likelihood sliding window method based on a likelihood ratio test for scanning DNA sequence alignments for regions of incongruent phylogenetic signals, such as those influenced by recombination. Using this method, we identify several instances of gene conversion between paralogous chaperonin genes in euryarchaeote Archaea, many of which are not detected by two other widely used methods. In the Thermococcus/Pyrococcus lineage, where a gene duplication producing a and b paralogues predates the divergence of Thermococcus strains KS-1 and KS-8, gene conversion has homogenized portions of the a and b genes in KS-8 since the divergence of these two strains. A region near the 3′ end of the a and b paralogues in the methanogen Methanobacterium thermoautotrophicum also appears to have undergone gene conversion. We apply the method to two additional test data sets, the argF gene of Neisseria and a set of actin paralogues in maize, and show that it successfully identifies all the recombinant regions that were previously detected with other methods. Our approach is relatively insensitive to the presence of divergent sequences in the alignment, making it ideal for detecting recombination between both closely and distantly related genes.

10.
11.
Self-fertilization and the evolution of recombination
Roze D  Lenormand T 《Genetics》2005,170(2):841-857
In this article, we study the effect of self-fertilization on the evolution of a modifier allele that alters the recombination rate between two selected loci. We consider two different life cycles: under gametophytic selfing, a given proportion of fertilizations involves gametes produced by the same haploid individual, while under sporophytic selfing, a proportion of fertilizations involves gametes produced by the same diploid individual. Under both life cycles, we derive approximations for the change in frequency of the recombination modifier when selection is weak relative to recombination, so that the population reaches a state of quasi-linkage equilibrium. We find that gametophytic selfing increases the range of epistasis under which increased recombination is favored; however, this effect is substantial only for high selfing rates. Moreover, gametophytic selfing affects the relative influence of different components of epistasis (additive x additive, additive x dominance, dominance x dominance) on the evolution of the modifier. Sporophytic selfing has much stronger effects: even a small selfing rate greatly increases the parameter range under which recombination is favored, when there is negative dominance x dominance epistasis. This effect is due to the fact that selfing generates a correlation in homozygosity at linked loci, which is reduced by recombination.

12.
We describe a new approximate likelihood for population genetic data under a model in which a single ancestral population has split into two daughter populations. The approximate likelihood is based on the ‘Product of Approximate Conditionals’ likelihood and ‘copying model’ of Li and Stephens [Li, N., Stephens, M., 2003. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165 (4), 2213–2233]. The approach developed here may be used for efficient approximate likelihood-based analyses of unlinked data. However, our copying model also considers the effects of recombination. Hence, a more important application is to loosely-linked haplotype data, for which efficient statistical models explicitly featuring non-equilibrium population structure have so far been unavailable. Thus, in addition to the information in allele frequency differences about the timing of the population split, the method can also extract information from the lengths of haplotypes shared between the populations. Extracting such information poses a number of challenges, which make parameter estimation difficult. We discuss how the approach could be extended to identify haplotypes introduced by migrants.

13.
Summary: Gene co‐expressions have been widely used in the analysis of microarray gene expression data. However, the co‐expression patterns between two genes can be mediated by cellular states, as reflected by expression of other genes, single nucleotide polymorphisms, and activity of protein kinases. In this article, we introduce a bivariate conditional normal model for identifying the variables that can mediate the co‐expression patterns between two genes. Based on this model, we introduce a likelihood ratio (LR) test and a penalized likelihood procedure for identifying the mediators that affect gene co‐expression patterns. We propose an efficient computational algorithm based on iterative reweighted least squares and cyclic coordinate descent and show that when the tuning parameter in the penalized likelihood is appropriately selected, such a procedure has the oracle property in selecting the variables. We present simulation results to compare with existing methods and show that the LR‐based approach can perform similarly to or better than the existing method of liquid association and that the penalized likelihood procedure can be quite effective in selecting the mediators. We apply the proposed method to yeast gene expression data in order to identify the kinases or single nucleotide polymorphisms that mediate the co‐expression patterns between genes.

14.
In follow‐up studies, the disease event time can be subject to left truncation and right censoring. Furthermore, medical advancements have made it possible for patients to be cured of certain types of diseases. In this article, we consider a semiparametric mixture cure model for the regression analysis of left‐truncated and right‐censored data. The model combines a logistic regression for the probability of event occurrence with the class of transformation models for the time of occurrence. We investigate two techniques for estimating model parameters. The first approach is based on martingale estimating equations (EEs). The second approach is based on the conditional likelihood function given truncation variables. The asymptotic properties of both proposed estimators are established. Simulation studies indicate that the conditional maximum‐likelihood estimator (cMLE) performs well, while the estimator based on EEs is very unstable even though it is shown to be consistent. This is a special and intriguing phenomenon for the EE approach under the cure model. We provide insights into this issue and find that the EE approach can be improved significantly by assigning appropriate weights to the censored observations in the EEs. This finding is useful in overcoming the instability of the EE approach in some more complicated situations, where the likelihood approach is not feasible. We illustrate the proposed estimation procedures by analyzing the age at onset of the occiput‐wall distance event for patients with ankylosing spondylitis.

15.
The effect of misclassification of phenotypes of a trait on the estimation of recombination value was investigated. The effect was larger for closer linkage. If a locus is dominant and linked with the misclassified trait locus in the repulsion phase, then the effect on the recombination value between the two loci is largest. A method for estimating the unbiased recombination value and the misclassification rate using maximum likelihood associated with an EM algorithm is also presented. This method was applied to a numerical example from rice genome data. It was concluded that the present method combined with the metric multi-dimensional scaling method is useful for the detection of misclassified markers and for the estimation of unbiased recombination values.

16.
We examined the effects of recombination on the molecular evolution of noncoding regions in pseudoautosomal regions (PARs) and recombination hotspots in hominoids. The PAR-linked regions analyzed had on average longer branch lengths than those of the recombination hotspots. Moreover, contrary to previous observations, we found no correlation between recombination rate and silent site divergence in our data set and little change in the GC content during recent hominoid evolution. This suggests that the current rate of recombination is not a good indicator of the past rates of recombination for these highly recombining regions. Furthermore, human recombination hotspots show increased AT to GC substitutions in the human lineage, while no such pattern is detected for PAR-linked regions. Together, these observations suggest that recombination hotspots in hominoids are transient on an evolutionary time scale. Interestingly, the 16p13.3 recombination hotspot locus violates a local molecular clock, though the locus appears to be noncoding and should evolve neutrally. We hypothesize that sudden changes in recombination rate have caused the changes in substitution rate at this locus.

17.
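A method for constructing linkage maps for organisms with achiasmatic gamete formation
Based on an achiasmatic genetic model, we propose a maximum likelihood method for estimating the recombination rate and have developed computer software for constructing linkage maps of achiasmatic organisms (F2 populations). Linked molecular markers are detected with a chi-squared test. For achiasmatic organisms, Monte Carlo simulation was used to compare the recombination rate estimates and the mapping efficiency of the chiasmatic and achiasmatic genetic models. The simulations show that the achiasmatic model gives unbiased estimates, whereas the chiasmatic model gives estimates that are only half the true value. Under otherwise identical conditions, mapping efficiency under the achiasmatic model was always higher than under the uncorrected chiasmatic model. For organisms with achiasmatic gamete formation, estimating the recombination rate under the achiasmatic model gives satisfactory mapping efficiency.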

18.
Kauermann G  Eilers P 《Biometrics》2004,60(2):376-387
An important goal of microarray studies is the detection of genes that show significant changes in expression when two classes of biological samples are being compared. We present an ANOVA-style mixed model with parameters for array normalization, overall level of gene expression, and change of expression between the classes. For the latter we assume a mixing distribution with a probability mass concentrated at zero, representing genes with no changes, and a normal distribution representing the level of change for the other genes. We estimate the parameters by optimizing the marginal likelihood. To make this practical, Laplace approximations and a backfitting algorithm are used. The performance of the model is studied by simulation and by application to publicly available data sets.

19.
Zhang D  Lin X  Sowers M 《Biometrics》2000,56(1):31-39
We consider semiparametric regression for periodic longitudinal data. Parametric fixed effects are used to model the covariate effects and a periodic nonparametric smooth function is used to model the time effect. The within-subject correlation is modeled using subject-specific random effects and a random stochastic process with a periodic variance function. We use maximum penalized likelihood to estimate the regression coefficients and the periodic nonparametric time function, whose estimator is shown to be a periodic cubic smoothing spline. We use restricted maximum likelihood to simultaneously estimate the smoothing parameter and the variance components. We show that all model parameters can be easily obtained by fitting a linear mixed model. A common problem in the analysis of longitudinal data is to compare the time profiles of two groups, e.g., between treatment and placebo. We develop a scaled chi-squared test for the equality of two nonparametric time functions. The proposed model and the test are illustrated by analyzing hormone data collected during two consecutive menstrual cycles and their performance is evaluated through simulations.

20.
Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then usually be fitted using the available sample data, and no allowance is necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.
