Similar articles
 20 similar articles found (search time: 0 ms)
1.
The potential for imputed genotypes to enhance an analysis of genetic data depends largely on the accuracy of imputation, which in turn depends on properties of the reference panel of template haplotypes used to perform the imputation. To provide a basis for exploring how properties of the reference panel affect imputation accuracy theoretically rather than with computationally intensive imputation experiments, we introduce a coalescent model that considers imputation accuracy in terms of population-genetic parameters. Our model allows us to investigate sampling designs in the frequently occurring scenario in which imputation targets and templates are sampled from different populations. In particular, we derive expressions for expected imputation accuracy as a function of reference panel size and divergence time between the reference and target populations. We find that a modestly sized "internal" reference panel from the same population as a target haplotype yields, on average, greater imputation accuracy than a larger "external" panel from a different population, even if the divergence time between the two populations is small. The improvement in accuracy for the internal panel increases with increasing divergence time between the target and reference populations. Thus, in humans, our model predicts that imputation accuracy can be improved by generating small population-specific custom reference panels to augment existing collections such as those of the HapMap or 1000 Genomes Projects. Our approach can be extended to understand additional factors that affect imputation accuracy in complex population-genetic settings, and the results can ultimately facilitate improvements in imputation study designs.

2.
Wiuf C, Posada D. Genetics 2003, 164(1):407-417
Recent experimental findings suggest that the assumption of a homogeneous recombination rate along the human genome is too naive. These findings point to block-structured recombination rates; certain regions (called hotspots) are more prone than other regions to recombination. In this report a coalescent model incorporating hotspot or block-structured recombination is developed and investigated analytically as well as by simulation. Our main results can be summarized as follows: (1) The expected number of recombination events is much lower in a model with pure hotspot recombination than in a model with pure homogeneous recombination, (2) hotspots give rise to large variation in recombination rates along the genome as well as in the number of historical recombination events, and (3) the size of a (nonrecombining) block in the hotspot model is likely to be overestimated grossly when estimated from SNP data. The results are discussed with reference to the current debate about block-structured recombination and, in addition, the results are compared to genome-wide variation in recombination rates. A number of new analytical results about the model are derived.

3.
The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation relies heavily on the assumption that population size is large and sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up for the longer waiting time between successive coalescent events by having multiple coalescences at the same time. The most significant difference among the various summary statistics of a coalescent examined is the sum of lengths of external branches, which can be more than 10% larger for the exact coalescent than for the Kingman coalescent. As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
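
For orientation, the summary statistics compared in this abstract (waiting times between coalescent events, age of the MRCA, total tree length) can be illustrated for the Kingman approximation alone. The sketch below is not the exact WF algorithm from the paper; it assumes the standard Kingman coalescent with time in coalescent units, where k lineages coalesce at rate k(k-1)/2. The function names are hypothetical.

```python
import random


def kingman_expectations(n):
    """Closed-form expectations for a sample of n lineages (coalescent time units).

    Assumes the standard Kingman coalescent, not the exact WF coalescent.
    """
    tmrca = 2.0 * (1.0 - 1.0 / n)                          # E[T_MRCA]
    total_length = 2.0 * sum(1.0 / i for i in range(1, n))  # E[total tree length]
    return tmrca, total_length


def simulate_kingman(n, rng):
    """Draw one (T_MRCA, total tree length) pair under the Kingman coalescent."""
    tmrca = 0.0
    length = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0   # pairwise coalescence rate with k lineages
        t = rng.expovariate(rate)  # waiting time until the next coalescent event
        tmrca += t
        length += k * t            # k branches each grow by t during this interval
    return tmrca, length


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_kingman(10, rng) for _ in range(20000)]
    mean_tmrca = sum(d[0] for d in draws) / len(draws)
    mean_length = sum(d[1] for d in draws) / len(draws)
    print("simulated:", mean_tmrca, mean_length)
    print("expected: ", kingman_expectations(10))
```

Comparing these values against a forward Wright-Fisher simulation with a sample size close to the population size would be one way to probe the differences the abstract describes; that comparison is not attempted here.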

4.
The genealogical structure of neutral populations in which reproductive success is highly skewed has been the subject of many recent studies. Here we derive a coalescent dual process for a related class of continuous-time Moran models with viability selection. In these models, individuals can give birth to multiple offspring whose survival depends on both the parental genotype and the brood size. This extends the dual process construction for a multi-type Moran model with genic selection described in Etheridge and Griffiths (2009). We show that, in the limit of infinite population size, the non-neutral Moran models converge to a Markov jump process which we call the Λ-Fleming-Viot process with viability selection, and we derive a coalescent dual for this process directly from the generator and as a limit from the Moran models. The dual is a branching-coalescing process similar to the Ancestral Selection Graph which follows the typed ancestry of genes backwards in time with real and virtual lineages. As an application, the transition functions of the non-neutral Moran and Λ-coalescent models are expressed as mixtures of the transition functions of the dual process.

5.
The allele frequency spectrum is a series of statistics that describe genetic polymorphism, and is commonly used for inferring population genetic parameters and detecting natural selection. Population genetic theory on the allele frequency spectrum for a single population has been well studied using both coalescent theory and diffusion equations. Recently, the theory was extended to the joint allele frequency spectrum (JAFS) for three populations using diffusion equations and was shown to be very useful in inferring human demographic history. In this paper, I show that the JAFS can be analytically derived with coalescent theory for a basic model of two isolated populations and then extended to multiple populations and various complex scenarios, such as those involving population growth and bottlenecks, migration, and positive selection. A simulation study is used to demonstrate the accuracy and applicability of the theoretical model. The coalescent theory-based approach for the JAFS can characterize the demographic history with comprehensive statistical models as the diffusion approach does, and in addition gains several novel advantages: the computational complexity of calculating the JAFS with coalescent theory is reduced, and thus it is feasible to analytically obtain the JAFS for multiple populations; the hitchhiking effect can be efficiently modeled in coalescent theory, enabling the development of methodologies for detecting selection via multi-population polymorphism data. As an alternative to the diffusion approximation approach, the coalescent theory for the JAFS also provides a foundation for population genetic inference with the advent of large-scale genomic polymorphism data.
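
As a reference point for the single-population case mentioned in this abstract, the sketch below computes the expected unfolded allele frequency spectrum under the standard neutral model. It assumes a constant-size panmictic population and infinite-sites mutation, where the expected number of sites with i derived copies in a sample of n is θ/i. It does not reproduce the paper's JAFS derivation, and the function names are hypothetical.

```python
def expected_sfs(n, theta):
    """Expected unfolded spectrum E[xi_i] = theta / i for i = 1..n-1.

    Assumes the standard neutral model (constant size, infinite sites).
    """
    return [theta / i for i in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator of theta from a spectrum of length n-1."""
    n = len(sfs) + 1
    segregating_sites = sum(sfs)
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number H_{n-1}
    return segregating_sites / a_n


if __name__ == "__main__":
    spectrum = expected_sfs(5, 2.0)
    print(spectrum)
    print(watterson_theta(spectrum))
```

Applying Watterson's estimator to the expected spectrum returns the input θ, which is a quick consistency check on both functions.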

6.
A coalescent dual process for a multi-type Moran model with genic selection is derived using a generator approach. This leads to an expansion of the transition functions in the Moran model and the Wright–Fisher diffusion process limit in terms of the transition functions for the coalescent dual. A graphical representation of the Moran model (in the spirit of Harris) identifies the dual as a strong dual process following typed lines backwards in time. An application is made to the harmonic measure problem of finding the joint probability distribution of the time to the first loss of an allele from the population and the distribution of the surviving alleles at the time of loss. Our dual process mirrors the Ancestral Selection Graph of [Krone, S. M., Neuhauser, C., 1997. Ancestral processes with selection. Theoret. Popul. Biol. 51, 210–237; Neuhauser, C., Krone, S. M., 1997. The genealogy of samples in models with selection. Genetics 145, 519–534], which allows one to reconstruct the genealogy of a random sample from a population subject to genic selection. In our setting, we follow [Stephens, M., Donnelly, P., 2002. Ancestral inference in population genetics models with selection. Aust. N. Z. J. Stat. 45, 395–430] in assuming that the types of individuals in the sample are known. There are also close links to [Fearnhead, P., 2002. The common ancestor at a nonneutral locus. J. Appl. Probab. 39, 38–54]. However, our methods and applications are quite different. This work can also be thought of as extending a dual process construction in a Wright–Fisher diffusion in [Barbour, A.D., Ethier, S.N., Griffiths, R.C., 2000. A transition function expansion for a diffusion model with selection. Ann. Appl. Probab. 10, 123–162]. The application to the harmonic measure problem extends a construction provided in the setting of a neutral diffusion process model in [Ethier, S.N., Griffiths, R.C., 1991. Harmonic measure for random genetic drift. In: Pinsky, M.A. (Ed.), Diffusion Processes and Related Problems in Analysis, vol. 1. In: Progress in Probability Series, vol. 22, Birkhäuser, Boston, pp. 73–81].

7.
Genetic linkage studies based on pedigree data have limited resolution, because of the relatively small number of segregations. Disequilibrium mapping, which uses population associations to infer the location of a disease mutation, provides one possible strategy for narrowing the candidate region. The coalescent process provides a model for the ancestry of a sample of disease alleles, and recombination events between the disease locus and a marker may be placed on this ancestral phylogeny. These events define the recombinant classes, the sets of sampled disease copies descending from the meiosis at which a given recombination occurred. We show how Monte Carlo generation of the recombinant classes leads to a linkage likelihood for fine-scale mapping from disease haplotypes. We compare single-marker disequilibrium mapping with interval-disequilibrium mapping and discuss how the approach may be extended to multipoint-disequilibrium mapping. The method and its properties are illustrated with an example of simulated data, constructed to be typical of fine-scale mapping of a rare disease in the Japanese population. The method can take into account known features of population history, such as changing patterns of population growth.

8.

Background  

Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE) of the species tree (topology, branch lengths, and population sizes) from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE) of species trees, with branch lengths of the species tree in coalescent units.

9.
Innan H. Genetics 2003, 163(2):803-810
The infinite-site model of a small multigene family with two duplicated genes is studied. The expectations of the amounts of nucleotide variation within and between two genes and linkage disequilibrium are obtained, and a coalescent-based method for simulating patterns of polymorphism in a small multigene family is developed. The pattern of DNA variation is much more complicated than that in a single-copy gene, which can be simulated by the standard coalescent. Using the coalescent simulation of duplicated genes, the applicability of statistical tests of neutrality to multigene families is considered.

10.
Chromosomal inversions are common in natural populations and are believed to be involved in many important evolutionary phenomena, including speciation, the evolution of sex chromosomes and local adaptation. While recent advances in sequencing and genotyping methods are leading to rapidly increasing amounts of genome-wide sequence data that reveal interesting patterns of genetic variation within inverted regions, efficient simulation methods to study these patterns are largely missing. In this work, we extend the sequential Markovian coalescent, an approximation to the coalescent with recombination, to include the effects of polymorphic inversions on patterns of recombination. Results show that our algorithm is fast, memory-efficient and accurate, making it feasible to simulate large inversions in large populations for the first time. The SMC algorithm enables studies of patterns of genetic variation (for example, linkage disequilibria) and tests of hypotheses (using simulation-based approaches) that were previously intractable.

11.
Hirschsprung disease (HSCR) is a common genetic disorder characterized by intestinal obstruction secondary to enteric aganglionosis. HSCR demonstrates a complex pattern of inheritance, with the RET proto-oncogene acting as a major gene and with several additional susceptibility loci related to the Ret-signaling pathway or to other developmental programs of neural crest cells. To test how the HSCR phenotype may be affected by the presence of genetic variants, we investigated the role of a single-nucleotide polymorphism (SNP), 2508C→T (S836S), in exon 14 of the RET gene, characterized by low frequency among patients with HSCR and overrepresentation in individuals affected by sporadic medullary thyroid carcinoma. Typing of several different markers across the RET gene demonstrated that a whole conserved haplotype displayed anomalous distribution and nonrandom segregation in families with HSCR. We provide genetic evidence for a protective role of this low-penetrant haplotype in the pathogenesis of HSCR and demonstrate a possible functional effect linked to RET messenger RNA expression.

12.
13.
Prediction of group patterns in social mammals based on a coalescent model   Cited by: 1 (self-citations: 0, citations by others: 1)
This study describes a statistical model which assumes that mammal group patterns match with groups of genetic relatives. Given a fixed sample size, recursive algorithms for the exact computation of the probability distribution of the number of groups are provided. The recursive algorithms are then incorporated into a statistical likelihood framework which can be used to detect and quantify departure from the null model by estimating a clustering parameter. The test is then applied to ecological data from social herbivores and carnivores. Our findings support the hypothesis that genetic relatedness is likely to predict group patterns when large mammals have few or no predators.

14.
15.
Structured coalescent processes are derived for the finite island model under a migration mechanism that conserves the subpopulation sizes. The underlying population model is a modified Moran model in which the reproducing individual can have very many offspring with some probability. Convergence to a structured coalescent process results when assuming that migration follows a coalescent timescale which can be much shorter than the usual Wright–Fisher timescale. Three different limit processes are possible depending on the coalescent timescale, two of which allow multiple mergers of ancestral lines. The expected time to most recent common ancestor, and the expected total size of the genealogy, of balanced and unbalanced samples can be very similar, even when migration is low, if the coalescent process allows multiple mergers. The expected total size increases almost linearly with sample size in some cases. The results have implications for inference about genetic population structure.

16.
17.
Two rare GPT phenotypes in a Norwegian family. Evidence of a seventh allele   Cited by: 1 (self-citations: 0, citations by others: 1)
B Olaisen. Humangenetik 1973, 19(3):289-291

18.
Wilkins JF. Genetics 2004, 168(4):2227-2244
This article presents an analysis of a model of isolation by distance in a continuous, two-dimensional habitat. An approximate expression is derived for the distribution of coalescence times for a pair of sequences sampled from specific locations in a rectangular habitat. Results are qualitatively similar to previous analyses of isolation by distance, but account explicitly for the location of samples relative to the habitat boundaries. A separation-of-timescales approach takes advantage of the fact that the sampling locations affect only the recent coalescent behavior. When the population size is larger than the number of generations required for a lineage to cross the habitat range, the long-term genealogical process is reasonably well described by Kingman's coalescent with time rescaled by the effective population size. This long-term effective population size is affected by the local dispersal behavior as well as the geometry of the habitat. When the population size is smaller than the time required to cross the habitat, deep branches in the genealogy are longer than would be expected under the standard neutral coalescent, similar to the pattern expected for a panmictic population whose population size was larger in the past.

19.
Lohse K, Harrison RJ, Barton NH. Genetics 2011, 189(3):977-987
Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the generating function of branch lengths. Under the infinite-sites model, the probability of any configuration of mutations can be found by differentiating this generating function. Such calculations are feasible for small numbers of sampled genomes: as an example, we show how the generating function can be derived explicitly for three genes under the two-deme IM model. This derivation is done automatically, using Mathematica. Given data from a large number of unlinked and nonrecombining blocks of sequence, these results can be used to find maximum-likelihood estimates of model parameters by tabulating the probabilities of all relevant mutational configurations and then multiplying across loci. The feasibility of the method is demonstrated by applying it to simulated data and to a data set previously analyzed by Wang and Hey (2010) consisting of 26,141 loci sampled from Drosophila simulans and D. melanogaster. Our results suggest that such likelihood calculations are scalable to genomic data as long as the numbers of sampled individuals and mutations per sequence block are small.
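
The "tabulate configuration probabilities, then multiply across loci" step can be illustrated with the simplest possible case, which is much simpler than the three-gene IM model treated in the paper. The sketch assumes two sequences from a single panmictic population, infinite-sites mutation, and a per-block scaled mutation rate θ. The pairwise coalescence time T then has Laplace transform 1/(1+ω), and differentiating k times at ω = θ gives P(k mutations) = θ^k / (1+θ)^(k+1). All function names are hypothetical.

```python
import math
from collections import Counter


def prob_k_mutations(k, theta):
    """P(k pairwise differences) for two sequences in one panmictic population.

    Obtained from the generating function E[exp(-w*T)] = 1 / (1 + w),
    evaluated via its k-th derivative at w = theta (illustrative assumption).
    """
    return theta ** k / (1.0 + theta) ** (k + 1)


def log_likelihood(counts, theta):
    """Log-likelihood over unlinked, nonrecombining blocks.

    Tabulates how often each mutational configuration (here just k) occurs,
    then sums log-probabilities, which is multiplying probabilities across loci.
    """
    table = Counter(counts)
    return sum(m * math.log(prob_k_mutations(k, theta)) for k, m in table.items())


def mle_theta(counts):
    """Closed-form maximizer of log_likelihood: the mean number of differences."""
    return sum(counts) / len(counts)


if __name__ == "__main__":
    data = [0, 1, 2, 1, 0, 3, 1]  # hypothetical per-block difference counts
    theta_hat = mle_theta(data)
    print(theta_hat, log_likelihood(data, theta_hat))
```

In richer models the configuration table has more entries and no closed-form maximizer, so the likelihood would be maximized numerically; the tabulate-then-multiply structure stays the same.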

20.
K Zeng. Heredity 2013, 110(4):363-371
There is increasing evidence that background selection, the effects of the elimination of recurring deleterious mutations by natural selection on variability at linked sites, may be a major factor shaping genome-wide patterns of genetic diversity. To accurately quantify the importance of background selection, it is vital to have computationally efficient models that include essential biological features. To this end, a structured coalescent procedure is used to construct a model of background selection that takes into account the effects of recombination, recent changes in population size and variation in selection coefficients against deleterious mutations across sites. Furthermore, this model allows a flexible organization of selected and neutral sites in the region concerned, and has the ability to generate sequence variability at both selected and neutral sites, allowing the correlation between these two types of sites to be studied. The accuracy of the model is verified by checking against the results of forward simulations. These simulations also reveal several patterns of diversity that are in qualitative agreement with observations reported in recent studies of DNA sequence polymorphisms. These results suggest that the model should be useful for data analysis.
