Similar literature
20 similar documents found
1.
Hua Chen, Kun Chen. Genetics, 2013, 194(3): 721-736
The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. The exact distributions of both coalescence times and ancestral lineage numbers are expressed as sums of alternating series, and the terms in these series become numerically intractable for large samples. Their asymptotic distributions, derived by Griffiths (1984) for populations of constant size, are computationally more attractive. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size $n$, denote by $T_m$ the $m$th coalescent time, when $m + 1$ lineages coalesce into $m$ lineages, and by $A_n(t)$ the number of ancestral lineages at time $t$ back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, $A_n(t)$, and the coalescence times, $T_m$, are asymptotically normal, with means and variances that depend on the population size function, $N(t)$. At the very early stage of the coalescent, when $t \to 0$, the number of coalesced lineages $n - A_n(t)$ follows a Poisson distribution, and as $m \to n$, $n(n-1)T_m/2N(0)$ follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparison with both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genome-scale sequencing data for large samples, the asymptotic distributions are expected to find wide application in theoretical and methodological development for population genetic inference.
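For the constant-size case treated by Griffiths (1984), the standard Kingman coalescent gives simple reference values against which approximations such as those above can be checked. The sketch below is not the authors' variable-size method: it assumes a constant population size and measures time in coalescent units in which each pair of lineages coalesces at rate 1, so that while k lineages remain the waiting time is exponential with rate k(k - 1)/2. It computes the exact expected TMRCA and TBL and checks the TMRCA by simulation; all function names are illustrative.

```python
import random


def expected_tmrca(n: int) -> float:
    """Exact E[TMRCA] for a sample of n under the constant-size Kingman coalescent.

    While k lineages remain, the waiting time is Exponential(k(k-1)/2) with
    mean 2/(k(k-1)); summing over k = n..2 telescopes to 2(1 - 1/n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def expected_total_branch_length(n: int) -> float:
    """Exact E[TBL]: the epoch with k lineages contributes k * 2/(k(k-1)) = 2/(k-1)."""
    return sum(2.0 / (k - 1) for k in range(2, n + 1))


def simulate_mean_tmrca(n: int, reps: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[TMRCA] obtained by summing exponential waiting times."""
    total = 0.0
    for _ in range(reps):
        t = 0.0
        for k in range(n, 1, -1):
            # expovariate takes the rate (lambda), not the mean
            t += rng.expovariate(k * (k - 1) / 2.0)
        total += t
    return total / reps


if __name__ == "__main__":
    n = 10
    print(expected_tmrca(n))                # analytic value 2 * (1 - 1/n)
    print(expected_total_branch_length(n))  # analytic value 2 * (1 + 1/2 + ... + 1/(n-1))
    print(simulate_mean_tmrca(n, 20000, random.Random(1)))
```

Under the usual scaling, one coalescent unit corresponds to N generations for a haploid population (2N for a diploid one); when N(t) varies over time, the coalescence rates change with t, which is the case the abstract addresses.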

2.
We have compared two statistical methods of estimating the time to the most recent common ancestor (TMRCA) from a sample of DNA sequences, proposed by Templeton (1993) and Bandelt et al. (1995). Monte Carlo simulations were used to generate DNA sequence data. Different evolutionary scenarios were simulated and the estimation procedures were evaluated. We found that for both methods (i) the estimates are insensitive to demographic parameters and (ii) the standard deviations of the estimates are too high for these methods to be used reliably in practice.

3.
A 2.4-kb stretch within the RRM2P4 region of the X chromosome, previously sequenced in a sample of 41 globally distributed humans, displayed both an ancient time to the most recent common ancestor (e.g., a TMRCA of approximately 2 million years) and a basal clade composed entirely of Asian sequences. This pattern was interpreted to reflect a history of introgressive hybridization from archaic hominins (most likely Asian Homo erectus) into the anatomically modern human genome. Here, we address this hypothesis by resequencing the 2.4-kb RRM2P4 region in 131 African and 122 non-African individuals and by extending the length of sequence in a window of 16.5 kb encompassing the RRM2P4 pseudogene in a subset of 90 individuals. We find that both the ancient TMRCA and the skew in non-African representation in one of the basal clades are essentially limited to the central 2.4-kb region. We define a new summary statistic called the minimum clade proportion (pmc), which quantifies the proportion of individuals from a specified geographic region in each of the two basal clades of a binary gene tree, and then employ coalescent simulations to assess the likelihood of the observed central RRM2P4 genealogy under two alternative views of human evolutionary history: recent African replacement (RAR) and archaic admixture (AA). A molecular-clock-based TMRCA estimate of 2.33 million years is a statistical outlier under the RAR model; however, the large variance associated with this estimate makes it difficult to distinguish the predictions of the human origins models tested here. The pmc summary statistic, which has improved power with larger samples of chromosomes, yields values that are significantly unlikely under the RAR model and fit expectations better under a range of archaic admixture scenarios.

4.
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations.

5.
An ancestral influence graph is derived: an analogue of the coalescent and a composite of Griffiths' (1991) two-locus ancestral graph and Krone and Neuhauser's (1997) ancestral selection graph. This generalizes their use of branching-coalescing random graphs to incorporate both selection and recombination into gene genealogies. A qualitative understanding of the ‘hitch-hiking’ effect on genealogies is pursued via a diagrammatic representation of the genealogical process in a two-locus, two-allele haploid model. Extending the simulation technique of Griffiths and Tavaré (1996), computational estimates of the expected times to the most recent common ancestor of samples of n genes under recombination and selection are presented for two-locus, two-allele haploid and diploid models. These times are conditional on the sample configuration. Monte Carlo simulations show that ‘hitch-hiking’ is a subtle effect that alters the conditional expected depth of the genealogy at the linked neutral locus, depending on a mutation-selection-recombination balance. Received: 21 July 2000 / Published online: 5 December 2000

6.
Climatic changes during the Quaternary in Arctic regions have shaped the genetic variation and genealogies of Arctic species. Several studies of the genetic diversity of Arctic organisms have been conducted in recent years, but marine fishes are largely underrepresented in them. Here, we present a study of mitochondrial variation in three Arctic gadoids: Arctic cod (Arctogadus glacialis), Greenland cod (Gadus ogac), and Polar cod (Boreogadus saida). In addition, geographic variation in Polar cod is presented. Sequence variation in the mtDNA shows patterns similar to those observed in other related marine fishes. Variation in these three species reflects different historical processes, related to colonization and climatic change, rather than differences in life history. In Polar cod, a deeper genealogy is observed, and variation depends on both latitude and longitude. The deep genealogy indicates either admixture of separate lineages or a population that has remained stable in size through the alternating cold and warm periods of the Pleistocene. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

7.
Brendan O’Fallon. Genetics, 2013, 194(2): 485-492
The extent to which selective forces shape patterns of genetic and genealogical variation is unknown in many species. Recent theoretical models have suggested that even relatively weak purifying selection may produce significant distortions in gene genealogies, but few studies have sought to quantify this effect in humans. Here, we employ a reconstruction method based on the ancestral recombination graph to infer genealogies across the length of the human X chromosome and to examine time to most recent common ancestor (TMRCA) and measures of tree imbalance at both broad and very fine scales. In agreement with theory, TMRCA is significantly reduced and genealogies are significantly more imbalanced in coding regions and introns when compared to intergenic regions, and these effects are increased in areas of greater evolutionary constraint. These distortions are present at multiple scales, and chromosomal regions as broad as 5 Mb show a significant negative correlation between TMRCA and exon density. We also show that areas of recent TMRCA are significantly associated with the disease-causing potential of sites as measured by the MutationTaster prediction algorithm. Together, these findings suggest that purifying selection has significantly distorted human genealogical structure on both broad and fine scales and that few chromosomal regions escape selection-induced distortions.

8.
The central questions asked in whole-genome association studies are how to locate associated regions in the genome and how to estimate the significance of these findings. Researchers usually do this by testing each SNP separately for association and then applying a suitable correction for multiple-hypothesis testing. However, SNPs are correlated by the unobserved genealogy of the population, and a more powerful statistical methodology would attempt to take this genealogy into account. Leveraging the genealogy in association studies is challenging, however, because the inference of the genealogy from the genotypes is a computationally intensive task, in particular when recombination is modeled, as in ancestral recombination graphs. Furthermore, if large numbers of genealogies are imputed from the genotypes, the power of the study might decrease if these imputed genealogies create an additional multiple-hypothesis testing burden. Indeed, we show in this paper that several existing methods that aim to address this problem suffer either from low power or from a very high false-positive rate; their performance is generally not better than the standard approach of separate testing of SNPs. We suggest a new genealogy-based approach, CAMP (coalescent-based association mapping), that takes into account the trade-off between the complexity of the genealogy and the power lost due to the additional multiple hypotheses. Our experiments show that CAMP yields a significant increase in power relative to that of previous methods and that it can more accurately locate the associated region.

9.
An issue often encountered in statistical genetics is whether, or to what extent, it is possible to estimate the degree to which individuals sampled from a background population are related to each other, on the basis of the available genotype data and some information on the demography of the population. In this article, we consider this question using explicit modelling of the pedigrees and gene flows at unlinked marker loci, but then restricting ourselves to a relatively recent history of the population, that is, considering the genealogy at most some tens of generations backwards in time. As a computational tool we use a Markov chain Monte Carlo numerical integration on the state space of genealogies of the sampled individuals. As illustrations of the method, we consider the question of relatedness at the level of genes/genomes (IBD estimation), using both simulated and real data.

10.
The dogma of strict maternal inheritance of mitochondria is now being tested with population genetics methods on sequence data from many species. In this study we investigated whether recombination occurs in the mitochondria of the blue tit (Parus caeruleus) by studying polymorphisms in the mitochondrial control region and in a recently identified (A)n microsatellite on the W chromosome. The female heterogamety of avian sex chromosomes allows a test of whether mitochondrial recombination affects genealogical inference by comparison of mitochondrial and W-linked sequence variation. There is no discrepancy between mitochondrial and W-linked genealogies in blue tits, consistent with no recombination. We also analyzed mitochondrial sequence variation in both blue tits and peregrine falcons (Falco peregrinus) using a coalescent-based approach which accounts for recurrent mutation; in neither bird species did we find evidence of recombination. We conclude that it is unlikely that mitochondrial recombination has large effects on mitochondrial genetic variability in birds.

11.
It is shown how the mean ancestral times at one locus are affected in a two-locus model with recombination when information is given about the number of segregating sites at another locus. For samples of n genes, recursive equations are derived that describe precisely the evolution of the time-depth of such a linked genealogy. Exact numerical solutions and Markov chain Monte Carlo simulations are discussed and compared. The dependence of some properties of a singleton mutation on the waiting times between events in the two-locus genealogy is quantified and illustrates the effect of recombination on these properties. The following cases are presented: (1) the distribution of the number of mutant genes in a sample arising from a singleton mutation; (2) the probability that an allele observed in a genes of a sample of size n is the ancestral (oldest) type; (3) the expectation and variance of the age of a mutant having b copies in a sample of n genes. Received: 1 September 2000 / Revised version: 1 October 2001 / Published online: 8 May 2002

12.
The immunologically important major histocompatibility complex (MHC) harbors some of the most polymorphic genes in vertebrates. These genes presumably evolve under parasite‐mediated selection and frequently show inconsistent allelic genealogies, where some alleles are more similar between species than within species. This phenomenon is thought to arise either from convergent evolution under parallel selection or from the preservation of ancient allelic lineages beyond speciation events (trans‐species polymorphism, TSP). Here, we examine natural populations of two sympatric stickleback species (Gasterosteus aculeatus and Pungitius pungitius) to investigate the contribution of these two mechanisms to the evolution of inconsistent allelic genealogies at the MHC. Overlapping parasite taxa between the two host species in three different habitats suggest contemporary parallel selection on the MHC genes. Accordingly, we detected a lack of species‐specific phylogenetic clustering in the immunologically relevant antigen‐binding residues of the MHC IIB genes which contrasted with the rest of the coding and noncoding sequence. However, clustering was not habitat‐specific and a codon‐usage analysis revealed patterns of similarity by descent. In this light, common descent via TSP, in combination with intraspecies gene conversion, rather than convergent evolution is the more strongly supported scenario for the inconsistent genealogy at the MHC.

13.
We introduce the mid-depth method, a practical approach for testing hypotheses of demographic history using genealogies reconstructed from sequence data. The relative positions of internal nodes within a genealogy contain information about past population dynamics. We explain how this information can be used to (1) test the null hypothesis of constant population size and (2) estimate the growth rate and current population size of an exponentially growing population. Simulation tests indicate that, as expected, estimates of exponential growth rates are sometimes biased. The mid-depth method is computationally rapid and does not require knowledge of the sample's mutation rate. However, it does assume that the reconstructed genealogy is correct and is therefore best suited to the analysis of variation-rich viral data sets. When applied to HIV-1 sequence data, the mid-depth method provides phylogenetic evidence of different exponential growth rates for subtypes A and B. We posit that this difference in growth rate reflects the different transmission routes and epidemiological histories of the two subtypes.

14.
A number of polytripeptides related to collagen, namely (Gly-Pro-Pro)n, (Gly-Pro-Hyp)n, (Gly-Hyp-Hyp)n, (Gly-Pro-Ala)n, (Gly-Pro-Leu)n, (Gly-Pro-Gly)n, (Gly-Ala-Pro)n, (Gly-Ala-Hyp)n, (Ala-Pro-Pro)n, and (Ala-Hyp-Hyp)n, were investigated by IR spectroscopy and hydrogen-deuterium exchange kinetics. The strength and order of interpeptide hydrogen bonds of the polytripeptides in a triple-helical conformation were found to depend on the amino acid composition and residue sequence in the triplets. Correlation of X-ray diffraction and spectroscopic data for (Gly-Pro-Hyp)n showed that the increase of the helix parameter during dehydration is accompanied by a weakening of interpeptide hydrogen bonds. The influence of bound water on the length and order of interchain hydrogen bonding was also examined. The incorporation of water molecules into the triple helix was shown to depend on the amino acid composition and residue sequence. Synthetic models and native collagens were compared.

15.
Terminal ends of vertebrate chromosomes are protected by tandem repeats of the sequence (TTAGGG). First thought to be vertebrate-specific, (TTAGGG)n has recently been identified in several aquatic invertebrates, including the sea urchin (Strongylocentrotus purpuratus), bay scallop (Argopecten irradians), and wedgeshell clam (Donax trunculus). We analyzed genomic DNA from the scleractinian corals Acropora surculosa, Favia pallida, Leptoria phrygia, and Goniastrea retiformis to determine the telomere sequence. Southern blot analysis suggests the presence of the vertebrate telomere repeats in all four species. Treatment of A. surculosa sperm DNA with Bal31 exonuclease revealed progressive shortening of the DNA fragments positive for the (TTAGGG)22 sequence, supporting location of the repeats at the chromosome ends. The presence of the vertebrate telomere repeats in corals is evidence that the (TTAGGG)n sequence is highly conserved among a divergent group of vertebrate and invertebrate species. Corals are members of the Lower Metazoans, the group of organisms that span the gap between the fungi and higher metazoans. Corals are the most basal organisms reported to date to have the (TTAGGG)n sequence, which suggests that the vertebrate telomere sequence may be much older than previously thought and that corals may share a number of genes with their higher relatives.

17.
A nearly universal feature of intron sequences is that even closely related species exhibit a large number of insertion/deletion differences. The goal of the analysis described here is to test whether the observed pattern of insertion/deletion events in the genealogy of the myosin alkali light chain (Mlc1) gene is consistent with neutrality, and if not, to determine the underlying forces of evolutionary change. Mlc1 pre-mRNA is alternatively spliced, and one constraint is that signals necessary for tissue-specificity of directed splicing must be conserved. If the total length of an intron is functionally constrained, then the distribution of indels on branches of the gene genealogy should reflect a departure from randomness. Here we perform a phylogenetic analysis, inferring ancestral states wherever possible on a phylogeny of 29 alleles of Mlc1 from six species of Drosophila. Observed patterns of indels on the genealogy were compared to those from simulated data, with the result that we cannot reject the null hypothesis of neutrality. A clear departure from a neutral prediction was seen in the excess folding free energy predicted for the introns flanking the alternatively spliced exon. Relative rate tests also suggest a retardation in the rate of Mlc1 sequence evolution in the simulans clade.

18.
Four major problems can affect the efficiency of methods developed to estimate relatedness between individuals from molecular marker information: (i) some depend on knowledge of the true allele frequencies in the base population; (ii) they assume that all loci are unlinked and in Hardy-Weinberg and linkage equilibrium; (iii) pairwise methods can lead to incongruous assignments because they consider only two individuals at a time; (iv) most are constructed for particular population structures (considering only a few relationship classes, e.g. full-sibs vs. unrelated). We have developed a new approach to estimating relatedness that is free from these limitations. The method uses a 'blind search algorithm' (specifically, simulated annealing) to find the genealogy that yields a co-ancestry matrix with the highest correlation with the molecular co-ancestry matrix calculated from the markers. Thus (i and ii) it makes no direct assumptions about allele frequencies or Hardy-Weinberg and linkage equilibrium; (iii) it always provides congruent relationships, as it considers all individuals at once; (iv) degrees of relatedness can be made as complex as desired simply by increasing the 'depth' (i.e. number of generations) of the proposed genealogies. Computer simulations have shown that the accuracy of this new approach, and its robustness against genotyping errors, are comparable to those of other proposed methods in the particular situations for which they were developed, but the new approach is more flexible and can cope with more complex situations.
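The scoring step described above, comparing the co-ancestry matrix implied by a candidate genealogy with the molecular co-ancestry matrix, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: pedigree co-ancestries are computed with the standard tabular method (parents listed before their offspring, unknown parents given as `None`, founders assumed unrelated and non-inbred), and the fit is assumed to be the Pearson correlation over off-diagonal elements only. The simulated-annealing search that proposes candidate genealogies is omitted, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: pedigree[i] = (sire, dam), indices of earlier individuals or None.
Pedigree = Sequence[Tuple[Optional[int], Optional[int]]]


def coancestry_matrix(pedigree: Pedigree) -> List[List[float]]:
    """Tabular-method co-ancestry matrix (founders assumed unrelated and non-inbred)."""
    n = len(pedigree)
    f = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Inbreeding of i equals the co-ancestry of its parents (0 if either is unknown).
        inbreeding = f[sire][dam] if sire is not None and dam is not None else 0.0
        f[i][i] = 0.5 * (1.0 + inbreeding)
        for j in range(i):
            # Co-ancestry of i with an earlier individual j is the mean of j's
            # co-ancestries with i's two parents; an unknown parent contributes 0.
            value = 0.0
            if sire is not None:
                value += 0.5 * f[j][sire]
            if dam is not None:
                value += 0.5 * f[j][dam]
            f[i][j] = f[j][i] = value
    return f


def off_diagonal_correlation(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Pearson correlation between the upper off-diagonal elements of two square matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy candidate genealogy: founders 0, 1 and 4; individuals 2 and 3 are full sibs.
    candidate = [(None, None), (None, None), (0, 1), (0, 1), (None, None)]
    f = coancestry_matrix(candidate)
    print(f[2][3])  # 0.25, the co-ancestry expected for full sibs
    # A candidate compared with itself scores 1 (up to rounding).
    print(off_diagonal_correlation(f, f))
```

In a full search, the second argument would be the marker-based co-ancestry matrix, and an annealing loop would perturb parent assignments and keep the candidate with the highest score.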

19.
The genome of the fungal chickpea pathogen Ascochyta rabiei was screened for polymorphisms by microsatellite-primed PCR. While ethidium-bromide staining of electrophoretically separated amplification products showed only limited polymorphism among 24 Tunisian A. rabiei isolates, Southern hybridization of purified PCR fragments to restriction digests of fungal DNA revealed polymorphic DNA fingerprints. One particular probe that gave rise to a hypervariable single-locus hybridization signal was cloned from the Syrian isolate AA6 and sequenced. It contained a large compound microsatellite harbouring the penta- and decameric repeat units (CATTT)n, (CATTA)n, (CATATCATTT)n and (TATTT)n. We call this locus ArMS1 (Ascochyta rabiei microsatellite 1). Unique flanking sequences were used to design primer pairs for locus-specific microsatellite amplification and direct sequencing of additional ArMS1 alleles from Tunisian and Pakistani isolates. A high level of sequence variation was observed, suggesting that multiple mutational mechanisms have contributed to polymorphism. Hybridization and PCR analyses were performed on the parents and 62 monoascosporic F1 progeny derived from a cross between two different mating types of the fungus. Progeny alleles could be traced back to the parents, with one notable exception, where a longer than expected fragment was observed. Direct sequencing of this new length allele revealed an alteration in the copy number of the TATTT repeat [(TATTT)53 to (TATTT)65], while the remainder of the sequence was unchanged. Received: 11 March 1997 / Accepted: 21 June 1997

20.
Double muscling is an inherited condition in cattle characterised by large increases in muscle mass. Mutations in the myostatin (MSTN) gene, responsible for double muscling, were targeted in this study to estimate the time since the most recent common ancestor (TMRCA) for Q204X (p.Gln204*), E226X (p.Glu226*), 821del11 (c.821del11), E291X (p.Glu291*), C313Y (p.Cys313Tyr) and the more phenotypically moderate F94L (p.Phe94Leu) mutation. Genetic variability was examined in eight regions upstream and downstream of the MSTN locus. The molecular distance of the homozygous region associated with each MSTN allele was used to estimate the TMRCA. Long homozygous segments were associated with the MSTN alleles (mostly > 2 Mb), compared to short segments (130 kb) for cattle wild type at the double muscling and F94L sites. Estimates of time indicated that each MSTN allele had a recent common ancestor (<400 years ago). The results from this study, and the increasing frequency of these MSTN alleles in some cattle breeds, demonstrate recent positive selection.
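To illustrate how the length of a shared homozygous segment can be turned into an age, the sketch below uses a simple, commonly used approximation; the abstract does not specify the authors' exact procedure, so this should not be read as their method. It assumes that two copies of an allele descended from a common ancestor g generations ago are separated by 2g meioses, that crossovers follow a Poisson process (no interference), and that the segment around the focal locus therefore extends an Exponential(2g) distance (in Morgans) on each side. The total length is then Gamma(2, 2g) with mean 1/g Morgans, and treating k segments as independent (a simplification) gives the maximum-likelihood estimate g = k / (sum of lengths). The segment lengths, the 1 cM/Mb conversion, and the 5-year generation interval are hypothetical values, not data from the study.

```python
from typing import Sequence


def generations_since_common_ancestor(segment_lengths_morgans: Sequence[float]) -> float:
    """MLE of g, assuming each focal segment length ~ Gamma(shape=2, rate=2g) in Morgans."""
    if not segment_lengths_morgans or any(x <= 0 for x in segment_lengths_morgans):
        raise ValueError("need at least one positive segment length")
    # Gamma(2, rate) MLE: rate = 2k / sum(lengths); g = rate / 2 = k / sum(lengths).
    return len(segment_lengths_morgans) / sum(segment_lengths_morgans)


def mb_to_morgans(length_mb: float, cm_per_mb: float = 1.0) -> float:
    """Convert physical to genetic length under an assumed uniform recombination rate."""
    return length_mb * cm_per_mb / 100.0


if __name__ == "__main__":
    # Hypothetical homozygous segment lengths (Mb) around a focal allele.
    lengths_mb = [2.0, 2.5, 1.8]
    g = generations_since_common_ancestor([mb_to_morgans(x) for x in lengths_mb])
    years_per_generation = 5.0  # assumed generation interval for cattle
    print(f"{g:.1f} generations, about {g * years_per_generation:.0f} years")
```

Under the same assumptions, segments of about 130 kb would correspond to a common ancestor hundreds of generations back, which matches the direction of the contrast between mutant and wild-type haplotypes described in the abstract.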
