Similar Literature
 20 similar documents found (search time: 31 ms)
1.
Most modern population genetics inference methods are based on the coalescence framework. Methods that allow estimating parameters of structured populations commonly insert migration events into the genealogies. For these methods the calculation of the coalescence probability density of a genealogy requires a product over all time periods between events. Data sets that contain populations with high rates of gene flow among them require an enormous number of calculations. A new method, transition probability-structured coalescence (TPSC), replaces the discrete migration events with probability statements. Because the speed of calculation is independent of the amount of gene flow, this method allows calculating the coalescence densities efficiently. The current implementation of TPSC uses an approximation simplifying the interaction among lineages. Simulations and coverage comparisons of TPSC vs. MIGRATE show that TPSC allows estimation of high migration rates more precisely, but because of the approximation the estimation of low migration rates is biased. The implementation of TPSC into programs that calculate quantities on phylogenetic tree structures is straightforward, so the TPSC approach will facilitate more general inferences in many computer programs.
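To make the "product over all time periods between events" concrete, here is a minimal sketch of the simplest case: the log probability density of one fixed genealogy under the single-population Kingman coalescent, with time in mutational units so that θ = 4Nμ. The function name and the input format (a list of (k, t_k) interval pairs) are illustrative assumptions, not the interface of MIGRATE or TPSC. Structured models split intervals at every migration event and add a factor per event, which is why high gene flow multiplies the number of terms.

```python
import math


def log_genealogy_density(intervals, theta):
    """Log density of one genealogy under the single-population Kingman coalescent.

    intervals: list of (k, t_k) pairs, where t_k is the length of the interval
    during which k lineages exist (time in mutational units, theta = 4*N*mu).
    Each interval ends when one specific pair coalesces (rate 2/theta), and no
    coalescence occurs before that (total rate k*(k-1)/theta).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    log_p = 0.0
    for k, t_k in intervals:
        # One factor per inter-event interval: event density times survival.
        log_p += math.log(2.0 / theta) - k * (k - 1) * t_k / theta
    return log_p


# Three sampled lineages: 3 lineages for 0.01 units, then 2 lineages for 0.05 units.
print(log_genealogy_density([(3, 0.01), (2, 0.05)], theta=0.1))  # ≈ 4.3915
```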

2.
Hua Chen, Kun Chen. Genetics 2013, 194(3): 721–736
The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tₘ the mth coalescent time, when m + 1 lineages coalesce into m lineages, and by Aₙ(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, Aₙ(t), and the coalescence times, Tₘ, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − Aₙ(t) follows a Poisson distribution, and as m → n, n(n − 1)Tₘ/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference.
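As a reference point for the constant-size case treated by Griffiths (1984), the sketch below computes the expected TMRCA and total branch length of a Kingman genealogy, E[TMRCA] = 2(1 − 1/n) and E[TBL] = 2 Σ_{i=1}^{n−1} 1/i, and checks them against a quick simulation. Time is in coalescent units (assumed here: one unit = N generations when N counts gene copies), and with k lineages the waiting time is exponential with rate k(k − 1)/2. This sketch does not implement the article's time-varying N(t) results; the function names are illustrative.

```python
import random


def expected_tmrca(n):
    """E[TMRCA]: sum over k of the mean waiting time 2/(k(k-1)) with k lineages."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def expected_tbl(n):
    """E[total branch length]: k branches exist during each waiting time."""
    return sum(k * 2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca_tbl(n, reps, seed=1):
    """Monte Carlo means of TMRCA and total branch length under Kingman's coalescent."""
    rng = random.Random(seed)
    tmrca_sum = tbl_sum = 0.0
    for _ in range(reps):
        tmrca = tbl = 0.0
        for k in range(n, 1, -1):
            wait = rng.expovariate(k * (k - 1) / 2.0)  # rate k(k-1)/2
            tmrca += wait
            tbl += k * wait
        tmrca_sum += tmrca
        tbl_sum += tbl
    return tmrca_sum / reps, tbl_sum / reps


n = 10
print(expected_tmrca(n), expected_tbl(n))  # 2(1 - 1/n) and 2 * sum_{i<n} 1/i
print(simulate_tmrca_tbl(n, reps=20000))
```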

3.
Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. We therefore developed ASCEND, a method that measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence for recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic and indigenous groups have evidence of recent founder events. Many present-day groups, including Native Americans, Oceanians and South Asians, have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate extreme founder events in most breeds that occurred in the last 25 generations, concordant with the establishment of many dog breeds during Victorian times. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.
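A central step in ASCEND-style inference is fitting an exponential decay of allele-sharing correlation against genetic distance. The sketch below shows one simple, standard-library way to fit such a curve, z(d) ≈ A·exp(−r·d) + c: a grid search over the decay rate r, with A and c solved by ordinary least squares at each candidate r. The function name and the grid-search strategy are assumptions for illustration, not ASCEND's implementation. How r maps onto founder age in generations, and A onto founder strength, follows the model described in the ASCEND paper and is not reproduced here.

```python
import math


def fit_exponential_decay(d, z, rates):
    """Fit z ~ A * exp(-r * d) + c: grid search over r, least squares for A and c.

    d: genetic distances (Morgans); z: allele-sharing correlations;
    rates: candidate decay rates. Returns (r, A, c, sse) for the best r.
    """
    n = len(d)
    z_bar = sum(z) / n
    best = None
    for r in rates:
        x = [math.exp(-r * di) for di in d]
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # curve is flat over d; A is not identifiable
        sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
        a = sxz / sxx
        c = z_bar - a * x_bar
        sse = sum((zi - a * xi - c) ** 2 for xi, zi in zip(x, z))
        if best is None or sse < best[3]:
            best = (r, a, c, sse)
    return best


# Noise-free synthetic curve (A = 0.01, r = 40, c = 0.001) on 0.001-0.3 Morgans.
dist = [i / 1000 for i in range(1, 301)]
corr = [0.01 * math.exp(-40 * di) + 0.001 for di in dist]
rate, amp, const, _ = fit_exponential_decay(dist, corr, [i * 0.5 for i in range(2, 401)])
print(rate, amp, const)
```

Solving A and c in closed form keeps the search one-dimensional; real data would add noise, and a finer grid or a local optimizer around the best r would then be sensible.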

4.
Probabilities of monophyly, paraphyly, and polyphyly of two-species gene genealogies are computed for modest sample sizes and compared for two different Λ coalescent processes. Coalescent processes belonging to the Λ coalescent family admit asynchronous multiple mergers of active ancestral lineages. Assigning a timescale to the time of divergence becomes a central issue when different populations have different coalescent processes running on different timescales. Clade probabilities in single populations are also computed, which can be useful for testing for taxonomic distinctiveness of an observed set of monophyletic lineages. The coalescence rates of multiple merger coalescent processes are functions of coalescent parameters. The effect of coalescent parameters on the probabilities studied depends on the coalescent process, and if the population is ancestral or derived. The probability of reciprocal monophyly tends to be somewhat lower, when associated with a Λ coalescent, under the null hypothesis that two groups come from the same population. However, even for fairly recent divergence times, the probability of monophyly tends to be higher as a function of the number of generations for coalescent processes that admit multiple mergers, and is sensitive to the parameter of one of the example processes.
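The abstract does not name the two Λ coalescents it compares, so as an assumed example here is the Beta(2 − α, α)-coalescent, a commonly used multiple-merger family. The rate λ_{b,k} at which one particular group of k out of b active lineages merges is B(k − α, b − k + α)/B(2 − α, α), which shows directly how coalescence rates depend on the coalescent parameter α; as α → 2 the process approaches Kingman's coalescent. Function names are illustrative.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b) for a, b > 0."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_merger_rate(b, k, alpha):
    """lambda_{b,k}: rate at which one specific group of k of b lineages merges."""
    if not (0.0 < alpha < 2.0 and 2 <= k <= b):
        raise ValueError("need 0 < alpha < 2 and 2 <= k <= b")
    return math.exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def total_merger_rate(b, alpha):
    """Total rate of any merger when b lineages are present."""
    return sum(math.comb(b, k) * beta_merger_rate(b, k, alpha) for k in range(2, b + 1))


# Kingman's coalescent would give b(b - 1)/2 = 10 for b = 5.
for alpha in (1.2, 1.5, 1.9):
    print(alpha, round(total_merger_rate(5, alpha), 3))
```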

5.
The Rickettsia genus is a group of obligate intracellular α-proteobacteria representing a paradigm of reductive evolution. Here, we investigate the evolutionary processes that shaped the genomes of the genus. The reconstruction of ancestral genomes indicates that their last common ancestor contained more genes, but already possessed most traits associated with cellular parasitism. The differences in gene repertoires across modern Rickettsia are mainly the result of differential gene losses from the ancestor. We demonstrate using computer simulation that the propensity of loss was variable across genes during this process. We also analyzed the ratio of nonsynonymous to synonymous changes (Ka/Ks) calculated as an average over large sets of genes to assay the strength of selection acting on the genomes of Rickettsia, Anaplasmataceae, and free-living γ-proteobacteria. As a general trend, Ka/Ks were found to decrease with increasing divergence between genomes. The high Ka/Ks for closely related genomes are probably due to a lag in the removal of slightly deleterious nonsynonymous mutations by natural selection. Interestingly, we also observed a decrease of the rate of gene loss with increasing divergence, suggesting a similar lag in the removal of slightly deleterious pseudogene alleles. For larger divergence (Ks > 0.2), Ka/Ks converge toward similar values indicating that the levels of selection are roughly equivalent between intracellular α-proteobacteria and their free-living relatives. This contrasts with the view that obligate endocellular microorganisms tend to evolve faster as a consequence of reduced effectiveness of selection, and suggests a major role of enhanced background mutation rates on the fast protein divergence in the obligate intracellular α-proteobacteria.
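The abstract does not state which estimator was used for Ka and Ks. As an assumed illustration, the sketch below pools nonsynonymous and synonymous difference and site counts over a gene set, applies the Jukes–Cantor correction used in the Nei–Gojobori method, and reports Ka/Ks only when Ks exceeds a cutoff such as the 0.2 mentioned above. The per-gene count tuples are a hypothetical input format.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pooled_ka_ks(genes, min_ks=0.0):
    """Ka/Ks for one genome pair, pooled over a gene set.

    genes: iterable of (nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites) tuples
    (a hypothetical input format). Counts are summed before correction, so long
    genes carry more weight than short ones. Returns None when Ks <= min_ks.
    """
    n_diffs = n_sites = s_diffs = s_sites = 0.0
    for nd, ns, sd, ss in genes:
        n_diffs += nd
        n_sites += ns
        s_diffs += sd
        s_sites += ss
    ka = jukes_cantor(n_diffs / n_sites)
    ks = jukes_cantor(s_diffs / s_sites)
    if ks <= min_ks:
        return None
    return ka / ks


genes = [(10, 1000, 30, 300), (5, 500, 15, 150)]  # toy counts
print(pooled_ka_ks(genes))              # Ks is about 0.107 here
print(pooled_ka_ks(genes, min_ks=0.2))  # None: below the Ks > 0.2 cutoff
```

Pooling counts before correcting, rather than averaging per-gene ratios, avoids dividing by near-zero Ks values in short or highly conserved genes.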

6.
A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. 
Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between different population models.

7.
Unraveling widespread polyploidy events throughout plant evolution is a necessity for inferring the impacts of whole-genome duplication (WGD) on speciation, functional innovations, and to guide identification of true orthologs in divergent taxa. Here, we employed integrated syntenic and phylogenomic analyses to reveal an ancient WGD that shaped the genomes of all commelinid monocots, including grasses, bromeliads, bananas (Musa acuminata), ginger, palms, and other plants of fundamental, agricultural, and/or horticultural interest. First, comprehensive phylogenomic analyses revealed 1421 putative gene families that retained an ancient duplication shared by Musa (Zingiberales) and grass (Poales) genomes, indicating an ancient WGD in monocots. Intergenomic synteny blocks of Musa and Oryza were investigated, and 30 blocks were shown to be duplicated before the Musa-Oryza divergence an estimated 120 to 150 million years ago. Synteny comparisons of four monocot (rice [Oryza sativa], sorghum [Sorghum bicolor], banana, and oil palm [Elaeis guineensis]) and two eudicot (grape [Vitis vinifera] and sacred lotus [Nelumbo nucifera]) genomes also support this additional WGD in monocots, herein called Tau (τ). Integrating synteny and phylogenomic comparisons achieves better resolution of ancient polyploidy events than either approach individually, a principle that is exemplified in the disambiguation of a WGD series of rho (ρ)-sigma (σ)-tau (τ) in the grass lineages that echoes the alpha (α)-beta (β)-gamma (γ) series previously revealed in the Arabidopsis thaliana lineage.

8.
Molecular studies have recently led to the proposal of a new super-ordinal arrangement of the 18 extant Eutherian orders. Of the four proposed super-orders, Afrotheria and Xenarthra were considered the most basal. Chromosome-painting studies with human probes in these two mammalian groups are thus key in the quest to establish the ancestral Eutherian karyotype. Although a reasonable amount of chromosome-painting data with human probes has already been obtained for Afrotheria, no Xenarthra species has been thoroughly analyzed with this approach. We hybridized human chromosome probes to metaphases of species (Dasypus novemcinctus, Tamandua tetradactyla, and Choloepus hoffmanii) representing three of the four Xenarthra families. Our data allowed us to review the current hypotheses for the ancestral Eutherian karyotype, which range from 2n = 44 to 2n = 48. One of the species studied, the two-toed sloth C. hoffmanii (2n = 50), showed a chromosome complement strikingly similar to the proposed 2n = 48 ancestral Eutherian karyotype, strongly reinforcing it.

9.
Paleogenomics is the nascent discipline concerned with sequencing and analysis of genome-scale information from historic, ancient, and even extinct samples. Once inconceivable due to the challenges of DNA damage, contamination, and the technical limitations of PCR-based Sanger sequencing, it has rapidly become a reality following the dawn of the second-generation sequencing revolution. However, a significant challenge facing ancient DNA studies on extinct species is the lack of closely related reference genomes against which to map the sequencing reads from ancient samples. Although bioinformatic efforts to improve the assemblies have focused mainly on mapping algorithms, in this article we explore the potential of an alternative approach, namely using a reconstructed ancestral genome as the reference for mapping DNA sequences of ancient samples. Specifically, we present a preliminary proof of concept for a general framework and demonstrate how, under certain evolutionary divergence thresholds, considerable mapping improvements can be easily obtained.

10.
Dollo’s law posits that evolutionary losses are irreversible, thereby narrowing the potential paths of evolutionary change. While phenotypic reversals to ancestral states have been observed, little is known about their underlying genetic causes. The genomes of budding yeasts have been shaped by extensive reductive evolution, such as reduced genome sizes and the losses of metabolic capabilities. However, the extent and mechanisms of trait reacquisition after gene loss in yeasts have not been thoroughly studied. Here, through phylogenomic analyses, we reconstructed the evolutionary history of the yeast galactose utilization pathway and observed widespread and repeated losses of the ability to utilize galactose, which occurred concurrently with the losses of GALactose (GAL) utilization genes. Unexpectedly, we detected multiple galactose-utilizing lineages that were deeply embedded within clades that underwent ancient losses of galactose utilization. We show that at least two, and possibly three, lineages reacquired the GAL pathway via yeast-to-yeast horizontal gene transfer. Our results show how trait reacquisition can occur tens of millions of years after an initial loss via horizontal gene transfer from distant relatives. These findings demonstrate that the losses of complex traits and even whole pathways are not always evolutionary dead-ends, highlighting how reversals to ancestral states can occur.

11.
Recently, the study of ancient DNA (aDNA) has been greatly enhanced by the development of second-generation DNA sequencing technologies and targeted enrichment strategies. These developments have allowed the recovery of several complete ancient genomes, a result that would have been considered virtually impossible only a decade ago. Prior to these developments, aDNA research was largely focused on the recovery of short DNA sequences and their use in the study of phylogenetic relationships, molecular rates, species identification and population structure. However, it is now possible to sequence a large number of modern and ancient complete genomes from a single species and thereby study the genomic patterns of evolutionary change over time. Such a study would herald the beginnings of ancient population genomics and its use in the study of evolution. Species that are amenable to such large-scale studies warrant increased research effort. We report here progress on a population genomic study of the Adélie penguin (Pygoscelis adeliae). This species is ideally suited to ancient population genomic research because both modern and ancient samples are abundant in the permafrost conditions of Antarctica. This species will enable us to directly address many of the fundamental questions in ecology and evolution.

12.
The Escherichia coli K-12 genome (ECO) was compared with the sampled genomes of the sibling species Salmonella enterica serovars Typhimurium, Typhi and Paratyphi A (collectively referred to as SAL) and the genome of the close outgroup Klebsiella pneumoniae (KPN). There are at least 160 locations where sequences of >400 bp are absent from ECO but present in the genomes of all three SAL and 394 locations where sequences are present in ECO but close homologs are absent in all SAL genomes. The 394 sequences in ECO that do not occur in SAL contain 1350 (30.6%) of the 4405 ECO genes. Of these, 1165 are missing from both SAL and KPN. Most of the 1165 genes are concentrated within 28 regions of 10–40 kb, which consist almost exclusively of such genes. Among these regions were six that included previously identified cryptic phage. A hypothetical ancestral state of genomic regions that differ between ECO and SAL can be inferred in some cases by reference to the genome structure in KPN and the more distant relative Yersinia pestis. However, many changes between ECO and SAL are concentrated in regions where all four genera have a different structure. The rate of gene insertion and deletion is sufficiently high in these regions that the ancestral state of the ECO/SAL lineage cannot be inferred from the present data. The sequencing of other closely related genomes, such as S. bongori or Citrobacter, may help in this regard.

13.
Genome annotation, assisted by computer programs, is one of the great advances in modern biology. Nevertheless, the in silico identification of small and complex coding sequences is still challenging. We observed that amino acid sequences inferred from coding (but rarely from non-coding) DNA sequences accumulated alignments in low-stringency BLAST searches, suggesting that this accumulation of alignments could be used to highlight coding regions in sequenced DNA. To investigate this possibility, we developed a computer program (AnABlast) that generates profiles of accumulated alignments in query amino acid sequences using a low-stringency BLAST strategy. To validate this approach, all six-frame translations of DNA sequences between every two annotated exons of the fission yeast genome were analysed with AnABlast. AnABlast-generated profiles identified three new copies of known genes, and four new genes supported by experimental evidence. New pseudogenes, ancestral carboxyl- and amino-terminal subtractions, complex gene rearrangements, and ancient fragments of mitochondrial DNA and of bacterial origin were also inferred. Thus, this novel in silico approach provides a powerful tool to uncover new genes, as well as fossil-coding sequences, thus providing insight into the evolutionary history of annotated genomes.
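AnABlast analyses all six-frame translations of a DNA region. A minimal sketch of that first step, using the standard genetic code (NCBI table 1) and plain string handling, is below; the low-stringency BLAST searches and the alignment-accumulation profile are not shown, and the function names are illustrative rather than AnABlast's.

```python
# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate codon by codon; unknown codons become 'X', a partial last codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse-complement (-1..-3) translations."""
    dna = dna.upper()
    rev_comp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames


for frame, peptide in six_frame_translations("ATGGCCTAA").items():
    print(frame, peptide)
```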

14.
Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and that 10% of the selected GSMs in a database should be detected for a confident positive call. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our result agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
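The core idea of genome-specific markers can be sketched as a set operation: collect the k-mers of every reference genome, keep those that occur in exactly one genome, then call a genome present when a sufficient fraction of its markers appears in the reads. The sketch below uses toy sequences, a tiny k instead of 50, and the 10% detection fraction from the abstract. It ignores reverse complements and sequencing errors, does not reproduce any further filtering GSMer applies, and its function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all k-mers in seq (forward strand only in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_kmers(genomes, k):
    """Map each genome name to the k-mers that occur in that genome and no other."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    markers = defaultdict(set)
    for kmer, names in owners.items():
        if len(names) == 1:
            markers[next(iter(names))].add(kmer)
    return dict(markers)


def call_present(markers, reads, k, min_fraction=0.10):
    """Call a genome present if at least min_fraction of its markers occur in the reads."""
    observed = set()
    for read in reads:
        observed |= kmers(read, k)
    return {name: len(m & observed) / len(m) >= min_fraction for name, m in markers.items()}


genomes = {"strain_A": "ACGTACGTAA", "strain_B": "ACGTTTTTAA"}  # toy sequences
markers = genome_specific_kmers(genomes, k=4)
print(call_present(markers, reads=["GTACG"], k=4))
```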

15.
Hepadnaviridae are double-stranded DNA viruses that infect some species of birds and mammals. This includes humans, where hepatitis B viruses (HBVs) are prevalent pathogens in considerable parts of the global population. Recently, endogenized sequences of HBVs (eHBVs) have been discovered in bird genomes where they constitute direct evidence for the coexistence of these viruses and their hosts from the late Mesozoic until present. Nevertheless, virtually nothing is known about the ancient host range of this virus family in other animals. Here we report the first eHBVs from crocodilian, snake, and turtle genomes, including a turtle eHBV that endogenized >207 million years ago. This genomic “fossil” is >125 million years older than the oldest avian eHBV and provides the first direct evidence that Hepadnaviridae already existed during the Early Mesozoic. This implies that the Mesozoic fossil record of HBV infection spans three of the five major groups of land vertebrates, namely birds, crocodilians, and turtles. We show that the deep phylogenetic relationships of HBVs are largely congruent with the deep phylogeny of their amniote hosts, which suggests an ancient amniote–HBV coexistence and codivergence, at least since the Early Mesozoic. Notably, the organization of overlapping genes as well as the structure of elements involved in viral replication has remained highly conserved among HBVs along that time span, except for the presence of the X gene. We provide multiple lines of evidence that the tumor-promoting X protein of mammalian HBVs lacks a homolog in all other hepadnaviruses and propose a novel scenario for the emergence of X via segmental duplication and overprinting of pre-existing reading frames in the ancestor of mammalian HBVs. Our study reveals an unforeseen host range of prehistoric HBVs and provides novel insights into the genome evolution of hepadnaviruses throughout their long-lasting association with amniote hosts.

16.
17.
How did plant species emerge from their most recent common ancestors (MRCAs) 250 million years ago? Modern plant genomes help to address such key questions in unveiling precise species genealogies. The field of paleogenomics is undergoing a paradigm shift for investigating species evolution from the study of ancestral genomes from extinct species to deciphering the evolutionary forces (in terms of duplication, fusion, fission, deletion, and translocation) that drove present-day plant diversity (in terms of chromosome/gene number and genome size). In this review, inferred ancestral karyotype genomes are shown to be powerful tools to (1) unravel the past history of extant species by recovering the variations of ancestral genomic compartments and (2) accelerate translational research by facilitating the transfer of genomic information from model systems to species of agronomic interest.

18.

Background  

Spliceosomal introns are an ancient, widespread hallmark of eukaryotic genomes. Despite much research, many questions regarding the origin and evolution of spliceosomal introns remain unsolved, partly due to the difficulty of inferring ancestral gene structures. We circumvent this problem by using genes that originated through endosymbiotic gene transfer, for which an intron-less structure at the time of the transfer can be assumed.

19.
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10⁻³–10⁻⁵ (~8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ~1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
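A back-of-the-envelope calculation helps explain why the per-base error rate matters for signature design: if errors occur independently at rate e per base, a signature of length L is error-free with probability (1 − e)^L. The 20-base signature length below is an assumed, illustrative value, and this calculation is not the SAP simulation itself.

```python
def prob_error_free(length, error_rate):
    """P(no sequencing error in a signature), assuming independent per-base errors."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    return (1.0 - error_rate) ** length


SIGNATURE_LENGTH = 20  # illustrative, hypothetical signature length
for rate in (1e-2, 1e-3, 1e-5):
    print(f"error rate {rate:g}: P(error-free) = {prob_error_free(SIGNATURE_LENGTH, rate):.4f}")
```

At a 1% error rate, roughly one in five such signatures would carry at least one error in the target sequence, whereas at 10⁻³ the fraction drops to about 2%, and at 10⁻⁵ it is negligible.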

20.
Vital parameter data on age, growth, and reproduction for 62 stocks covering 38 species were collected from the literature, log-transformed, and analyzed using multivariate analyses. Three groups were identified, and empirical equations were developed for each to describe the relationships between the predicted finite rates of population increase (λ’) and the vital parameters: maximum age (Tmax), age at maturity (Tm), annual fecundity (f/Rc), size at birth (Lb), size at maturity (Lm), and asymptotic length (L∞). Group (1) included species with slow growth rates (0.034 yr⁻¹ < k < 0.103 yr⁻¹) and extended longevity (26 yr < Tmax < 81 yr), e.g., shortfin mako Isurus oxyrinchus, dusky shark Carcharhinus obscurus, etc.; Group (2) included species with fast growth rates (0.103 yr⁻¹ < k < 0.358 yr⁻¹) and short longevity (9 yr < Tmax < 26 yr), e.g., starspotted smoothhound Mustelus manazo, gray smoothhound M. californicus, etc.; Group (3) included late-maturing species (Lm/L∞ ≥ 0.75) with moderate longevity (Tmax < 29 yr), e.g., pelagic thresher Alopias pelagicus, sevengill shark Notorynchus cepedianus. An empirical equation for all data pooled was also developed. The λ’ values estimated by these empirical equations showed good agreement with those calculated using conventional demographic analysis. The predictability was further validated with an independent data set of three species. The empirical equations developed in this study not only reduce the uncertainties in estimation but also account for differences in life history among groups. This method therefore provides an efficient and effective approach to the implementation of precautionary shark management measures.
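The abstract validates its empirical equations against λ’ from "conventional demographic analysis". One standard form of that analysis, assumed here for illustration, solves the discrete Euler–Lotka equation Σₓ lₓ mₓ λ^(−x) = 1 for the finite rate of increase λ, given age-specific survivorship lₓ and fecundity mₓ. The sketch solves it by bisection; it does not reproduce the article's empirical equations, and the inputs are toy values, not shark data.

```python
def euler_lotka_residual(lam, survivorship, fecundity):
    """sum_x l_x * m_x * lam**(-x) - 1, with ages x = 1, 2, ..."""
    total = 0.0
    for age, (l_x, m_x) in enumerate(zip(survivorship, fecundity), start=1):
        total += l_x * m_x * lam ** (-age)
    return total - 1.0


def finite_rate_of_increase(survivorship, fecundity, lo=1e-6, hi=10.0, tol=1e-12):
    """Solve the discrete Euler-Lotka equation for lambda by bisection.

    The residual decreases in lambda, so a root bracketed by [lo, hi] is unique.
    """
    if euler_lotka_residual(lo, survivorship, fecundity) <= 0.0:
        raise ValueError("root is not above lo (is there any reproduction?)")
    if euler_lotka_residual(hi, survivorship, fecundity) >= 0.0:
        raise ValueError("lambda exceeds the upper bracket; increase hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if euler_lotka_residual(mid, survivorship, fecundity) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy life table: certain survival to ages 1 and 2, one offspring at each age.
print(f"{finite_rate_of_increase([1.0, 1.0], [1.0, 1.0]):.6f}")  # 1.618034 (golden ratio)
```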


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417