Similar Articles
20 similar articles found.
1.
Population genetics plays an increasing role in human genetic research, linking empirical observations with hypotheses about sequence variation arising from historical and evolutionary causes. In addition, data sets are growing in size, and genome-wide data have become commonplace in many empirical studies. As more information becomes available, it is clear that the simplest hypotheses are not consistent with the data. Simulations will provide the key tool for contrasting complex hypotheses with real data, by generating simulated data under the hypothetical historical and evolutionary conditions to be compared. Undoubtedly, tools that can simulate large sequences while also allowing for natural selection, recombination and complex demographic patterns will be of great interest for better understanding the traces left on DNA by different interacting evolutionary forces. Simulation tools will also be essential for evaluating the sampling properties of any statistic used in genome-wide association studies and for comparing the performance of methods applied at genome-wide scales. Several simulation tools have been developed recently. Here, we review some of the currently existing simulators that allow efficient simulation of large sequences under complex evolutionary scenarios. In addition, we point out future directions in this field, which is already a key part of current research in evolutionary biology and seems set to become a primary tool in future research on genome and post-genomic biology.
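To make the idea of simulating sequence variation under a hypothesized history concrete, here is a minimal sketch of the standard (Kingman) coalescent for n sampled sequences from a single population of constant size, with no recombination, selection or demographic change. The function names are hypothetical and the code is not taken from any simulator reviewed here. Time is measured in coalescent units (2N generations for a diploid population), and mutations are added under the infinite-sites model with scaled mutation rate theta, so the expected number of segregating sites is theta * a_n with a_n = sum of 1/i for i = 1..n-1 (Watterson's result).

```python
import random


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages,
    in coalescent units (2N generations for a diploid population)."""
    return 2.0 * (1.0 - 1.0 / n)


def watterson_a(n):
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n."""
    return sum(1.0 / i for i in range(1, n))


def simulate_genealogy(n, rng):
    """Simulate one Kingman coalescent genealogy for n samples.

    Returns (tmrca, total_branch_length) in coalescent units. While k
    lineages remain, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2.
    """
    tmrca = 0.0
    total_length = 0.0
    for k in range(n, 1, -1):
        t = rng.expovariate(k * (k - 1) / 2.0)
        tmrca += t
        total_length += k * t  # each of the k lineages spans this interval
    return tmrca, total_length


def simulate_segregating_sites(n, theta, rng):
    """Segregating sites under the infinite-sites model: mutations fall on
    the tree as a Poisson process with rate theta/2 per unit branch length."""
    _, length = simulate_genealogy(n, rng)
    mean = theta / 2.0 * length
    # Poisson draw: count unit-rate arrivals that occur before time `mean`
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival < mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count
```

Averaging `simulate_genealogy(10, random.Random(1))[0]` over many replicates should approach `expected_tmrca(10)`, which is 1.8. Comparing simulated summaries such as the mean number of segregating sites with those observed in real data is the basic idea behind simulation-based hypothesis testing; the dedicated simulators discussed here add recombination, selection and complex demography on top of this core.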

2.

Background

Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.

Results

We extensively compared the performance of five widely used coalescent simulators (Hudson’s ms, msHOT, MaCS, Simcoal2 and fastsimcoal) to provide a practical guide based on three crucial factors: 1) speed, 2) scalability and 3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it cannot simulate sequences with recombination hotspots. An extended program, msHOT, compensates for this deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but it remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent, are much more efficient while retaining the salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that these two programs have advantages over the others across a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance on both real and simulated data.

Conclusions

While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating crossover hotspots.
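Recombination hotspots, whose position and intensity are one of the comparison criteria in this study, are commonly represented as a piecewise-constant recombination map. The following sketch assumes a single hotspot whose per-base-pair rate is a multiple (`intensity`) of a uniform background rate, and draws crossover positions by inverting the cumulative map. The function names, parameter names and values are illustrative assumptions, not the input format of msHOT, MaCS or fastsimcoal.

```python
import bisect
import random


def make_hotspot_map(length, background_rate, hotspot_start, hotspot_end, intensity):
    """Piecewise-constant recombination map with one hotspot.

    Returns (boundaries, rates): rates are per base pair per generation, and
    inside [hotspot_start, hotspot_end) the rate is background_rate * intensity.
    """
    boundaries = [0, hotspot_start, hotspot_end, length]
    rates = [background_rate, background_rate * intensity, background_rate]
    return boundaries, rates


def cumulative(boundaries, rates):
    """Cumulative expected number of crossovers at each boundary."""
    cum = [0.0]
    for start, end, rate in zip(boundaries, boundaries[1:], rates):
        cum.append(cum[-1] + (end - start) * rate)
    return cum


def sample_crossover(boundaries, rates, rng):
    """Draw one crossover position by inverting the cumulative map."""
    cum = cumulative(boundaries, rates)
    u = rng.random() * cum[-1]
    i = min(bisect.bisect_right(cum, u) - 1, len(rates) - 1)  # segment holding u
    return boundaries[i] + (u - cum[i]) / rates[i]
```

With a 2 kb hotspot at 50 times the background rate in a 100 kb region, roughly half of all sampled crossovers fall inside the hotspot. Checking whether a detector recovers that position from data simulated under such a map is the kind of validation described above.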

3.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
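The SFS and a composite likelihood built from it can be sketched as follows. This example assumes 0/1-coded haplotypes with known ancestral states (an unfolded SFS) and treats sites as independent, so the log-likelihood is multinomial over SFS classes. For illustration only, the expected SFS is the analytical standard-neutral one (proportional to 1/i); in the simulation-based framework described above, expected SFS entries for complex models would instead be estimated from coalescent simulations. Function names are hypothetical.

```python
import math


def unfolded_sfs(haplotypes):
    """Unfolded SFS from 0/1 haplotypes (rows = haplotypes, columns = sites),
    where 1 marks the derived allele. Entry i-1 counts sites with i derived
    copies, for i = 1..n-1 (monomorphic classes are dropped)."""
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for site in zip(*haplotypes):
        sfs[sum(site)] += 1
    return sfs[1:n]


def neutral_expected_sfs(n):
    """Expected SFS proportions under the standard neutral model (proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def composite_log_likelihood(observed, expected_props):
    """Multinomial composite log-likelihood, up to an additive constant,
    treating every site as independent."""
    return sum(c * math.log(p) for c, p in zip(observed, expected_props) if c > 0)
```

Comparing the composite log-likelihood of the same observed SFS under competing expected spectra is the basic step that a parameter search repeats many times.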

4.
Studies of bioinvasions have revealed various strategies of invasion, depending on the ecosystem invaded and the alien species concerned. Here, we consider how migration (as a demographic factor), as well as ecological and evolutionary changes, affect invasion success. We propose three main theoretical scenarios that depend on how these factors generate the match between an invader and its new environment. Our framework highlights the features that are common to, or differ among, observed invasion cases, and clarifies some general trends that have been previously highlighted in bioinvasions. We also suggest some new directions of research, such as the assessment of the time sequence of demographic, genetic and environmental changes, using detailed temporal surveys.

5.
The study of local adaptation is made difficult by many confounding evolutionary phenomena (for example, genetic drift and demographic history). When complex traits are involved in local adaptation, phenomena such as phenotypic plasticity further hamper evolutionary biologists in studying the complex relationships between phenotype, genotype and environment. In this perspective paper, we suggest that the common garden experiment, which is specifically designed to deal with phenotypic plasticity, has a clear role to play in the study of local adaptation, even (if not especially) in the genomic era. After a brief review of some high-throughput genotyping protocols relevant in the context of a common garden, we explore how common garden analyses can be improved with dense marker panel data and recent statistical methods. We then show how combining approaches from population genomics and genome-wide association studies with the setting of a common garden can yield a very efficient, thorough and integrative study of local adaptation. In particular, genomic evidence (for example, from genome scans) and phenotypic evidence provide independent insights into possible local adaptation scenarios, and genome-wide association studies conducted within a common garden experiment make it possible to decipher the genetic bases of adaptive traits.

6.
7.
Girod C, Vitalis R, Leblois R, Fréville H. Genetics, 2011, 188(1): 165-179
Reconstructing the demographic history of populations is a central issue in evolutionary biology. Using likelihood-based methods coupled with Monte Carlo simulations, it is now possible to reconstruct past changes in population size from genetic data. Using simulated data sets under various demographic scenarios, we evaluate the statistical performance of Msvar, a full-likelihood Bayesian method that infers past demographic change from microsatellite data. Our simulation tests show that Msvar is very efficient at detecting population declines and expansions, provided the event is neither too weak nor too recent. We further show that Msvar outperforms two moment-based methods (the M-ratio test and Bottleneck) for detecting population size changes, whatever the time and the severity of the event. The same trend emerges from a compilation of empirical studies. The latest version of Msvar provides estimates of the current and the ancestral population size and the time since the population started changing in size. We show that, in the absence of prior knowledge, Msvar provides little information on the mutation rate, which results in biased estimates and/or wide credibility intervals for each of the demographic parameters. However, scaling the population size parameters with the mutation rate and scaling the time with current population size, as coalescent theory requires, significantly improves the quality of the estimates for contraction but not for expansion scenarios. Finally, our results suggest that Msvar is robust to moderate departures from a strict stepwise mutation model.
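The M-ratio mentioned above, introduced by Garza and Williamson, is a simple moment-based statistic: the number of distinct microsatellite alleles k divided by the allele size range r (in repeat units) plus one. Because alleles tend to be lost faster than the size range shrinks during a decline, low values are taken as a sign of a past bottleneck. A minimal sketch follows; the function name and the `repeat_length` parameter are illustrative assumptions, and no significance threshold is implied.

```python
def m_ratio(allele_sizes, repeat_length=1):
    """Garza-Williamson M-ratio for one microsatellite locus.

    allele_sizes: observed allele sizes (one entry per gene copy), in repeat
    units by default, or in base pairs if repeat_length gives the motif length.
    """
    alleles = set(allele_sizes)
    k = len(alleles)                                   # number of distinct alleles
    r = (max(alleles) - min(alleles)) / repeat_length  # size range in repeat units
    return k / (r + 1)


print(m_ratio([10, 11, 12, 13, 13]))  # 4 alleles spanning 3 repeats -> 1.0
print(m_ratio([10, 13, 13]))          # 2 alleles spanning 3 repeats -> 0.5
```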

8.
MOTIVATION: Genetic studies focus on increasingly large genomic regions of both extant and ancient DNA, and there is a need for simulation software to match these technological advances. We present here a new coalescent-based simulation program, fastsimcoal, which can quickly simulate a variety of genetic markers scattered over very long genomic regions with arbitrary recombination patterns under complex evolutionary scenarios. AVAILABILITY AND IMPLEMENTATION: fastsimcoal is a C++ program compiled for Windows, Mac OS X and Linux platforms. It is freely available at cmpg.unibe.ch/software/fastsimcoal/, together with a detailed user manual and example input files.

9.
We propose a simple model of evolution at a pair of SNP loci under mutation, genetic drift and recombination. The model makes it possible to consider the evolution of SNPs under different demographic scenarios. We applied it to SNP data containing polymorphisms spanning 19 gene regions. We initially matched the linkage disequilibrium (LD) data only, and then reconciled both LD and heterozygosity data. The imbalance between LD and heterozygosity data observed for some of the analyzed genomic regions may be a signature of selection acting in these regions. However, assuming neutrality, we obtain estimates of the age of the population expansion of modern humans that are consistent with consensus estimates. In addition, we are able to estimate the ages of the polymorphisms observed in different genomic regions, and we find that these ages vary widely. Polymorphisms at loci implicated in human disease seem to be younger than average. Our results supplement the conclusions originally obtained by Reich and co-workers for the same data set.
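LD and heterozygosity at a pair of biallelic SNPs, the two kinds of data matched in this study, are usually summarized with D, r^2 and expected heterozygosity 2p(1 - p). The sketch below computes them from phased two-locus haplotypes and also shows the deterministic decay of D under a recombination fraction c, ignoring drift and mutation. It only illustrates these standard quantities and is not the authors' model; function names are hypothetical.

```python
def two_locus_stats(haplotypes):
    """D, r^2 and expected heterozygosities for two biallelic SNPs.

    haplotypes: list of (a, b) pairs, alleles coded 0/1 at loci A and B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                            # coefficient of LD
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is monomorphic
    return d, r2, 2 * p_a * (1 - p_a), 2 * p_b * (1 - p_b)


def decay_of_d(d0, c, generations):
    """Expected D after t generations of recombination at rate c (no drift, no mutation)."""
    return d0 * (1 - c) ** generations
```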

10.
Recombination can negatively affect methods for detecting divergent gene function that rely on explicit knowledge of a gene tree. However, we know little about how recombination detection methods perform under the evolutionary scenarios encountered in studies of functional molecular divergence. We use simulation to evaluate false positive rates for six recombination detection methods (GENECONV, MaxChi, Chimera, RDP, GARD-SBP, GARD-MBP) under evolutionary scenarios that might increase false positives. Broadly, these scenarios address: (i) asymmetric tree topology and sequence divergence, (ii) non-stationary codon bias and selection pressure, and (iii) positive selection. We also evaluate the power to detect recombination under a truly recombinant history. As in previous studies, we find that power increases with sequence divergence. However, we also find that the accuracy with which the number of breakpoints is inferred is extremely low. When recombination is absent, increased sequence divergence leads to more false positives. Furthermore, one method (GARD-SBP) is sensitive to tree shape, with higher false positive rates under an asymmetric tree topology. Somewhat surprisingly, all methods are robust to the simulated heterogeneity in codon bias, shifts in selection pressure and presence of positive selection. Based on these findings, we recommend that studies of functional divergence in systems where recombination is plausible can, and should, include a pre-test for recombination. Applying all methods to the core genome of Prochlorococcus reveals a substantial lack of concordance among their results. Based on the analysis of both real and simulated data sets, we present some guidelines for investigating recombination in genes that may have experienced functional divergence.
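MaxChi, one of the six methods evaluated above, looks for a breakpoint that maximizes a 2 x 2 chi-square statistic comparing the proportion of differing sites to the left and right of the breakpoint. The sketch below is a simplified, assumption-laden version for a single pair of aligned sequences: it scores every site (rather than only sites informative across the whole alignment), uses no sliding window and computes no p-value, so it should not be read as the published implementation. Function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi(seq1, seq2):
    """Scan breakpoints between two aligned sequences; return (max_chi2, breakpoint).

    Each site is scored 1 if the sequences differ and 0 otherwise. For a
    breakpoint k, the table compares differing/matching sites in positions
    [0, k) with those in [k, n).
    """
    diffs = [int(x != y) for x, y in zip(seq1, seq2)]
    n = len(diffs)
    total_diff = sum(diffs)
    best, best_k = 0.0, None
    left_diff = 0
    for k in range(1, n):
        left_diff += diffs[k - 1]
        left_same = k - left_diff
        right_diff = total_diff - left_diff
        right_same = (n - k) - right_diff
        chi2 = chi_square_2x2(left_diff, left_same, right_diff, right_same)
        if chi2 > best:
            best, best_k = chi2, k
    return best, best_k
```

Significance of the maximum is assessed separately, for example by permuting site order and recomputing the statistic.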

11.
The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed to infer evolutionary history. Here, we present the “Two-Two (TT)” and the “Two-Two-outgroup (TTo)” methods, two closely related approaches for estimating divergence time based on coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split times, directly from the sequence data. This transparent and computationally efficient approach makes it possible to estimate population divergence time in generations (assuming a mutation rate), rather than as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo method can alleviate biases arising from drastic changes in ancestral population size. We illustrate the utility of the approaches with several examples, including estimating split times for pairs of human populations and providing further evidence for the complex relationship among Neandertals, Denisovans and their ancestors.

12.
Next-generation sequencing (NGS) technologies have revolutionised research in many fields of genetics. The ability to sequence many individuals from one or multiple populations at a genomic scale has greatly enhanced population genetics studies and made population genetics a data-driven discipline. Recently, researchers have proposed statistical models to address the genotyping uncertainty associated with NGS data. However, there is an ongoing debate over whether it is more beneficial to increase the number of sequenced individuals or the per-sample sequencing depth when estimating genetic variation. Through extensive simulations, I assessed the accuracy of estimating nucleotide diversity, detecting polymorphic sites and predicting population structure under different experimental scenarios. The results show that the greatest accuracy in estimating population genetics parameters is achieved by employing a large sample size, even when individuals are sequenced at low depth. Under some circumstances, the minimum sequencing depth for obtaining accurate estimates of allele frequencies and for identifying polymorphic sites is , where both alleles are more likely to have been sequenced. On the other hand, inferences of population structure are more accurate at very large sample sizes, even with extremely low sequencing depth. All of this points to the conclusion that, under various experimental scenarios, large sample sizes at low sequencing depth are desirable for achieving high accuracy in cost-limited population genetics studies. These findings will help researchers design their experimental set-ups and guide further investigation of the effect of protocol design on genetic research.
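Part of the trade-off between sample size and depth comes from how likely it is that both alleles of a heterozygote are actually sequenced. Assuming each read samples either allele with probability 1/2 and ignoring sequencing errors, the probability of observing both alleles at depth d is 1 - 2(1/2)^d. The sketch below evaluates this and finds the smallest depth reaching a chosen target; the target is an arbitrary assumption, the function names are hypothetical, and the sketch does not reproduce the specific minimum depth reported in the study.

```python
def prob_both_alleles_seen(depth):
    """P(both alleles of a heterozygote appear among `depth` reads), assuming
    each read carries either allele with probability 1/2 and no sequencing errors."""
    if depth < 2:
        return 0.0
    # complement of "every read shows allele 1" or "every read shows allele 2"
    return 1.0 - 2.0 * 0.5 ** depth


def min_depth_for(target):
    """Smallest depth at which prob_both_alleles_seen reaches `target` (< 1)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    depth = 0
    while prob_both_alleles_seen(depth) < target:
        depth += 1
    return depth
```

For example, `min_depth_for(0.95)` is 6, because depth 5 gives 0.9375 and depth 6 gives 0.96875.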

13.
Cultural practices can deeply influence genetic diversity patterns. The Neolithic transitions that took place at different times and locations around the world led to major cultural and demographic changes that left their mark on human genetic diversity patterns. Several studies of the European Neolithic transition suggest that mitochondrial DNA (mtDNA) and Y-chromosome data can exhibit different patterns, which could be due to different demographic histories for females and males. Archaeological and anthropological data suggest that the transition from societies of hunter-gatherers (HGs) to farming societies was probably associated with changes in social organization, particularly in post-marital residence (PMR) rules (i.e. patrilocality, matrilocality or bilocality). The movements of people and genes associated with these rules can be seen as sex-biased short-range migrations. We developed a new individual-based simulation approach to explore the genetic consequences of 45 different scenarios, in which we varied the patterns of PMR and admixture between HGs and farmers. We recorded mtDNA and Y-chromosome data and analysed their diversity patterns within and between populations, through time and space. We also collected published mtDNA and Y-chromosome data from European and Near-Eastern populations in order to identify the scenarios that best explain them. We show that: (i) different PMR systems can lead to different patterns of genetic diversity and differentiation, (ii) asymmetries between mtDNA and Y-chromosome data can be due to different behaviours of males and females, but also to different mutation rates, and (iii) patrilocality in farmers explains the present patterns of genetic diversity better than matrilocality or bilocality. Moreover, we found that (iv) the genetic diversity of farmers changes depending on the HGs' PMR rules, even though HGs are assumed to have disappeared more than 5000 years ago in our simulations.

14.
Species identification is one of the most important issues in biological studies. Owing to recent increases in the amount of available genomic information and the development of DNA sequencing technologies, the use of DNA sequences to identify species (commonly referred to as “DNA barcoding”) is being tested in many areas. Several methods have been suggested for identifying species from DNA sequences, including similarity scores, analysis of phylogenetic and population genetic information, and detection of species-specific sequence patterns. Although these methods have performed well under a range of circumstances, they also have limitations: they are subject to loss of information, require intensive computation, are sensitive to model mis-specification, and can make it difficult to evaluate the significance of an identification. Here, we suggest a new DNA barcoding method that adopts support vector machine (SVM) procedures. Our new method is nonparametric and is thus expected to be robust across a wide range of evolutionary scenarios as well as in multilocus analyses. Furthermore, we describe bootstrap procedures that can be used to test the significance of species identifications. We implemented a novel technique for converting sequence data to real-valued vectors, so that bootstrap procedures can easily be combined with our SVM approach. In this study, we present the results of simulation studies and empirical data analyses to demonstrate the performance of our method and discuss its properties.
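Two ingredients of the approach above, converting aligned sequences into real-valued vectors and bootstrapping, can be illustrated generically. The sketch below uses simple one-hot encoding (four values per site) and resamples alignment columns with replacement. This is a common generic encoding, not the novel conversion technique described in the abstract, the function names are hypothetical, and the SVM itself (which would be trained on the encoded vectors of each bootstrap replicate, for example with a machine-learning library) is not shown.

```python
import random

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode an aligned DNA sequence as a flat real-valued vector
    (four values per site); gaps and ambiguous bases become all-zero blocks."""
    vec = []
    for base in sequence.upper():
        vec.extend(1.0 if base == nt else 0.0 for nt in NUCLEOTIDES)
    return vec


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (non-parametric bootstrap)."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]
```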

15.
High-throughput sequencing (HTS) is central to the study of population genomics and has an increasingly important role in constructing phylogenies. Research design choices for sequencing projects can involve a wide range of factors, such as sequencing platform, depth of coverage and bioinformatic tools. Simulating HTS data better informs these decisions, as users can validate software by comparing its output to the known simulation parameters. However, current standalone HTS simulators cannot generate variant haplotypes under even somewhat complex evolutionary scenarios, such as recombination or demographic change. This greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope, which simply and efficiently simulates (i) sets of variant haplotypes from a reference genome and (ii) reads from both Illumina and Pacific Biosciences platforms. Haplotypes can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single, paired-end or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing and optical/PCR duplicates. It can read reference genomes from FASTA files and can simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows and Linux systems.
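A very reduced version of read simulation from a reference can be sketched as below: reads of fixed length are drawn at uniformly random positions, and each base is replaced by a different nucleotide with a fixed probability. This is not the jackalope API (jackalope is an R package), and it omits indels, quality scores, paired ends, platform-specific error profiles and duplicates; all names are hypothetical.

```python
import random


def simulate_reads(reference, n_reads, read_length, error_rate, rng):
    """Sample single-end reads uniformly from a reference with substitution errors.

    Each base is replaced by a different random nucleotide with probability
    error_rate. Returns a list of (start, read) tuples.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = []
        for base in reference[start:start + read_length]:
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((start, "".join(bases)))
    return reads
```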

16.
Recent developments such as the more powerful quasi-likelihood score (MQLS) test have enabled efficient association analysis with related samples. Although such approaches are robust against mis-specified phenotypic distributions and covariance structures, it has been shown that the MQLS statistic becomes invalid in the presence of population substructure if the level of substructure depends on genomic location. In this report, we propose a new statistical method that combines the EIGENSTRAT approach with the MQLS statistic. The proposed method was evaluated with simulated data under various scenarios, and we found that it performs better than traditional methods such as the transmission disequilibrium test. The proposed method was applied to a genetic association analysis of body mass index using Framingham Heart Study data, and we found that rs1121980 and rs9940128, in a linkage block in the FTO gene, are associated with body mass index.
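EIGENSTRAT, one component of the method above, corrects for stratification by computing the top axes of genetic variation from a genotype matrix and removing their contribution from both genotypes and phenotypes before testing association. The numpy sketch below follows that general recipe (per-SNP centring and scaling, eigen-decomposition of the individual-by-individual covariance, projection removal, and a (N - K - 1) r^2 statistic). It does not implement MQLS or its handling of related samples, and the function names are hypothetical.

```python
import numpy as np


def top_axes(genotypes, k):
    """Top-k axes of variation from an (individuals x SNPs) 0/1/2 genotype
    matrix, after EIGENSTRAT-style per-SNP centring and scaling."""
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    mu = g.mean(axis=0)
    p = (1.0 + g.sum(axis=0)) / (2.0 + 2.0 * n)  # smoothed allele frequency
    x = (g - mu) / np.sqrt(p * (1.0 - p))
    cov = x @ x.T / x.shape[1]
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]                  # n x k, largest first


def adjust(values, axes):
    """Remove the projection of `values` (length n) onto each orthonormal axis."""
    v = np.asarray(values, dtype=float).copy()
    for j in range(axes.shape[1]):
        a = axes[:, j]
        v -= (v @ a) / (a @ a) * a
    return v


def eigenstrat_chi2(snp, phenotype, axes):
    """(N - K - 1) * r^2 between adjusted genotype and adjusted phenotype,
    approximately chi-square with 1 df under the null."""
    g = adjust(snp, axes)
    y = adjust(phenotype, axes)
    r = np.corrcoef(g, y)[0, 1]
    n, k = axes.shape
    return (n - k - 1) * r ** 2
```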

17.
Sequence-level population simulations over large genomic regions
Simulation is an invaluable tool for investigating the effects of various population genetics modeling assumptions on resulting patterns of genetic diversity, and for assessing the performance of statistical techniques, for example those designed to detect and measure the genomic effects of selection. It is also used to investigate the effectiveness of various design options for genetic association studies. Backward-in-time simulation methods are computationally efficient and have become widely used since their introduction in the 1980s. The forward-in-time approach has substantial advantages in terms of accuracy and modeling flexibility, but at greater computational cost. We have developed flexible and efficient simulation software and a rescaling technique to aid computational efficiency that together allow the simulation of sequence-level data over large genomic regions in entire diploid populations under various scenarios for demography, mutation, selection, and recombination, the latter including hotspots and gene conversion. Our forward evolution of genomic regions (FREGENE) software is freely available from www.ebi.ac.uk/projects/BARGEN together with an ancillary program to generate phenotype labels, either binary or quantitative. In this article we discuss limitations of coalescent-based simulation, introduce the rescaling technique that makes large-scale forward-in-time simulation feasible, and demonstrate the utility of various features of FREGENE, many not previously available.
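Forward-in-time simulation, and the idea behind rescaling, can be illustrated with a single-locus Wright-Fisher model. In the numpy sketch below, genic (haploid) selection changes the allele frequency deterministically and drift is modelled by binomial sampling of 2N gene copies. The `rescale` helper assumes one common form of rescaling, in which N and the number of generations are divided by a factor while per-generation mutation, recombination and selection rates are multiplied by it, so that population-scaled parameters stay roughly constant. FREGENE's actual technique may differ in detail, and all names are hypothetical.

```python
import numpy as np


def wright_fisher(n_diploid, p0, s, generations, rng):
    """Forward-in-time Wright-Fisher trajectory of a biallelic locus.

    The focal allele has relative fitness 1 + s (genic selection); after
    selection, 2N gene copies are drawn binomially (genetic drift).
    Returns the list of allele frequencies, starting with p0.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = p * (1 + s) / (1 + s * p)                         # selection step
        p = rng.binomial(2 * n_diploid, p_sel) / (2 * n_diploid)  # drift step
        trajectory.append(p)
    return trajectory


def rescale(n_diploid, mu, r, s, generations, factor):
    """Shrink N and run time by `factor`, inflating per-generation rates so that
    N*mu, N*r, N*s and generations/N stay approximately constant.
    Only sensible while the inflated rates remain small (e.g. r * factor << 0.5)."""
    return (n_diploid // factor, mu * factor, r * factor, s * factor,
            generations // factor)
```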

18.
Hager R, Cheverud JM, Wolf JB. Genetics, 2008, 178(3): 1755-1762
Epigenetic effects are increasingly recognized as an important source of variation in complex traits and have become the focus of a rapidly expanding area of research. Principal among these effects is genomic imprinting, which has generally been examined in analyses of complex traits by testing for parent-of-origin-dependent effects of alleles. However, in most of these analyses maternal effects are confounded with genomic imprinting, because they can produce the same patterns of phenotypic variation as those expected for various forms of imprinting. Distinguishing between the two is critical for genetic and evolutionary studies because they have entirely different patterns of gene expression and evolutionary dynamics. Using a simple single-locus model, we show that maternal genetic effects can result in patterns that mimic those expected under genomic imprinting. We further demonstrate how maternal effects and imprinting effects can be distinguished using genomic data from parents and offspring. The model results are applied to a genome scan for quantitative trait loci (QTL) affecting growth- and weight-related traits in mice to illustrate how maternal effects can mimic imprinting. This genome scan revealed five separate maternal-effect loci that caused a diversity of patterns mimicking those expected under various modes of genomic imprinting. These results demonstrate that the appearance of parent-of-origin-dependent effects (POEs) of alleles at a locus cannot be taken as direct evidence that the locus is imprinted. Moreover, they show that, in gene mapping studies, genetic data from both parents and offspring are required to successfully differentiate between imprinting and maternal effects as the cause of apparent parent-of-origin effects of alleles.
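How a maternal genetic effect can mimic imprinting is visible in a simple single-locus calculation. Assume random mating, allele A at frequency p, no imprinting, and an additive maternal effect in which each copy of A carried by the mother adds m to her offspring's phenotype. Among heterozygous offspring, those that received A from their mother have mothers that are AA with probability p and Aa with probability 1 - p, whereas those that received a from their mother have mothers that are Aa with probability p and aa with probability 1 - p. The reciprocal heterozygotes therefore differ by exactly m, a parent-of-origin difference produced without any imprinting. The sketch below encodes this calculation under these assumptions; it is not necessarily the parameterization used by the authors, and the function name is hypothetical.

```python
def reciprocal_heterozygote_means(p, m, d=0.0):
    """Expected phenotypes of reciprocal heterozygotes under an additive
    maternal genetic effect and no imprinting (random mating assumed).

    p: frequency of allele A; m: phenotypic effect of each A allele carried
    by the mother; d: direct genotypic value of any heterozygote.
    Returns (mean when A came from the mother, mean when a came from the mother).
    """
    q = 1.0 - p
    # Mother transmitted A: she is AA with prob p (2 copies) or Aa with prob q (1 copy).
    mean_maternal_A = d + m * (2 * p + 1 * q)
    # Mother transmitted a: she is Aa with prob p (1 copy) or aa with prob q (0 copies).
    mean_maternal_a = d + m * (1 * p + 0 * q)
    return mean_maternal_A, mean_maternal_a
```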

19.
Population history and current demographic and ecological factors determine the amount of genetic variation within populations and the degree of differentiation among them. Differences in the life history and ecology of codistributed species may lead to differences in hierarchical population genetic structure. Here, we compare patterns of genetic diversity and structure in two species of spiny rats of the genus Proechimys from the Rio Jurua of western Amazonian Brazil. Based on the ecological and life-history differences between the two species, we make predictions about how they might differ in patterns of genetic diversity and structure. We use mitochondrial sequence data from the cytochrome b gene to test these predictions. Although both species maintain nearly the same number of mitochondrial haplotypes across the sampled range, they differ in levels of genetic diversity and geographic structure. Patterns of gene flow also differ between the two species, with average M-values of nearly three in P. steerei and less than one in P. simonsi. Our initial predictions are largely upheld by the genetic data, and where conflicting hypotheses arise, we suggest further studies that may allow us to distinguish among evolutionary scenarios. Separating the effects of history and ongoing demography on patterns of genetic diversity is challenging. Combining genetic analyses with field studies remains essential to disentangling these complex processes.

20.
The ability to infer relationships between groups of sequences, either by reconstructing their evolutionary history or by comparing their sequence similarity, can be a crucial step in hypothesis testing. Interpreting relationships among human immunodeficiency virus type 1 (HIV-1) sequences can be challenging because of the virus's rapidly evolving genome, but it may also lead to a better understanding of the underlying biology. Several studies have focused on the evolution of HIV-1, but there is little information linking the sequence similarities and evolutionary histories of HIV-1 to epidemiological information about the infected individual. Our goal was to correlate patterns of HIV-1 genetic diversity with epidemiological information, including risk and demographic factors. These correlations were then used to predict epidemiological information by analyzing short stretches of HIV-1 sequence. Using standard phylogenetic and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to show some correlation between the viral sequences and both the geographic area of infection and the risk factor of men who have sex with men. To help identify more subtle relationships among the viral sequences, multidimensional scaling (MDS) was performed. MDS identified statistically significant correlations between the viral sequences and the risk factors of men who have sex with men and of individuals who have sex with injection drug users or use injection drugs themselves. Using tree construction, MDS and newly developed likelihood assignment methods on the original 100 samples we sequenced, and also on a set of blinded samples, we were able to predict demographic/risk group membership at a rate statistically better than chance. Such methods may make it possible to identify viral variants belonging to specific demographic groups by examining only a small portion of the HIV-1 genome. Such predictions of demographic epidemiology from sequence information may become valuable in assigning different treatment regimens to infected individuals.
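Multidimensional scaling, used in this study to reveal subtler relationships among viral sequences, takes a matrix of pairwise distances and finds low-dimensional coordinates whose distances approximate it. The numpy sketch below assumes uncorrected p-distances between aligned sequences and uses classical (Torgerson) MDS; the study's actual distance measure and MDS variant are not specified here, and the function names are hypothetical.

```python
import numpy as np


def p_distance(s1, s2):
    """Proportion of aligned sites that differ (gaps are not treated specially)."""
    diffs = sum(a != b for a, b in zip(s1, s2))
    return diffs / len(s1)


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in k dimensions."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ d2 @ j                      # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

Correlating the resulting coordinates with epidemiological labels, as in the study, would be a separate step.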


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417