Similar articles
 20 similar articles found (search time: 309 ms)
1.
Achaz G 《Genetics》2008,179(3):1409-1424
Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We thus show that in the presence of those errors, estimators of theta can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of theta that ignore singletons as well as two new tests Y and Y* that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
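
The singleton argument in this abstract can be illustrated with a short sketch. This is not the paper's own pair of estimators: it only assumes the standard neutral expectation E[xi_i] = theta / i for the unfolded site frequency spectrum, from which a moment estimator that drops the singleton class follows as (S - xi_1) / (a_n - 1). The function names are hypothetical.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_watterson(sfs):
    """Watterson's estimator S / a_n from an unfolded SFS.

    sfs[i - 1] is the number of sites at which the derived allele
    is carried by exactly i of the n sequences (i = 1 .. n-1).
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def theta_no_singletons(sfs):
    """Moment estimator that ignores the singleton class.

    Under neutrality E[S - xi_1] = theta * (a_n - 1), so dividing the
    count of non-singleton sites by (a_n - 1) estimates theta.
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to drop singletons")
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


# Idealised spectrum for n = 5 and theta = 12 (xi_i = 12 / i),
# then the same spectrum with 10 extra singleton "errors".
clean = [12, 6, 4, 3]
noisy = [22, 6, 4, 3]
print(theta_watterson(clean), theta_no_singletons(clean))
print(theta_watterson(noisy), theta_no_singletons(noisy))
```

In this toy case the singleton-free estimate is unchanged by the added errors, while the Watterson estimate is inflated.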

2.
Evaluating the likelihood function of parameters in highly-structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive. In such cases, one may approximately infer the parameters from summary statistics of the data such as the site-frequency-spectrum (SFS) or its linear combinations. Such methods are known as approximate likelihood or Bayesian computations. Using a controlled lumped Markov chain and computational commutative algebraic methods, we compute the exact likelihood of the SFS and many classical linear combinations of it at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS, we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus based on the SFS to a more powerful version that conditions on the topological information provided by the SFS.

3.
Contemporary DNA sequences can provide information about the historical demography of a species. However, different molecular markers are informative under different circumstances. In particular, mitochondrial (mt)DNA is uniparentally inherited and haploid in most vertebrates and thus has a smaller effective population size than diploid, biparentally inherited nuclear (n)DNA. Here, we review the characteristics of mtDNA and nDNA in the context of historical demography. In particular, we address how their contrasting rates of evolution and sex‐biased dispersal can lead to different demographic inferences. We do so in the context of an extensive review of the vertebrate literature that describes the use of mtDNA and nDNA sequence data in demographic reconstruction. We discuss the effects of coalescence, effective population size, substitution rates, and sex‐biased dispersal on informative timeframes and expected patterns of genetic differentiation. We argue that mtDNA variation in species with male‐biased dispersal can imply deviations from neutrality that do not reflect actual population expansion or selection. By contrast, mtDNA can be more informative when coalescence has occurred within the recent past, which appears to be the case with many vertebrates. We also compare the application and interpretation of demographic and neutrality test statistics in historical demography studies. © 2014 The Linnean Society of London, Biological Journal of the Linnean Society, 2014, 112, 367–386.

4.
5.
We present here the first comparative analysis at the population level between Restriction Fragment Length Polymorphism (RFLP) and control region sequence polymorphism in a large and homogeneous Senegalese Mandenka sample. Eleven RFLP haplotypes and 60 different sequences are found in 119 individuals, revealing that a very high level of mtDNA diversity can be maintained in a small population. A sequence neighbor-joining tree and an analysis of molecular variance show that sequences associated with a given restriction haplotype are evolutionarily highly correlated: sequencing generally leads to the subtyping of RFLP haplotypes. Evolutionary relationships among RFLP haplotypes inferred from restriction site differences are in good agreement with those inferred from sequence data. A single difference is observed and is likely due to a single restriction homoplasy having occurred in the control region. Selective neutrality tests on both RFLP and sequence data accept the hypotheses of mtDNA neutrality and population equilibrium. The deep coalescence times (exceeding 50,000 yr) of sequences associated with the two most frequent restriction haplotypes confirm that the Niokolo Mandenka population has not passed through a recent bottleneck and that gene flow is maintained among West African populations despite ethnic differences.

6.
Neutrality tests using DNA polymorphism from multiple samples   Cited by: 5 (self-citations: 0; citations by others: 5)
Li H  Zhang Y  Zhang YP  Fu YX 《Genetics》2003,163(3):1147-1151
The polymorphism of a gene or a locus is studied with increasing frequency by multiple laboratories or the same group at different times. Such practice results in polymorphism being revealed by different samples at different regions of the locus. Tests of neutrality have been widely conducted for polymorphism data but commonly used statistical tests cannot be applied directly to such data. This article provides a procedure to conduct a neutrality test and details are given for two commonly used tests. Applying the two new tests to the chemokine-receptor gene (CCR5) in humans, we found that the hypothesis that all mutations are selectively neutral cannot explain the observed pattern of DNA polymorphism.

7.
8.
A data set consisting of DNA sequences from a large-scale shotgun DNA cloning and sequencing project has been collected and posted for public release. The purpose is to propose a standard genomic DNA sequencing data set by which various algorithms and implementations can be tested. This set of data is divided into two subsets, one containing raw DNA sequence data (1023 clones) and the other consisting of the corresponding partially refined or edited DNA sequence data (820 clones). Suggested criteria or guidelines for this data refinement are presented so that algorithms for preprocessing and screening raw sequences may be developed. Development of such preprocessing, screening, aligning, and assembling algorithms will expedite large-scale DNA sequencing projects so that the complete unambiguous consensus DNA sequences will be made available to the general research community in a quicker manner. Smaller scale routine DNA sequencing projects will also be greatly aided by such computational efforts.

9.
Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in an easy-to-manipulate text file.

10.
Among the molecular markers commonly used for mosquito taxonomy, the internal transcribed spacer 2 (ITS2) of the ribosomal DNA is useful for distinguishing among closely-related species. Here we review 178 GenBank accession numbers matching ITS2 sequences of Latin American anophelines. Among those, we found 105 unique sequences corresponding to 35 species. Overall, the ITS2 sequences distinguish anopheline species; however, information on intraspecific and geographic variations is scarce. Intraspecific variations ranged from 0.2% to 19% and our analysis indicates that misidentification and/or sequencing errors could be responsible for some of the high values of divergence. Research in Latin American malaria vector taxonomy profited from molecular data provided by single or few field-captured mosquitoes. However, we propose that caution should be taken and minimum requirements considered in the design of additional studies. Future studies in this field should consider that: (1) voucher specimens, assigned to the DNA sequences, need to be deposited in collections, (2) intraspecific variations should be thoroughly evaluated, (3) ITS2 and other molecular markers, considered as a group, will provide more reliable information, (4) biological data about vector populations are missing and should be prioritized, (5) the molecular markers are most powerful when coupled with traditional taxonomic tools.

11.
SUMMARY: Manual processing of DNA methylation data from bisulfite sequencing is a tedious and error-prone task. Here we present an interactive software tool that provides start-to-end support for this process. In an easy-to-use manner, the tool helps the user to import the sequence files from the sequencer, to align them, to exclude or correct critical sequences, to document the experiment, to perform basic statistics and to produce publication-quality diagrams. Emphasis is put on quality control: The program automatically assesses data quality and provides warnings and suggestions for dealing with critical sequences. The BiQ Analyzer program is implemented in the Java programming language and runs on any platform for which a recent Java virtual machine is available. AVAILABILITY: The program is available without charge for non-commercial users and can be downloaded from http://biq-analyzer.bioinf.mpi-inf.mpg.de/

12.
MOTIVATION: During the process of high-throughput genome sequencing there are opportunities for mixups of reagents and data associated with particular projects. The sequencing templates or sequence data generated for an assembly may become contaminated with reagents or sequences from another project, resulting in poorer quality and inaccurate assemblies. RESULTS: We have developed a system to assess sequence assemblies and monitor for laboratory mixups. We describe several methods for testing the consistency of assemblies and resolving mixed ones. We use statistical tests to evaluate the distribution of sequencing reads from different plates into contigs, and a graph-based approach to resolve situations where data has been inappropriately combined. While these methods have been designed for use in a high-throughput DNA sequencing environment processing thousands of clones, they can be applied in any situation where distinct sequencing projects are performed at redundant coverage.

13.
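Detecting natural selection at the DNA level   Cited by: 16 (self-citations: 1; citations by others: 15)
Zhou Q  Wang W 《Zoological Research (动物学研究)》2004,25(1):73-80
In the 1960s, Kimura's "neutral evolution" hypothesis posed an unprecedented challenge to the classical Darwinian theory of natural selection. Recent studies show, however, that at the DNA level a growing body of evidence supports evolution by natural selection. These findings owe much to the large amounts of population and genomic DNA data accumulated in recent years and to advances in theoretical population genetics. Methods for detecting selection at the DNA level fall into two broad classes: tests of within-species polymorphism and tests of between-species divergence. The former is typified by the D test proposed by Tajima (1989); the latter are mostly based on the principle that, under neutrality, within-species and between-species rates of evolution are the same. Because these methods take the neutral hypothesis as the null hypothesis and apply statistical tests to DNA data, they are called "neutrality tests". They are important both for resolving basic theoretical questions in evolution and for in-depth research in human genetics and bioinformatics. This article introduces several widely used tests so that readers in China can understand their underlying rationale and how to apply them.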
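
As a hedged illustration of the D test of Tajima (1989) mentioned in this abstract, the sketch below computes the statistic from three summary values: the sample size n, the number of segregating sites S, and the mean number of pairwise differences pi. It uses the standard textbook formula rather than anything taken from the article itself; the function name and argument names are hypothetical.

```python
from math import sqrt


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S and pairwise diversity pi.

    D = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1)),
    with the constants a1, a2, b1, b2, c1, c2, e1, e2 of Tajima (1989).
    """
    if n < 4 or seg_sites < 1:
        # For n < 4 the variance constants vanish; with S = 0 D is undefined.
        raise ValueError("need n >= 4 sequences and at least one segregating site")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# With n = 4 and S = 11, S / a1 equals 6, so pi = 6 gives D = 0;
# a lower pi gives a negative D and a higher pi a positive D.
print(tajimas_d(4, 11, 6.0))
print(tajimas_d(4, 11, 5.0))
print(tajimas_d(4, 11, 7.0))
```

Significance is usually judged against a null distribution, for example from coalescent simulations, which this sketch does not attempt.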

14.
15.
In this work we examined the phylogeography of the South American subterranean herbivorous rodent Ctenomys talarum (Talas tuco-tuco) using mitochondrial DNA (mtDNA) control region (D-loop) sequences, and we assessed the geographical genetic structure of this species in comparison with that of subterranean Ctenomys australis, which we have shown previously to be parapatric to C. talarum and to also live in a coastal sand dune habitat. A significant apportionment of the genetic variance among regional groups indicated that putative geographical barriers, such as rivers, substantially affected the pattern of genetic structure in C. talarum. Furthermore, genetic differentiation is consistent with a simple model of isolation by distance, possibly evidencing equilibrium between gene flow and local genetic drift. In contrast, C. australis showed limited hierarchical partitioning of genetic variation and departed from an isolation-by-distance pattern. Mismatch distributions and tests of neutrality suggest contrasting histories of these two species: C. talarum appears to be characterized by demographic stability and no significant departures from neutrality, whereas C. australis has undergone a recent demographic expansion and/or departures from strict neutrality in its mtDNA.

16.
Wild specimens are often collected in challenging field conditions, where samples may be contaminated with the DNA of conspecific individuals. This contamination can result in false genotype calls, which are difficult to detect, but may also cause inaccurate estimates of heterozygosity, allele frequencies and genetic differentiation. Marine broadcast spawners are especially problematic, because population genetic differentiation is low and samples are often collected in bulk and sometimes from active spawning aggregations. Here, we used contaminated and clean Pacific herring (Clupea pallasi) samples to test (a) the efficacy of bleach decontamination, (b) the effect of decontamination on RAD genotypes and (c) the consequences of contaminated samples on population genetic analyses. We collected fin tissue samples from actively spawning (and thus contaminated) wild herring and nonspawning (uncontaminated) herring. Samples were soaked for 10 min in bleach or left untreated, and extracted DNA was used to prepare DNA libraries using a restriction site‐associated DNA (RAD) approach. Our results demonstrate that intraspecific DNA contamination affects patterns of individual and population variability, causes an excess of heterozygotes and biases estimates of population structure. Bleach decontamination was effective at removing intraspecific DNA contamination and compatible with RAD sequencing, producing high‐quality sequences, reproducible genotypes and low levels of missing data. Although sperm contamination may be specific to broadcast spawners, intraspecific contamination of samples may be common and difficult to detect from high‐throughput sequencing data and can impact downstream analyses.

17.
spads 1.0 (for ‘Spatial and Population Analysis of DNA Sequences’) is a population genetic toolbox for characterizing genetic variability within and among populations from DNA sequences. In view of the drastic increase in genetic information available through sequencing methods, spads was specifically designed to deal with multilocus data sets of DNA sequences. It computes several summary statistics from populations or groups of populations, performs input file conversions for other population genetic programs and implements locus‐by‐locus and multilocus versions of two clustering algorithms to study the genetic structure of populations. The toolbox also includes two Matlab and R functions, Gdispal and Gdivpal, to display differentiation and diversity patterns across landscapes. These functions aim to generate interpolating surfaces based on multilocus distance and diversity indices. In the case of multiple loci, such surfaces can represent a useful alternative to the multiple pie-chart maps traditionally used in phylogeography to represent the spatial distribution of genetic diversity. These coloured surfaces can also be used to compare different data sets or different diversity and/or distance measures estimated on the same data set.

18.
Perspective: detecting adaptive molecular polymorphism: lessons from the MHC   Cited by: 13 (self-citations: 0; citations by others: 13)
Abstract. In the 1960s, when population geneticists first began to collect data on the amount of genetic variation in natural populations, balancing selection was invoked as a possible explanation for how such high levels of molecular variation are maintained. However, the predictions of the neutral theory of molecular evolution have since become the standard by which cases of balancing selection may be inferred. Here we review the evidence for balancing selection acting on the major histocompatibility complex (MHC) of vertebrates, a genetic system that defies many of the predictions of neutrality. We apply many widely used tests of neutrality to MHC data as a benchmark for assessing the power of these tests. These tests can be categorized as detecting selection in the current generation, over the history of populations, or over the histories of species. We find that selection is not detectable in MHC datasets in every generation, population, or every evolutionary lineage. This suggests either that selection on the MHC is heterogeneous or that many of the current neutrality tests lack sufficient power to detect the selection consistently. Additionally, we identify a potential inference problem associated with several tests of neutrality. We demonstrate that the signals of selection may be generated in a relatively short period of microevolutionary time, yet these signals may take exceptionally long periods of time to be erased in the absence of selection. This is especially true for the neutrality test based on the ratio of nonsynonymous to synonymous substitutions. Inference of the nature of the selection events that create such signals should be approached with caution. However, a combination of tests on different time scales may overcome such problems.

19.
20.
It is fundamentally important to assess the fit of data to model in phylogenetic and evolutionary studies. Phylogenetic methods using molecular sequences typically start with a multiple alignment. It is possible to measure the fit of data to model expectations of data, for example, via the likelihood-ratio (G) test or the χ² test, if all sites in all sequences have an unambiguous residue. However, nearly all alignments of interest contain sites (columns of the alignment) with missing data, that is, ambiguous nucleotides, gaps, or unsequenced regions, which must presently be removed before using the above tests. Unfortunately, this is often either undesirable or impractical, as it will discard much of the data. Here, we show how iterative ML estimators may directly estimate the site-pattern probabilities for columns with missing data, given only standard i.i.d. assumptions. The optimization may use an EM or Newton algorithm, or any other hill-climbing approach. The resulting optimal likelihood under the unconstrained or multinomial model may be compared directly with the likelihood of the data coming from the model (a G statistic). Alternatively the modified observed and the expected frequencies of site patterns may be compared using a χ² test. The distribution of such statistics is best assessed using appropriate simulations. The new method is applicable to models using codons or paired sites. The methods are also useful with Hadamard conjugations (spectral analysis) and are illustrated with these and with ML evolutionary models that allow site-rate variability.
