Similar articles
 20 similar articles found (search time: 15 ms)
1.
Much effort has been made to search for signatures of past natural selection in DNA sequences. However, currently acting selection is rarely detected in natural populations because of its rarity, the low detection power of available methods, or both. Here, we develop a new test to detect viability selection over a single generation. In this test, one specific type of chromosome is chosen as a reference, while all other chromosomes are designated as "focal". The test compares measures of variation between two groups of "focal" chromosomes: those found in reference/focal heterozygous individuals and those found in focal/focal homozygous individuals. In the absence of selection, we do not expect differences between these two groups as long as mating is random. On the other hand, currently acting selection can cause differences in some measures of variation. We applied this test to typing data for the In(2L)t inversion polymorphism in a Drosophila melanogaster population, using "standard" (non-inverted) chromosomes as the focal class. Although the frequencies of In(2L)t and standard chromosomes did not deviate from Hardy-Weinberg equilibrium, we found differences in allele frequency and the number of haplotypes between the two groups of standard chromosomes. This new test, in conjunction with the Hardy-Weinberg test, may shed light on how often strong selection is operating in extant populations.
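The two comparisons described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the argument layout, and the choice of chi-square statistics are assumptions, since the abstract does not say which statistics were used. The first function tests karyotype counts against Hardy-Weinberg proportions; the second compares allele counts at one marker between focal chromosomes sampled from heterozygotes and from homozygotes.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def hardy_weinberg_test(n_ref_ref, n_ref_focal, n_focal_focal):
    """Chi-square goodness-of-fit test of karyotype counts against HWE.

    Arguments are counts of individuals in each karyotype class
    (hypothetical layout chosen for this sketch).
    """
    n = n_ref_ref + n_ref_focal + n_focal_focal
    p = (2 * n_ref_ref + n_ref_focal) / (2 * n)  # reference chromosome frequency
    if not 0.0 < p < 1.0:
        raise ValueError("both chromosome types must be present")
    q = 1.0 - p
    observed = (n_ref_ref, n_ref_focal, n_focal_focal)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf_1df(chi2)


def compare_focal_groups(het_a, het_b, hom_a, hom_b):
    """2x2 chi-square test on allele counts (alleles a and b) at one marker.

    het_* are counts on focal chromosomes from reference/focal heterozygotes,
    hom_* are counts on focal chromosomes from focal/focal homozygotes.
    """
    n = het_a + het_b + hom_a + hom_b
    denom = (het_a + het_b) * (hom_a + hom_b) * (het_a + hom_a) * (het_b + hom_b)
    if denom == 0:
        raise ValueError("each group and each allele needs at least one observation")
    chi2 = n * (het_a * hom_b - het_b * hom_a) ** 2 / denom
    return chi2, chi2_sf_1df(chi2)
```

A population can pass the first test while failing the second, which is the situation the abstract reports for In(2L)t.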

2.
Interleukin-13 (IL13) is believed to play an important role in the pathogenesis of atopy and allergic asthma. To better understand genetic variation at the IL13 locus, we resequenced a 5.1-kb genomic region spanning the entire locus and identified 26 single-nucleotide polymorphisms (SNPs) in 74 individuals from three major populations: Chinese, Caucasian, and African. Our survey suggests exceptionally high and significant geographic structure at the IL13 locus between African and non-African populations. This unusual pattern suggests that positive selection acting in some local populations may have played a role at the IL13 locus. In support of this suggestion, we found a significant excess of high-frequency derived SNPs in both the Chinese and the Caucasian populations, as expected after a recent episode of positive selection. Further, the unusual haplotype structure indicates that positive selection may have acted on the IL13 locus under different scenarios in different populations. In the Caucasian population, the skewed haplotype distribution dominated by one common haplotype supports the hypothesis of simple directional selection, whereas in the Chinese population, the two-round hitchhiking hypothesis may explain the skewed haplotype structure with three dominant haplotypes. These findings may provide insight into the likely relative roles of selection and population history in establishing present-day variation at the IL13 locus, and motivate further studies of this locus as an important candidate in association studies of common diseases.

3.
In this report, we use computer simulations to compare the power of various site- and haplotype-frequency tests to detect positive selection. Our results are as follows. 1) Although haplotype-frequency tests that are conditional on the number of haplotypes (K) were developed for nonrecombining haplotypes, these tests are insensitive to recombination. Such tests, including the Ewens-Watterson (EW) test, can therefore be applied to recombining haplotypes. 2) Tests conditional on the number of segregating sites (S) become overly conservative in the presence of recombination. 3) The EW test is usually the most powerful test during the sweep phase, especially when the local recombination rate is high. 4) The "extended haplotype homozygosity" test relies heavily on prior knowledge of the target of selection. With that knowledge, it is the most powerful test, whereas in the absence of this prior information, the test has little power. We also study the sensitivities of the haplotype-frequency tests to background selection and various demographic forces. We find that these tests are sensitive to some forces other than positive selection. To alleviate the problem of low specificity, compound tests, such as the DH test (Zeng et al. 2006), may be a solution. In the companion paper (Zeng K, Shi S, Wu C-I, in preparation), we use the EW test to devise two compound tests, which are more powerful in detecting positive selection than DH, but are also relatively insensitive to demography.
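As a rough illustration of the quantities behind a haplotype-frequency test such as EW, the Python sketch below computes the number of distinct haplotypes K and the observed haplotype homozygosity F (the sum of squared haplotype frequencies) for a sample. The function name and the input format (one string per sampled chromosome) are assumptions made for this example. The sketch stops at the observed statistic; it does not generate the neutral null distribution of F conditional on K that a complete test would need.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Return (K, F) for a sample of haplotypes.

    K is the number of distinct haplotypes and F is the observed
    homozygosity, i.e. the sum of squared sample frequencies.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample must contain at least one haplotype")
    counts = Counter(haplotypes)
    k = len(counts)
    f = sum((c / n) ** 2 for c in counts.values())
    return k, f
```

For a fixed K, a sample dominated by one haplotype gives a high F, which is the kind of skew expected during a selective sweep.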

4.
Zeng K, Fu YX, Shi S, Wu CI. Genetics 2006, 174(3):1431-1439
By comparing the low-, intermediate-, and high-frequency parts of the frequency spectrum, we gain information on the evolutionary forces that influence the pattern of polymorphism in population samples. We emphasize the high-frequency variants on which positive selection and negative (background) selection exhibit different effects. We propose a new estimator of θ (the product of effective population size and neutral mutation rate), θL, which is sensitive to the changes in high-frequency variants. The new θL allows us to revise Fay and Wu's H-test by normalization. To complement the existing statistics (the H-test and Tajima's D-test), we propose a new test, E, which relies on the difference between θL and Watterson's θW. We show that this test is most powerful in detecting the recovery phase after the loss of genetic diversity, which includes the postselective sweep phase. The sensitivities of these tests to (or robustness against) background selection and demographic changes are also considered. Overall, D and H in combination can be most effective in detecting positive selection while being insensitive to other perturbations. We thus propose a joint test, referred to as the DH test. Simulations indicate that DH is indeed sensitive primarily to directional selection and not to other driving forces.
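The estimators named in this abstract can all be written as weighted sums over the unfolded site frequency spectrum. The Python sketch below computes θW, θπ, θH and θL, plus the unnormalized differences that D, H and E are built on. The function name and the input convention (xi[i-1] is the number of segregating sites at which the derived allele appears i times in a sample of n sequences) are assumptions for this example. The variance terms needed to normalize each statistic are deliberately omitted, so these values are numerators, not finished test statistics.

```python
def theta_estimators(xi):
    """Estimate theta from an unfolded site frequency spectrum.

    xi[i - 1] is the number of sites where the derived allele is seen
    i times, for i = 1 .. n - 1, in a sample of n sequences.
    """
    n = len(xi) + 1
    if n < 2:
        raise ValueError("need a sample of at least two sequences")
    pairs = n * (n - 1) / 2.0
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number used by theta_W
    indexed = list(enumerate(xi, start=1))

    theta_w = sum(xi) / a_n                                   # Watterson
    theta_pi = sum(i * (n - i) * x for i, x in indexed) / pairs
    theta_h = sum(i * i * x for i, x in indexed) / pairs      # Fay and Wu
    theta_l = sum(i * x for i, x in indexed) / (n - 1)        # weights high frequencies

    return {
        "theta_w": theta_w,
        "theta_pi": theta_pi,
        "theta_h": theta_h,
        "theta_l": theta_l,
        "d_numerator": theta_pi - theta_w,  # Tajima's D before normalization
        "h_numerator": theta_pi - theta_l,  # normalized H is built on this difference
        "e_numerator": theta_l - theta_w,   # E before normalization
    }
```

Under the neutral expectation, where the count in class i is proportional to 1/i, all four estimators agree and the three differences vanish. An excess of high-frequency derived variants raises θL and θH relative to θπ, which pushes the H numerator below zero.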

5.
A method for detecting positive selection at single amino acid sites   (total citations: 23; self-citations: 0; citations by others: 23)
A method was developed for detecting the selective force at single amino acid sites given a multiple alignment of protein-coding sequences. The phylogenetic tree was reconstructed using the number of synonymous substitutions. Then, neutrality was tested for each codon site using the numbers of synonymous and nonsynonymous changes throughout the phylogenetic tree. Computer simulation showed that this method accurately estimated the numbers of synonymous and nonsynonymous substitutions per site, as long as the substitution number on each branch was relatively small. The false-positive rate for detecting the selective force was generally low. On the other hand, the true-positive rate for detecting the selective force depended on the parameter values. Within the range of parameter values used in the simulation, the true-positive rate increased as the strength of the selective force and the total branch length (namely the total number of synonymous substitutions per site) in the phylogenetic tree increased. In particular, with the relative rate of nonsynonymous substitutions to synonymous substitutions being 5.0, most of the positively selected codon sites were correctly detected when the total branch length in the phylogenetic tree was ≥ 2.5. When this method was applied to the human leukocyte antigen (HLA) gene, which included antigen recognition sites (ARSs), positive selection was detected mainly on ARSs. This finding confirmed the effectiveness of the present method with actual data. Moreover, two amino acid sites were newly identified as positively selected in non-ARSs. The three-dimensional structure of the HLA molecule indicated that these sites might be involved in antigen recognition. Positively selected amino acid sites were also identified in the envelope protein of human immunodeficiency virus and the influenza virus hemagglutinin protein. This method may be helpful for predicting functions of amino acid sites in proteins, especially in the present situation, in which sequence data are accumulating at an enormous speed.

6.
Current sitewise methods for detecting positive selection on gene sequences (the de facto standard being the CODEML method (Yang et al., 2000)) assume no recombination. This paper presents simulation results indicating that violation of this assumption can lead to false positive detection of sites undergoing positive selection. Through the use of population-scaled mutation and recombination rates, simulations can be performed that permit the generation of appropriate null distributions corresponding to neutral expectations in the presence of recombination, thereby allowing for a more accurate estimation of positive selection.

7.
An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (dN/dS) that is different from those of other lineages or greater than one. In this paper, several codon-based likelihood models that allow for variable dN/dS ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the dN/dS ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the dN/dS ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The dN/dS ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximum-likelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the dN/dS ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the dN/dS ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
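The likelihood ratio tests mentioned here all follow the same arithmetic once the two nested models have been fitted. The Python sketch below shows only that final step, under stated assumptions: the log-likelihoods come from some external codon-model fit (none is performed here), the function names are hypothetical, and the chi-square approximation is applied with 1 or 2 degrees of freedom because those have closed-form tail probabilities in the standard library. The abstract does not give the degrees of freedom used for each comparison.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or df = 2 (closed forms)."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for nested models.

    lnl_null and lnl_alt are maximized log-likelihoods of the simpler and
    the more general model; df is the difference in free parameters.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)
```

For example, a one-ratio model (a single dN/dS for the whole tree) could serve as the null against a model with a separate ratio for the lineages of interest. A small p-value then indicates that the ratios differ, not necessarily that the foreground ratio exceeds one; that is a separate comparison, as the abstract notes.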

8.
Cattle in Africa are a genetically diverse population that has resulted from successive introductions of Asian Bos indicus and European B. taurus cattle. However, analysis of mitochondrial genetic diversity in African cattle identified three lineages, one associated with Asian B. indicus, one with European B. taurus, and a third ascribed to an indigenous African sub-species of cattle. Due to their extended coevolution, indigenous African herbivores are generally tolerant to endemic African pathogens. We are interested in identifying alleles derived from the indigenous African cattle that may be associated with tolerance to African pathogens. An analysis of the locus which encodes the abundant plasma membrane-associated tyrosine phosphatase, CD45, identified three highly divergent allelic families in Kenya Boran cattle. Analysis of allelic distribution in a diverse range of cattle populations suggests a European B. taurus, an Asian B. indicus, and an African origin. This demonstrates not only significant allelic polymorphism at the CD45 locus in cattle but also convincing autosomal evidence for a distinct African sub-species of cattle. Furthermore, maximum-likelihood analysis of selection pressures revealed that the CD45 locus is subject to exceptionally strong natural selection which we suggest may be pathogen driven.

9.
The standard methods for computing the number of nonsynonymous substitutions (Ka) lump all amino acid changes into one single class, even though their rates of substitution vary by at least 10-fold (Tang et al., 2004). Classifying these changes by their physicochemical properties has not been suitably effective in isolating the fastest evolving classes of changes. We now propose to use the Universal index U of Tang et al. (2004) to classify the 75 elementary amino acid changes (codons differing by 1 bp) by their evolutionary exchangeability. Let Ki denote the Ka value of each class (i = 1, ..., 75 from the most to the least exchangeable). The cumulative Ki for the top 10 classes, denoted Kh (for high-exchangeability types), has two important properties: (1) Kh usually accounts for 25%-30% of total amino acid changes and (2) when the observed number of amino acid substitutions is large, Kh is predictably twice the value of Ka. This shall be referred to as the twofold approximation. The new method for estimating Kh is applied to the comparisons between human and macaque and between mouse and rat. The twofold approximation holds well in these data sets, and the signature of positive selection can be more easily discerned using the Kh statistic than using Ka. Many genes with Ka/Ks > 0.5 can now be shown to have Kh/Ks > 1 and to have evolved adaptively, at least for the high-exchangeability group of amino acid changes.

10.
Screening techniques for detecting allelic variation in DNA sequences   (total citations: 11; self-citations: 0; citations by others: 11)
This article reviews four 'DNA screening techniques', namely heteroduplex analysis, single-strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE) as tools for the study of allelic variation in natural populations. The resolving power, advantages, and limitations of each technique are discussed and compared. We also provide some criteria for choosing among techniques and illustrate some practical issues with examples taken primarily from our own laboratory experience.

11.
Creevey CJ, McInerney JO. Gene 2002, 300(1-2):43-51
Positive selection or adaptive evolution is thought to be responsible, at least some of the time, for the rapid accumulation of advantageous changes in protein-coding genes. The origin of new enzymatic functions, erection of barriers to heterospecific fertilization, and evasion of host response by pathogens, among other things, are thought to be instances of adaptive evolution. Detecting positive selection in protein-coding genes is fraught with difficulties. Saturation for sequence change, codon usage bias, ephemeral selection events and differential selective pressures on amino acids all contribute to the problem. A number of solutions have been proposed with varying degrees of success; however, they are either not accurate enough or prohibitively computationally intensive. We have developed a character-based method of identifying lineages that undergo positive selection. In our method we assess the possibility that, for each internal branch of a phylogenetic tree, an event occurred that subsequently gave rise to a greater number of replacement substitutions than might be expected. We classify these replacement substitutions into two categories, according to whether they subsequently became invariable or changed again in at least one descendant lineage. The former situation indicates that the new character state is under strong selection to preserve its new identity (directional selection), while the latter situation indicates that there is a persistent pressure to change identity (non-directional selection). The method is fast and accurate, easy to implement, sensitive to short-lived selection events and robust with respect to sampling density and proportion of sites under the influence of positive selection.

12.
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.

13.
Li H, Stephan W. Genetics 2005, 171(1):377-384
Two maximum-likelihood methods are proposed for detecting recent, strongly positive selection and for localizing the target of selection along a recombining chromosome. The methods utilize the compact mutation frequency spectrum at multiple neutral loci that are partially linked to the selected site. Using simulated data, we show that the power of the tests lies between 80 and 98% in most cases, and the false positive rate could be as low as approximately 10% when the number of sampled marker loci is sufficiently large (≥ 20). The confidence interval around the estimated position of selection is reasonably narrow. The methods are applied to X chromosome data of Drosophila melanogaster from a European and an African population. Evidence of selection was found for both populations (including a selective sweep that was shared between the two populations).

14.
The reliabilities of parsimony-based and likelihood-based methods for inferring positive selection at single amino acid sites were studied using the nucleotide sequences of human leukocyte antigen (HLA) genes, in which positive selection is known to be operating at the antigen recognition site. The results indicate that the inference by parsimony-based methods is robust to the use of different evolutionary models and generally more reliable than that by likelihood-based methods. In contrast, the results obtained by likelihood-based methods depend on the models and on the initial parameter values used. It is sometimes difficult to obtain the maximum likelihood estimates of parameters for a given model, and the results obtained may be false negatives or false positives depending on the initial parameter values. It is therefore preferable to use parsimony-based methods as long as the number of sequences is relatively large and the branch lengths of the phylogenetic tree are relatively small.

15.

Background

One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data.
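The binomial test described above, and the effect of its no-bias assumption, can be sketched in Python as follows. This is an illustrative exact two-sided test, not the Poisson-Gamma model the article goes on to propose. The function name is hypothetical, and the parameter p_ref (the expected fraction of reads assigned to the reference allele when there is no true imbalance) is an assumption introduced here to stand in for a bias estimate; setting it to 0.5 gives the usual unadjusted test.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, total_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count under Binomial(total_reads, p_ref).
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    probs = [
        comb(total_reads, k) * p_ref ** k * (1.0 - p_ref) ** (total_reads - k)
        for k in range(total_reads + 1)
    ]
    # Small relative tolerance so ties are not lost to rounding error.
    cutoff = probs[ref_reads] * (1.0 + 1e-7)
    return min(1.0, sum(pr for pr in probs if pr <= cutoff))
```

With 60 of 100 reads on the reference allele, the test against p_ref = 0.5 gives a much smaller p-value than the test against a hypothetical mapping bias of p_ref = 0.55. This is one way to see why ignoring bias inflates the type I error rate.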

Results

Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test.

Conclusions

To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-920) contains supplementary material, which is available to authorized users.

16.
Anisimova M, Nielsen R, Yang Z. Genetics 2003, 164(3):1229-1236
Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination as evidence for positive selection. The test that compares the more realistic models M7 (beta) against M8 (beta and omega) is more robust to recombination, where the null model M7 allows the positive selection pressure to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 allows an additional discrete class with omega = dN/dS that could be estimated to be >1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.

17.
MOTIVATION: The identification of signatures of positive selection can provide important insights into recent evolutionary history in human populations. Current methods mostly rely on allele frequency determination or focus on one or a small number of candidate chromosomal regions per study. With the availability of large-scale genotype data, efficient approaches for an unbiased whole genome scan are becoming necessary. METHODS: We have developed a new method, the whole genome long-range haplotype test (WGLRH), which uses genome-wide distributions to test for recent positive selection. Adapted from the long-range haplotype (LRH) test, the WGLRH test uses patterns of linkage disequilibrium (LD) to identify regions with extremely low historic recombination. Common haplotypes with significantly longer than expected ranges of LD given their frequencies are identified as putative signatures of recent positive selection. In addition, we have also determined the ancestral alleles of SNPs by genotyping chimpanzee and gorilla DNA, and have identified SNPs where the non-ancestral alleles have risen to extremely high frequencies in human populations, termed 'flipped SNPs'. Combining the haplotype test and the flipped SNPs determination, the WGLRH test serves as an unbiased genome-wide screen for regions under putative selection, and is potentially applicable to the study of other human populations. RESULTS: Using WGLRH and high-density oligonucleotide arrays interrogating 116,204 SNPs, we rapidly identified putative regions of positive selection in three populations (Asian, Caucasian, African-American), and extended these observations to a fourth population, Yoruba, with data obtained from the International HapMap consortium. We mapped significant regions to annotated genes. While some regions overlap with genes previously suggested to be under positive selection, many of the genes have not been previously implicated in natural selection and offer intriguing possibilities for further study. AVAILABILITY: The programs for the WGLRH algorithm are freely available and can be downloaded at http://www.affymetrix.com/support/supplement/WGLRH_program.zip.

18.
The recent availability of genome-scale genotyping data has led to the identification of regions of the human genome that seem to have been targeted by selection. These findings have increased our understanding of the evolutionary forces that affect the human genome, have augmented our knowledge of gene function and promise to increase our understanding of the genetic basis of disease. However, inferences of selection are challenged by several confounding factors, especially the complex demographic history of human populations, and concordance between studies is variable. Although such studies will always be associated with some uncertainty, steps can be taken to minimize the effects of confounding factors and improve our interpretation of their findings.

19.
JCoDA: a tool for detecting evolutionary selection   (total citations: 1; self-citations: 0; citations by others: 1)

Background  

The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences.

20.
Approaches for identifying targets of positive selection   (total citations: 2; self-citations: 0; citations by others: 2)
Despite significant advancements in both empirical and theoretical population genetics throughout the past century, fundamental questions about the evolutionary forces that shape genomic diversity remain unresolved. Perhaps foremost among these are the strength and frequency of adaptive evolution. To quantify these parameters, statistical tools are needed that are capable of effectively identifying targets of positive selection throughout the genome in an unbiased manner, and functional approaches are needed that are capable of connecting these identified genotypes with the resulting adaptively significant phenotypes. Here we review recent advancements in both statistical and empirical methodology, and discuss important challenges and opportunities that remain as researchers continue to uncouple the relative importance of stochastic and deterministic factors in the evolution of natural populations.
