Similar documents
20 similar documents found
1.
Variation in mitochondrial DNA is often assumed to be neutral and is used to construct the genealogical relationships among populations and species. However, if extant variation is the result of episodes of positive selection, these genealogies may be incorrect, although the selection signal itself may provide biologically and evolutionarily meaningful information. In fact, positive Darwinian selection has been detected in the mitochondrially encoded subunits that make up complex I in diverse taxa with seemingly dissimilar bioenergetic life histories, but the functional implications of the selected sites are unknown. Complex I produces roughly 40% of the proton flux that is used to synthesize ATP from ADP, and a functional model based on the high‐resolution structure of complex I described a unique biomechanical apparatus for proton translocation. We reported positive selection at sites in this apparatus during the evolution of Pacific salmon, and published reports from other taxa appeared to show the same pattern, but a comparison among studies was difficult because different statistical tests were used to detect selection and the specific sites were often not reported. Here we review the literature on positive selection in mitochondrial genomes, the statistical tests used to detect selection, and the structural and functional models currently available to study the physiological implications of selection. We then search for signatures of positive selection in the mitochondrial coding sequences of 237 species using a common set of tests and verify that the ND5 subunit of complex I is a repeated target of positive Darwinian selection in diverse taxa. We propose a novel hypothesis to explain the results based on the species' bioenergetic life histories and provide a guide for laboratory and field studies to test this hypothesis.

2.
The gap between the number of known protein sequences and structures continues to widen, particularly as a result of sequencing projects for entire genomes. Recently there have been many attempts to generate structural assignments for all genes in sets of completed genomes using fold-recognition methods. We developed a method that detects false positives made by these genome-wide structural assignment experiments by identifying isolated occurrences. The method was tested on two sets of assignments, generated by SUPERFAMILY and PSI-BLAST, for 150 completed genomes. A phylogeny of these genomes was built, and a parsimony algorithm was used to identify isolated occurrences by detecting occurrences that cause a gain at leaf level. Isolated occurrences tend to have high e-values, and in both sets of assignments a sudden increase in isolated occurrences is observed for e-values >10⁻⁸ for SUPERFAMILY and >10⁻⁴ for PSI-BLAST. Conditions for predicting false positives are based on these results. Independent tests confirm that the predicted false positives are indeed more likely to be incorrectly assigned. Evaluation of the predicted false positives also showed that the accuracy of profile-based fold-recognition methods might depend on secondary structure content and sequence length. We show that false positives generated by fold-recognition methods can be identified by considering structural occurrence patterns in completed genomes; occurrences that are isolated within the phylogeny tend to be less reliable. The method provides a new, independent way to examine the quality of fold assignments and may be used to improve the output of any genome-wide fold assignment method.
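The parsimony step described above (flagging occurrences that require a gain on a terminal branch) can be sketched with Fitch's small-parsimony algorithm. This is a minimal illustration, not the authors' code: it assumes a rooted binary tree written as nested tuples of leaf names, presence/absence of a fold coded as 1/0, and ties broken toward absence. The function names are hypothetical.

```python
# Hypothetical sketch: Fitch parsimony on presence (1) / absence (0) of a fold,
# flagging leaves where the fold is present but the reconstructed parent lacks it.

def fitch_set(node, states):
    """Bottom-up Fitch pass: candidate state set for a subtree."""
    if isinstance(node, str):  # leaf: its observed state
        return {states[node]}
    left, right = (fitch_set(child, states) for child in node)
    return (left & right) or (left | right)


def isolated_occurrences(tree, states):
    """Top-down pass: return leaves that imply a gain on their own terminal branch."""
    flagged = []

    def assign(node, parent_state):
        candidates = fitch_set(node, states)
        # Keep the parent's state when compatible; otherwise prefer absence (0).
        state = parent_state if parent_state in candidates else min(candidates)
        if isinstance(node, str):
            if state == 1 and parent_state == 0:
                flagged.append(node)  # gain at leaf level only
            return
        for child in node:
            assign(child, state)

    assign(tree, None)
    return flagged


tree = (("A", "B"), ("C", ("D", "E")))
print(isolated_occurrences(tree, {"A": 0, "B": 0, "C": 0, "D": 1, "E": 0}))  # ['D']
print(isolated_occurrences(tree, {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}))  # []
```

When D and E share the fold, the gain is placed on their common ancestor, so neither leaf counts as isolated.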

3.
The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole‐genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50‐fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long‐term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male–male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa.
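The X-versus-autosome comparison above works because a skewed breeding sex ratio changes X-linked and autosomal effective sizes differently. A sketch, assuming the textbook approximations Ne(autosome) = 4·Nm·Nf/(Nm+Nf) and Ne(X) = 9·Nm·Nf/(4·Nm+2·Nf), and equal mutation rates on X and autosomes so that diversity scales with Ne. These assumptions are not taken from the study itself.

```python
# Sketch under textbook assumptions (not the authors' model):
# diversity ratio pi_X / pi_A is taken to equal Ne_X / Ne_A.

def x_to_autosome_ratio(n_males: float, n_females: float) -> float:
    """Expected Ne_X / Ne_A for given numbers of breeding males and females."""
    ne_auto = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_auto


def male_female_ratio(x_to_a: float) -> float:
    """Invert the ratio: Nm/Nf implied by an observed pi_X/pi_A.

    Only meaningful for 9/16 < x_to_a <= 9/8; values above 0.75 imply fewer males.
    """
    return (9 - 8 * x_to_a) / (16 * x_to_a - 9)


print(x_to_autosome_ratio(100, 100))  # 0.75 with an even sex ratio
print(male_female_ratio(0.75))        # 1.0
```

A reduced male effective size therefore shows up as an X/A diversity ratio above 0.75.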

4.
Identifying adaptive loci can provide insight into the mechanisms underlying local adaptation. Genotype–environment association (GEA) methods, which identify these loci based on correlations between genetic and environmental data, are particularly promising. Univariate methods have dominated GEA, despite the high-dimensional nature of genotype and environment. Multivariate methods, which analyse many loci simultaneously, may be better suited to these data as they consider how sets of markers covary in response to environment. These methods may also be more effective at detecting adaptive processes that result in weak, multilocus signatures. Here, we evaluate four multivariate methods and five univariate and differentiation‐based approaches, using published simulations of multilocus selection. We found that Random Forest performed poorly for GEA. Univariate GEAs performed better, but had low detection rates for loci under weak selection. Constrained ordinations, particularly redundancy analysis (RDA), showed a superior combination of low false‐positive and high true‐positive rates across all levels of selection. These results were robust across the demographic histories, sampling designs, sample sizes and weak population structure tested here. The value of combining detections from different methods was variable and depended on the study goals and knowledge of the drivers of selection. Re‐analysis of genomic data from grey wolves highlighted the unique, covarying sets of adaptive loci that could be identified using RDA. Although additional testing is needed, this study indicates that RDA is an effective means of detecting adaptation, including signatures of weak, multilocus selection, providing a powerful tool for investigating the genetic basis of local adaptation.
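Redundancy analysis can be sketched as a multivariate regression of genotypes on environmental predictors followed by a PCA of the fitted values; loci with extreme loadings on a constrained axis are candidates. This is a simplified NumPy sketch, not the study's pipeline (for example, vegan's `rda` scales scores differently), and the 3-SD cutoff is an assumed rule of thumb. The synthetic data are hypothetical.

```python
import numpy as np


def rda_outliers(genotypes, env, n_sd=3.0):
    """Flag loci with outlying loadings on the first constrained (RDA) axis.

    genotypes: (n_individuals, n_loci) allele counts; env: (n_individuals, n_predictors).
    """
    Y = genotypes - genotypes.mean(axis=0)          # centre each locus
    X = (env - env.mean(axis=0)) / env.std(axis=0)  # standardise predictors
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)       # multivariate regression Y ~ X
    fitted = X @ B
    # PCA of the fitted values gives the constrained axes; rows of Vt are locus loadings.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    loadings = Vt[0]
    z = (loadings - loadings.mean()) / loadings.std()
    return np.flatnonzero(np.abs(z) > n_sd)


# Hypothetical data: 100 individuals, 200 neutral loci, locus 7 tracks the environment.
rng = np.random.default_rng(1)
env = rng.normal(size=(100, 1))
genotypes = rng.binomial(2, 0.5, size=(100, 200)).astype(float)
genotypes[:, 7] = np.clip(np.round(env[:, 0] + 1), 0, 2)
print(rda_outliers(genotypes, env))
```

With several predictors, the same loadings check can be repeated on each retained constrained axis.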

5.
Understanding factors regulating hybrid fitness and gene exchange is a major research challenge for evolutionary biology. Genomic cline analysis has been used to evaluate alternative patterns of introgression, but only two models have been used widely and the approach has generally lacked a hypothesis testing framework for distinguishing effects of selection and drift. I propose two alternative cline models, implement multivariate outlier detection to identify markers associated with hybrid fitness, and simulate hybrid zone dynamics to evaluate the signatures of different modes of selection. Analysis of simulated data shows that previous approaches are prone to false positives (multinomial regression) or relatively insensitive to outlier loci affected by selection (Barton's concordance). The new, theory‐based logit‐logistic cline model is generally best at detecting loci affecting hybrid fitness. Although some generalizations can be made about different modes of selection, there is no one‐to‐one correspondence between pattern and process. These new methods will enhance our ability to extract important information about the genetics of reproductive isolation and hybrid fitness. However, much remains to be done to relate statistical patterns to particular evolutionary processes. The methods described here are implemented in a freely available package "HIest" for the R statistical software (CRAN; http://cran.r-project.org/ ).
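A logit-logistic genomic cline gives the probability of inheriting a parental allele as a function of hybrid index h. The sketch below assumes the parameterization φ(h) = h^v / (h^v + (1 − h)^v · e^u), with u shifting the cline centre and v its steepness; check it against the HIest documentation before relying on it. It is an illustration, not HIest's implementation.

```python
import math


def logit_logistic_cline(h: float, u: float, v: float) -> float:
    """Assumed logit-logistic cline: P(allele from parent 1) at hybrid index h in [0, 1].

    u = 0, v = 1 gives the neutral expectation phi(h) = h; v > 1 makes the cline steeper.
    """
    num = h ** v
    return num / (num + (1 - h) ** v * math.exp(u))


def cline_centre(u: float, v: float) -> float:
    """Hybrid index where phi = 0.5, from solving logit(h) = u / v."""
    return 1 / (1 + math.exp(-u / v))


print(logit_logistic_cline(0.3, u=0.0, v=1.0))  # neutral: equals h
print(logit_logistic_cline(cline_centre(1.0, 2.0), u=1.0, v=2.0))  # approximately 0.5
```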

6.
Urban areas are increasing globally, providing opportunities for biodiversity researchers to study the process by which species become established in novel, highly disturbed habitats. This ecological process can be understood through analyses of morphological and genetic variation, which can shed light on patterns of neutral and adaptive evolution. Previous studies have shown that urban populations often diverge genetically from non-urban source populations. This could occur due to neutral genetic drift, but an alternative is that selection could lead to allele frequency changes in urban populations. The development of genome scan methods provides an opportunity to investigate these outcomes from samples of genetic variation taken along an urbanization gradient. Here we examine morphological variation in wing size and diversity at neutral amplified fragment length polymorphisms in the butterfly Pieris rapae L. (Lepidoptera, Pieridae) sampled from the center to the periphery of Marseille. We use established and novel environmental correlation approaches to scan genetic variation for evidence of selection. We find significant morphological differences in urban populations, as well as weak genetic structure and decreased genetic diversity in urban versus non-urban sites. However, environmental correlation tests provide little support for selection in our dataset. Our comparison of different methods and allele frequency clines suggests that loci identified as significant are false positives. Although there is some indication that selection may be acting on wing size in urban butterflies, genetic analyses suggest P. rapae are undergoing neutral drift.

7.
Major histocompatibility complex (MHC) genes encode proteins that play a central role in vertebrates' adaptive immunity to parasites. MHC loci are among the most polymorphic in vertebrate genomes, inspiring many studies to identify evolutionary processes driving MHC polymorphism within populations and divergence between populations. Leading hypotheses include balancing selection favouring rare alleles within populations, and spatially divergent selection. These hypotheses do not always produce diagnosably distinct predictions, causing many studies of MHC to yield inconsistent or ambiguous results. We suggest a novel strategy to distinguish balancing vs. divergent selection on MHC, taking advantage of natural admixture between parapatric populations. With divergent selection, individuals with immigrant alleles will be more infected and less fit because they are susceptible to novel parasites in their new habitat. With balancing selection, individuals with locally rare immigrant alleles will be more fit (less infected). We tested these contrasting predictions using three‐spine stickleback from three replicate pairs of parapatric lake and stream habitats. We found numerous positive and negative associations between particular MHC IIβ alleles and particular parasite taxa. A few allele–parasite comparisons supported balancing selection, and others supported divergent selection between habitats. However, there was no overall tendency for fish with immigrant MHC alleles to be more or less heavily infected. Instead, locally rare MHC alleles (not necessarily immigrants) were associated with heavier infections. Our results illustrate the complex relationship between MHC IIβ allelic variation and spatially varying multispecies parasite communities: different hypotheses may be concurrently true for different allele–parasite combinations.

8.
The Chinese Meishan pig breed is well known for its high prolificacy. This breed can also be divided into three types based on body size: big Meishan, middle Meishan (MMS) and small Meishan (SMS) pigs. Few studies have reported on the genetic signatures of Meishan pigs, particularly on a genome‐wide scale. Searching for genetic signatures could be valuable for revealing the genetic architecture of phenotypic variation. We therefore carried out a two-part study based on genome reducing and sequencing data from 143 Meishan pigs (74 MMS pigs, 69 SMS pigs). First, we detected selection signatures across all Meishan pigs studied using the relative extended haplotype homozygosity test. Second, we detected selection signatures between MMS and SMS pigs using the cross‐population extended haplotype homozygosity and FST methods. A total of 111 398 SNPs were identified from the sequenced genomes. In the within-population analysis, the most significant genes were associated with mental development (RGMA), reproduction (HDAC4, FOXL2) and lipid metabolism (ACACB). In the cross‐population analysis, both methods detected genes related to body weight (SPDEF, PACSIN1). We suggest that rs341373351, located within the PACSIN1 gene, might be the causal variant. The selection signatures detected within and between Meishan pig populations may be consistent with their phenotypic characteristics. These findings can provide insight into the molecular background of high prolificacy and body size in pigs.
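FST, one of the between-population statistics mentioned above, can be illustrated per SNP with Nei's formulation FST = (HT − HS)/HT. This sketch assumes two equally weighted populations and ignores sample-size corrections (unlike, for example, Weir and Cockerham's estimator), so it is not necessarily the estimator used in the study. The allele frequencies are hypothetical.

```python
def fst_nei(p1: float, p2: float) -> float:
    """Nei-style FST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst_nei(0.5, 0.5))  # 0.0: no differentiation
print(fst_nei(0.0, 1.0))  # 1.0: fixed difference
print(fst_nei(0.2, 0.8))  # about 0.36
```

Genome scans usually average such values over windows and treat the extreme tail as candidate regions.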

9.
Mutations are the primary source of all genetic variation. Knowledge of their rates is critical for any evolutionary genetic analysis, but for a long time that knowledge remained elusive and was only indirectly inferred. In recent years, parent–offspring comparisons have yielded the first direct mutation rate estimates. These analyses are, however, challenging because of a high rate of false positives and the lack of consensus on standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning (ML) approach to this task and estimate the mutation rate of the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and then screened their genomes for de novo mutations. The initially large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with the mutation rate estimated using a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for this pattern, as well as the future utility and limitations of machine learning approaches.
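A direct per-site, per-generation mutation rate from parent–offspring trios is commonly computed as the number of validated de novo mutations divided by twice the callable genome size summed over offspring. The optional false-negative adjustment is an assumed, hypothetical refinement, and the numbers in the usage example are illustrative only, not the guppy data.

```python
def per_site_mutation_rate(n_dnm: int, callable_sites: float, n_offspring: int,
                           fnr: float = 0.0) -> float:
    """Mutations per site per generation.

    Each diploid offspring contributes 2 * callable_sites transmitted sites.
    fnr: estimated false-negative rate of the detection pipeline (hypothetical correction).
    """
    return n_dnm / (2 * callable_sites * n_offspring * (1 - fnr))


# Illustrative numbers only: 10 validated mutations, 5e8 callable sites, 20 offspring.
print(per_site_mutation_rate(10, 5e8, 20))  # 5e-10
```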

10.
Accurate and comprehensive identification of surface‐exposed proteins (SEPs) in parasites is a key step in developing novel subunit vaccines. However, the reliability of MS‐based high‐throughput methods for proteome‐wide mapping of SEPs continues to be limited due to high rates of false positives (i.e., proteins mistakenly identified as surface exposed) as well as false negatives (i.e., SEPs not detected due to low expression or other technical limitations). We propose a framework called PlasmoSEP for the reliable identification of SEPs using a novel semisupervised learning algorithm that combines SEPs identified by high‐throughput experiments and expert annotation of high‐throughput data to augment labeled data for training a predictive model. Our experiments using high‐throughput data from the Plasmodium falciparum surface‐exposed proteome provide several novel high‐confidence predictions of SEPs in P. falciparum and also confirm expert annotations for several others. Furthermore, PlasmoSEP predicts that 25 of 37 experimentally identified SEPs in Plasmodium yoelii salivary gland sporozoites are likely to be SEPs. Finally, PlasmoSEP predicts several novel SEPs in P. yoelii and Plasmodium vivax malaria parasites that can be validated for further vaccine studies. Our computational framework can be easily adapted to improve the interpretation of data from high‐throughput studies.

11.
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
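The counting approach can be illustrated, in simplified form, as a binomial test at a single codon site: given integer counts of nonsynonymous and synonymous changes inferred at the site and the expected proportion of nonsynonymous sites, test whether nonsynonymous changes are over- or under-represented. This is a sketch in the spirit of counting methods, not the exact procedure evaluated above (which also handles ancestral-reconstruction uncertainty); the function name and inputs are hypothetical.

```python
from math import comb


def site_selection_pvalues(n_nonsyn: int, n_syn: int, prop_nonsyn_sites: float):
    """One-sided binomial tail probabilities for a single codon site.

    Returns (p_positive, p_negative): P(X >= n_nonsyn) and P(X <= n_nonsyn),
    where X ~ Binomial(n_nonsyn + n_syn, prop_nonsyn_sites) under neutrality.
    """
    n = n_nonsyn + n_syn
    q = prop_nonsyn_sites
    pmf = [comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]
    p_positive = sum(pmf[n_nonsyn:])      # excess nonsynonymous change
    p_negative = sum(pmf[: n_nonsyn + 1]) # deficit of nonsynonymous change
    return p_positive, p_negative


print(site_selection_pvalues(5, 0, 0.75))  # (0.2373046875, 1.0) up to rounding
```

The example shows why counting methods are conservative on sparse data: even five nonsynonymous changes and no synonymous ones at a site are not significant.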

12.
Coevolution between hosts and their parasites is expected to follow a range of possible dynamics, the two extreme cases being called trench warfare (or Red Queen) and arms races. Long‐term stable polymorphism at the coevolving host and parasite loci is characteristic of trench warfare and is expected to produce molecular signatures of balancing selection, whereas the recurrent allele fixation in arms races should generate selective sweeps. We compare these two scenarios using a finite-size haploid gene‐for‐gene model that includes both mutation and genetic drift. We first show that trench warfare does not necessarily display more coevolutionary cycles per unit of time than arms races. We then perform coalescent simulations under these dynamics to generate sequences at both host and parasite loci. Genomic footprints of recurrent selective sweeps are often found, whereas trench warfare yields signatures of balancing selection only in parasite sequences, and only in a limited parameter space. Our results suggest that deterministic models of coevolution with infinite population sizes do not reliably predict the observed genomic signatures, and that it may be best to study parasite rather than host populations to find genomic signatures of coevolution, such as selective sweeps or balancing selection.

13.
There is currently large interest in distinguishing the signatures of genetic variation produced by demographic events from those produced by natural selection. We propose a simple multilocus statistical test to identify candidate sites of selective sweeps with high power. The test is based on the variability profile measured in an array of linked microsatellites. We also show that the analysis of flanking markers drastically reduces the number of false positives among the candidates that are identified in a genomewide survey of unlinked loci and find that this property is maintained in many population-bottleneck scenarios. However, for a certain range of intermediately severe population bottlenecks we find genomic signatures that are very similar to those produced by a selective sweep. While in these worst-case scenarios the power of the proposed test remains high, the false-positive rate reaches values close to 50%. Hence, selective sweeps may be hard to identify even if multiple linked loci are analyzed. Nevertheless, the integration of information from multiple linked loci always leads to a considerable reduction of the false-positive rate compared to a genome scan of unlinked loci. We discuss the application of this test to experimental data from Drosophila melanogaster.

14.
There is increasing interest in studying the molecular mechanisms of recent adaptations caused by positive selection in the genomics era. Such endeavors to detect recent positive selection, however, have been severely handicapped by false positives due to the confounding impact of demography and population structure. To reduce false positives, it is critical to conduct functional analyses to identify the true candidate genes/mutations among those that pass neutrality tests. However, the extremely high cost of such functional analyses may restrict studies to a small number of model species. In particular, when the false-positive rate of neutrality tests is high, the efficiency of functional analysis will also be very low. Therefore, although improvements have recently been made in the (joint) inference of demography and selection, our ultimate goal, which is to understand the mechanism of adaptation generally in a wide variety of natural populations, may not be achieved using the currently available approaches. More attention should thus be paid to developing more reliable tests that are not only free from the confounding impact of demography and population structure but also have reasonable power to detect selection.
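The link between a test's false-positive rate and the efficiency of follow-up functional work can be made concrete with the positive predictive value: the fraction of flagged candidates that are truly under selection. The power, false-positive rate and proportion of selected loci below are hypothetical values chosen for illustration.

```python
def positive_predictive_value(power: float, false_positive_rate: float,
                              prop_selected: float) -> float:
    """Fraction of loci flagged by a neutrality test that are truly under selection."""
    true_pos = power * prop_selected
    false_pos = false_positive_rate * (1 - prop_selected)
    return true_pos / (true_pos + false_pos)


# Hypothetical: 1% of loci under selection, test power 0.8.
print(positive_predictive_value(0.8, 0.05, 0.01))  # about 0.14
print(positive_predictive_value(0.8, 0.01, 0.01))  # about 0.45
```

Because selected loci are rare, even a modest false-positive rate means most candidates sent for functional analysis are neutral.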

15.
FST outlier tests are a potentially powerful way to detect genetic loci under spatially divergent selection. Unfortunately, the extent to which these tests are robust to nonequilibrium demographic histories has been understudied. We developed a landscape genetics simulator to test the effects of isolation by distance (IBD) and range expansion on FST outlier methods. We evaluated the two most commonly used methods for the identification of FST outliers (FDIST2 and BayeScan, which assume samples are evolutionarily independent) and two recent methods (FLK and Bayenv2, which estimate and account for evolutionary nonindependence). Parameterization with a set of neutral loci ('neutral parameterization') always improved the performance of FLK and Bayenv2, while neutral parameterization caused FDIST2 to actually perform worse in the cases of IBD or range expansion. BayeScan was improved when the prior odds on neutrality were increased, regardless of the true odds in the data. Even at their best performance, however, the widely used methods had high false‐positive rates for IBD and range expansion and were outperformed by methods that accounted for evolutionary nonindependence. In addition, default settings in FDIST2 and BayeScan resulted in many false positives suggesting balancing selection. However, all methods did very well if a large set of neutral loci was available to create empirical P‐values. We conclude that in species that exhibit IBD or have undergone range expansion, many of the published FST outliers based on FDIST2 and BayeScan are probably false positives, but FLK and Bayenv2 show great promise for accurately identifying loci under spatially divergent selection.
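Empirical P-values from a set of neutral loci, as mentioned above, rank each candidate's statistic against the neutral distribution. A minimal sketch, assuming larger statistics are more extreme and using the common (1 + count)/(n + 1) form so that no P-value is exactly zero; the values are hypothetical.

```python
from bisect import bisect_left


def empirical_pvalues(observed, neutral):
    """P(neutral statistic >= observed), with a +1 correction, for each observed value."""
    neutral_sorted = sorted(neutral)
    n = len(neutral_sorted)
    # n - bisect_left(...) counts neutral values >= obs.
    return [(1 + n - bisect_left(neutral_sorted, obs)) / (n + 1) for obs in observed]


neutral_stats = list(range(99))  # hypothetical statistics from 99 neutral loci
print(empirical_pvalues([100, 49, 0], neutral_stats))  # [0.01, 0.51, 1.0]
```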

16.
Balancing selection can maintain immunogenetic variation within host populations, but detecting its signal in a postbottlenecked population is challenging due to the potentially overriding effects of drift. Toll‐like receptor genes (TLRs) play a fundamental role in vertebrate immune defence and are predicted to be under balancing selection. We previously characterized variation at TLR loci in the Seychelles warbler (Acrocephalus sechellensis), an endemic passerine that has undergone a historical bottleneck. Five of seven TLR loci were polymorphic, which is in sharp contrast to the low genomewide variation observed. However, standard population genetic statistical methods failed to detect a contemporary signature of selection at any TLR locus. We examined whether the observed TLR polymorphism could be explained by neutral evolution, simulating the population's demography in the software DIYABC. This showed that the posterior distributions of mutation rates had to be unrealistically high to explain the observed genetic variation. We then conducted simulations with an agent‐based model using typical values for the mutation rate, which indicated that weak balancing selection has acted on the three TLR genes. The model was able to detect evidence of past selection elevating TLR polymorphism in the prebottleneck populations, but was unable to discern any effects of balancing selection in the contemporary population. Our results show that drift is the overriding evolutionary force that has shaped TLR variation in the contemporary Seychelles warbler population, and the observed TLR polymorphisms might be merely the 'ghost of selection past'. Forecast models predict immunogenetic variation in this species will continue to be eroded in the absence of contemporary balancing selection. Such 'drift debt' occurs when a gene pool has not yet reached its new equilibrium level of polymorphism, and this loss could be an important threat to many recently bottlenecked populations.
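The erosion of variation by drift mentioned above follows, under a neutral Wright–Fisher model without mutation or selection, the standard decay H_t = H_0 · (1 − 1/(2Ne))^t. This is a textbook approximation, not the study's agent-based forecast model, and the Ne values are hypothetical.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Neutral decay of expected heterozygosity in a diploid Wright-Fisher population."""
    return h0 * (1 - 1 / (2 * ne)) ** generations


# Hypothetical: starting heterozygosity 0.5, Ne = 50.
for t in (0, 1, 50, 100):
    print(t, round(expected_heterozygosity(0.5, 50, t), 4))
```

Smaller effective sizes shorten the time scale of loss, which is why recently bottlenecked populations can carry a "drift debt".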

17.
Environmental DNA (eDNA) sampling is prone to both false‐positive and false‐negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false‐positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false‐positive rates. We advocate alternative approaches to account for false‐positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false‐positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false‐negative and false‐positive errors, the methods presented here should be more routinely adopted in eDNA studies.
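A site-occupancy model with false positives (in the style of Royle and Link) mixes two detection processes: an occupied site is detected in each replicate with probability p11, and an unoccupied site yields a false detection with probability p10. The sketch below shows only the likelihood. It is independent of the code supplied with the study, and fitting it in practice needs a constraint such as p11 > p10 or ancillary data, as the abstract discusses. The detection histories are hypothetical.

```python
import math


def site_likelihood(detections, psi, p11, p10):
    """Likelihood of one site's replicate detection history (sequence of 0/1).

    psi: occupancy probability; p11: true detection rate; p10: false-positive rate.
    """
    occupied, unoccupied = psi, 1 - psi
    for y in detections:
        occupied *= p11 if y else 1 - p11
        unoccupied *= p10 if y else 1 - p10
    return occupied + unoccupied


def log_likelihood(histories, psi, p11, p10):
    """Sum of log-likelihoods over independent sites."""
    return sum(math.log(site_likelihood(h, psi, p11, p10)) for h in histories)


# With p10 = 0 the model reduces to a standard occupancy model.
print(site_likelihood([1, 0], psi=0.5, p11=0.8, p10=0.0))  # about 0.08
print(log_likelihood([[1, 0], [0, 0], [1, 1]], psi=0.5, p11=0.8, p10=0.05))
```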

18.
Microsatellite loci are widely used in population genetic studies, but the presence of null alleles may lead to biased results. Here, we assessed five methods that indirectly detect null alleles and found large inconsistencies among them. Our analysis was based on 20 microsatellite loci genotyped in a natural population of Microtus oeconomus sampled over 8 years, together with 1200 simulated populations without null alleles, but experiencing bottlenecks of varying duration and intensity, and 120 simulated populations with known null alleles. In the natural population, 29% of positive results were consistent between the methods in pairwise comparisons, and in the simulated data set, this proportion was 14%. The positive results were also inconsistent between different years in the natural population. In the null‐allele‐free simulated data set, the number of false positives increased with increased bottleneck intensity and duration. We also found a low concordance in null allele detection between the original simulated populations and their 20% random subsets. In the populations simulated to include null alleles, between 22% and 42% of true null alleles remained undetected, which highlighted that detection errors are not restricted to false positives. None of the evaluated methods clearly outperformed the others when both false‐positive and false‐negative rates were considered. Accepting only the positive results consistent between at least two methods should considerably reduce the false‐positive rate, but this approach may increase the false‐negative rate. Our study demonstrates the need for novel null allele detection methods that could be reliably applied to natural populations.
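The "consistent between at least two methods" rule can be sketched with two classical heterozygosity-based null allele frequency estimators: Chakraborty's r = (He − Ho)/(He + Ho) and Brookfield's first estimator r = (He − Ho)/(1 + He). Whether these are among the five methods assessed in the study is not stated here, and the 0.05 threshold is a hypothetical choice.

```python
def null_allele_estimates(h_exp: float, h_obs: float) -> dict:
    """Null allele frequency estimates from expected (He) and observed (Ho) heterozygosity."""
    return {
        "chakraborty": (h_exp - h_obs) / (h_exp + h_obs),
        "brookfield1": (h_exp - h_obs) / (1 + h_exp),
    }


def flag_locus(h_exp: float, h_obs: float, threshold: float = 0.05, min_agree: int = 2) -> bool:
    """Flag a locus only if at least `min_agree` estimators exceed the (hypothetical) threshold."""
    estimates = null_allele_estimates(h_exp, h_obs)
    return sum(r > threshold for r in estimates.values()) >= min_agree


print(null_allele_estimates(0.8, 0.6))  # about 0.143 and 0.111
print(flag_locus(0.8, 0.6), flag_locus(0.8, 0.78))  # True False
```

Requiring agreement lowers false positives but, as noted above, can raise the false-negative rate.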

19.
We carried out a simulation study to compare the efficiency of three alternative programs (dfdist, detseld and bayescan) for detecting loci under directional selection from genome‐wide scans using dominant markers. We also evaluated the efficiency of applying multiple-testing correction to those methods that use a classical probability approach. Under a wide range of scenarios, we conclude that bayescan appears to be more efficient than the other methods, usually detecting a high percentage of true selective loci and fewer than 1% outliers (false positives) under a fully neutral model. In addition, the percentage of outliers detected by this software is always correlated with the true percentage of selective loci in the genome. Our results show, nevertheless, that false positives are common even with a combination of methods and multiple-testing correction, suggesting that conclusions obtained from this approach should be treated with extreme caution.
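Multiple-testing correction for outlier P-values can take several forms. The abstract does not say which one was used, so the Benjamini–Hochberg false-discovery-rate procedure below is one common choice, shown as an illustration. The P-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg FDR procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))    # [0]
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.9]))   # [0, 1, 2]
```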

20.
MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low-quality interaction datasets that contain sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives to the interactomes. In IRAP*, both false positives and false negatives are identified using interaction confidence measures based on network topological metrics. Potential false positives are identified among the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as undetected interactions with high computed confidence values. Applying IRAP* to large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets potentially contained lower fractions of false-positive and false-negative errors, as assessed by functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.
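The idea of scoring interactions by network topology, removing low-confidence edges and adding high-confidence non-edges, can be sketched as a single pass. Neighbourhood overlap (the Jaccard index) stands in here for IRAP*'s own confidence measure, and the iteration is omitted. The thresholds and toy network are hypothetical.

```python
from itertools import combinations


def jaccard_confidence(adj, u, v):
    """Topological confidence of (u, v): overlap of the two proteins' neighbourhoods."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def repurify_once(edges, fp_threshold=0.1, fn_threshold=0.3):
    """One pass: return (candidate false positives, candidate false negatives)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    edge_set = {frozenset(e) for e in edges}
    false_pos = {e for e in edge_set if jaccard_confidence(adj, *e) < fp_threshold}
    false_neg = {
        frozenset(pair) for pair in combinations(sorted(adj), 2)
        if frozenset(pair) not in edge_set and jaccard_confidence(adj, *pair) > fn_threshold
    }
    return false_pos, false_neg


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
fp, fn = repurify_once(edges)
print(sorted(sorted(e) for e in fp))  # [['C', 'D'], ['D', 'E']]
print(sorted(sorted(e) for e in fn))  # [['A', 'D'], ['B', 'D'], ['C', 'E']]
```

A full method would update the network and recompute confidences until no further changes occur.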


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417