Similar articles
1.
A popular approach to detecting positive selection is to estimate the parameters of a probabilistic model of codon evolution and perform inference based on its maximum likelihood parameter values. This approach has been evaluated intensively in a number of simulation studies and found to be robust when the available data set is large. However, uncertainties in the estimated parameter values can lead to errors in the inference, especially when the data set is small or there is insufficient divergence between the sequences. We introduce a Bayesian model comparison approach to infer whether the sequence as a whole contains sites at which the rate of nonsynonymous substitution is greater than the rate of synonymous substitution. We incorporated this probabilistic model comparison into a Bayesian approach to site-specific inference of positive selection. Using simulated sequences, we compared this approach to the commonly used empirical Bayes approach and investigated the effect of tree length on the performance of both methods. We found that the Bayesian approach outperforms the empirical Bayes method when the amount of sequence divergence is small and is less prone to false-positive inference when the sequences are saturated, while the results are indistinguishable for intermediate levels of sequence divergence.

2.

Background  

In this study we present a single-population test (Ewens-Watterson) applied in a genomic context to investigate the presence of recent positive selection in the Irish population. The Irish population is an interesting focus for the investigation of recent selection because several lines of evidence suggest that it may have a relatively undisturbed genetic heritage.

3.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

4.
Based on nearly complete genome sequences from a variety of organisms, data on naturally occurring genetic variation on the scale of hundreds of loci to entire genomes have been collected in recent years. In parallel, new statistical tests have been developed to infer evidence of recent positive selection from these data and to localize the target regions of selection in the genome. These methods have now been successfully applied to Drosophila melanogaster, humans, mice and a few plant species. In genomic regions of normal recombination rates, the targets of positive selection have been mapped down to the level of individual genes.

5.
MOTIVATION: Accurate detection of positive Darwinian selection can provide important insights to researchers investigating the evolution of pathogens. However, many pathogens (particularly viruses) undergo frequent recombination and the phylogenetic methods commonly applied to detect positive selection have been shown to give misleading results when applied to recombining sequences. We propose a method that makes maximum likelihood inference of positive selection robust to the presence of recombination. This is achieved by allowing tree topologies and branch lengths to change across detected recombination breakpoints. Further improvements are obtained by allowing synonymous substitution rates to vary across sites. RESULTS: Using simulation we show that, even for extreme cases where recombination causes standard methods to reach false positive rates >90%, the proposed method decreases the false positive rate to acceptable levels while retaining high power. We applied the method to two HIV-1 datasets for which we have previously found that inference of positive selection is invalid owing to high rates of recombination. In one of these (env gene) we still detected positive selection using the proposed method, while in the other (gag gene) we found no significant evidence of positive selection. AVAILABILITY: A HyPhy batch language implementation of the proposed methods and the HIV-1 datasets analysed are available at http://www.cbio.uct.ac.za/pub_support/bioinf06. The HyPhy package is available at http://www.hyphy.org, and it is planned that the proposed methods will be included in the next distribution. RDP2 is available at http://darwin.uvigo.es/rdp/rdp.html

6.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.
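The abstract above does not define its compression index, so the following is only a rough sketch of the general idea of scoring sequence windows by how well they compress. It substitutes the zlib (DEFLATE) compressed-size ratio as a stand-in measure; the function names `compression_ratio` and `sliding_ratios`, the window size and the step are all hypothetical choices made for illustration, not the authors' method.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size (assumed stand-in index).

    Lower values mean the sequence is more repetitive or structured.
    """
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def sliding_ratios(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, ratio) for each full window along the sequence."""
    return [
        (start, compression_ratio(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    structured = "ATGGCC" * 200                       # highly repetitive toy sequence
    noisy = "".join(rng.choice("ACGT") for _ in range(1200))  # quasi-random toy sequence
    print(compression_ratio(structured), compression_ratio(noisy))
```

A per-window score of this kind would still need a threshold or a classifier on top of it before it could label regions as coding or non-coding; that step is left out because the abstract gives no detail on it.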

7.
8.
Marshall JM, Weiss RE. Genetics 2006, 173(4):2357-2370
The distribution of microsatellite allele sizes in populations aids in understanding the genetic diversity of species and the evolutionary history of recent selective sweeps. We propose a heterogeneous Bayesian analysis of variance model for inferring loci involved in recent selective sweeps by analyzing the distribution of allele sizes at multiple loci in multiple populations. Our model is shown to be consistent with a multilocus test statistic, ln RV, proposed for identifying microsatellite loci involved in recent selective sweeps. Our methodology differs in that it accepts original allele size data rather than summary statistics and allows the incorporation of prior knowledge about allele frequencies using a hierarchical prior distribution consisting of log normal and gamma probability distributions. Interesting features of the model are its ability to simultaneously analyze allele size data for any number of populations and to cope with the presence of any number of selected loci. The utility of the method is illustrated by application to two sets of microsatellite allele size data for a group of West African Anopheles gambiae populations. The results are consistent with the suppressed-recombination model of speciation, and additional candidate loci on chromosomes 2 (079 and 175) and 3 (088) are discovered that escaped former analysis.
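The abstract names the ln RV statistic without defining it. As a minimal sketch, assuming the usual reading of ln RV as the natural log of the ratio of variances in repeat number between two populations at one locus, it could be computed as below. The function name `ln_rv` and the toy allele data are hypothetical, and the standardisation across loci that a real genome scan would apply is omitted.

```python
import math
import statistics


def ln_rv(repeats_pop1, repeats_pop2):
    """Log ratio of sample variances in repeat number at one locus.

    Assumed definition: ln(V_pop1 / V_pop2). A strongly negative value
    suggests reduced variability in population 1 relative to population 2.
    """
    v1 = statistics.variance(repeats_pop1)
    v2 = statistics.variance(repeats_pop2)
    if v1 == 0 or v2 == 0:
        raise ValueError("ln RV is undefined when a population has zero variance")
    return math.log(v1 / v2)


if __name__ == "__main__":
    # Toy repeat counts for one microsatellite locus in two populations.
    print(ln_rv([10, 12, 14, 16], [12, 13, 13, 14]))
```

The Bayesian model described in the abstract works on the raw allele sizes instead of this summary statistic, so the sketch only illustrates the quantity the model is compared against.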

9.
Developmental biology often yields data in a temporal context. Temporal data in phylogenetic systematics has important uses in the field of evolutionary developmental biology and, in general, comparative biology. The evolution of temporal sequences, specifically developmental sequences, has proven difficult to examine due to the highly variable temporal progression of development. Issues concerning the analysis of temporal sequences and problems with current methods of analysis are discussed. We present here an algorithm to infer ancestral temporal sequences, quantify sequence heterochronies, and estimate pseudoreplicate consensus support for sequence changes using Parsimov-based genetic inference [PGi]. Real temporal developmental sequence data sets are used to compare PGi with currently used approaches, and PGi is shown to be the most efficient, accurate, and practical method to examine biological data and infer ancestral states on a phylogeny. The method is also expandable to address further issues in developmental evolution, namely modularity.

10.
11.
12.
Abstract: The prioritisation of potential agents on the basis of likely efficacy is an important step in biological control because it can increase the probability of a successful biocontrol program, and reduce risks and costs. In this introductory paper we define success in biological control, review how agent selection has been approached historically, and outline the approach to agent selection that underpins the structure of this special issue on agent selection. Developing criteria by which to judge the success of a biocontrol agent (or program) provides the basis for agent selection decisions. Criteria will depend on the weed, on the ecological and management context in which that weed occurs, and on the negative impacts that biocontrol is seeking to redress. Predicting which potential agents are most likely to be successful poses enormous scientific challenges. 'Rules of thumb', 'scoring systems' and various conceptual and quantitative modelling approaches have been proposed to aid agent selection. However, most attempts have met with limited success due to the diversity and complexity of the systems in question. This special issue presents a series of papers that deconstruct the question of agent choice with the aim of progressively improving the success rate of biological control. Specifically they ask: (i) what potential agents are available and what should we know about them? (ii) what type, timing and degree of damage is required to achieve success? and (iii) which potential agent will reach the necessary density, at the right time, to exert the required damage in the target environment?

13.
Selection mapping applies the population genetics theory of hitchhiking to the localization of genomic regions containing genes under selection. This approach predicts that neutral loci linked to genes under positive selection will have reduced diversity due to their shared history with a selected locus, and thus, genome scans of diversity levels can be used to identify regions containing selected loci. Most previous approaches to this problem ignore the spatial genomic pattern of diversity expected under selection. The regression-based approach advocated in this paper takes into account the expected pattern of decreasing genetic diversity with increased proximity to a selected locus. Simulated data are used to examine the patterns of diversity under different scenarios, in order to assess the power of a regression-based approach to the identification of regions under selection. Application of this method to both simulated and empirical data demonstrates its potential to detect selection. In contrast to some other methods, the regression approach described in this paper can be applied to any marker type. Results also suggest that this approach may give more precise estimates of the location of the selected locus than alternative methods, although the power is slightly lower in some cases.

14.
15.
We present a new method for inferring hidden Markov models from noisy time sequences without the necessity of assuming a model architecture, thus allowing for the detection of degenerate states. This is based on the statistical prediction techniques developed by Crutchfield et al. and generates so-called causal state models, equivalent in structure to hidden Markov models. The new method is applicable to any continuous data which clusters around discrete values and exhibits multiple transitions between these values, such as tethered particle motion data or Fluorescence Resonance Energy Transfer (FRET) spectra. The algorithms developed have been shown to perform well on simulated data, demonstrating the ability to recover the model used to generate the data under high-noise, sparse-data conditions and the ability to infer the existence of degenerate states. They have also been applied to new experimental FRET data of Holliday Junction dynamics, extracting the expected two-state model and providing values for the transition rates in good agreement with previous results and with results obtained using existing maximum likelihood based methods. The method differs markedly from previous Markov-model reconstructions in being able to uncover truly hidden states.

16.
This paper brings together two themes in evolutionary population genetics theory. The first concerns Fisher's Fundamental Theorem of Natural Selection: a recent interpretation of this theorem claims that it is an exact result, relating to the so-called "partial" increase in mean fitness. The second theme concerns the desire to find an optimality principle in genetic evolution. Such a principle is found here: of all gene frequency changes which lead to the same partial increase in mean fitness as the natural selection gene frequency changes, the natural selection values minimize a generalized distance measure between parent and daughter generation gene frequency values.

17.
We review the various factors that limit adaptation by natural selection. Recent discussion of constraints on selection and, conversely, of the factors that enhance "evolvability", has concentrated on the kinds of variation that can be produced. Here, we emphasise that adaptation depends on how the various evolutionary processes shape variation in populations. We survey the limits that population genetics places on adaptive evolution, and discuss the relationship between disparate literatures.

18.
Abstract

A synthesis is presented of some research results on the structure and dynamics of populations with respect to phytocoenosis structure. The first aspects considered are the dynamics, distribution and reproductive strategy of Anemone nemorosa: the vegetation structure either favors or inhibits the various reproductive strategies, thereby influencing the distribution pattern of the species. The second area of the study is the demography and quantitative structure of Cytisus sessilifolius populations in relation to the problem of secondary successions: the first results obtained open up interesting perspectives on the relationship between vegetation dynamics and the structure of frutescent populations.

19.
Nordborg M, Innan H. Genetics 2003, 163(3):1201-1213
A stochastic model for the genealogy of a sample of recombining sequences containing one or more sites subject to selection in a subdivided population is described. Selection is incorporated by dividing the population into allelic classes and then conditioning on the past sizes of these classes. The past allele frequencies at the selected sites are thus treated as parameters rather than as random variables. The purpose of the model is not to investigate the dynamics of selection, but to investigate effects of linkage to the selected sites on the genealogy of the surrounding chromosomal region. This approach is useful for modeling strong selection, when it is natural to parameterize the past allele frequencies at the selected sites. Several models of strong balancing selection are used as examples, and the effects on the pattern of neutral polymorphism in the chromosomal region are discussed. We focus in particular on the statistical power to detect balancing selection when it is present.

20.
MOTIVATION: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem. METHODS: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. RESULTS: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
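For readers unfamiliar with graphical Gaussian models, the quantity estimated in step (1) can be written down with the standard identity that links partial correlations to the inverse covariance matrix. The symbols below ($\Sigma$, $\Omega$, $\omega_{ij}$, $\tilde{\rho}_{ij}$) are notation assumed here for illustration and do not come from the abstract, which also does not specify the form of its regularized estimator.

```latex
% Standard GGM identity (notation assumed for illustration):
% \Sigma is the covariance matrix of the gene expression levels,
% \Omega = (\omega_{ij}) its inverse (the concentration matrix).
\[
  \Omega = \Sigma^{-1},
  \qquad
  \tilde{\rho}_{ij} = -\,\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}
  \quad (i \neq j).
\]
% An edge between genes i and j is absent exactly when \tilde{\rho}_{ij} = 0,
% i.e. when the two genes are conditionally independent given all others.
```

When the number of samples is smaller than the number of genes, the sample covariance matrix is singular and cannot be inverted directly, which is why a regularized estimate is needed in the small-sample setting the abstract describes.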
