Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available. RESULTS: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves 'inside' a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch's original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. 
To the best of our knowledge, this is the first successful introduction of probabilistic methods of this type, which flourish in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although it is not yet in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.
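
The birth-death process named in this abstract can be illustrated with a small simulation. What follows is only a sketch of gene copy number along a single species-tree edge, assuming constant duplication (birth) and loss (death) rates per copy. It does not reproduce the authors' likelihood algorithm or their MCMC, and the function name and rate values are hypothetical.

```python
import random


def simulate_gene_count(birth, death, duration, rng):
    """Simulate a linear birth-death process that starts from one gene copy.

    birth and death are per-copy rates (hypothetical values, not from the paper).
    Returns the number of copies present after `duration` time units.
    """
    copies, t = 1, 0.0
    while copies > 0:
        total_rate = copies * (birth + death)
        if total_rate == 0:
            break  # nothing can happen: the single copy simply persists
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > duration:
            break  # the next event would fall after the end of the edge
        if rng.random() < birth / (birth + death):
            copies += 1  # duplication
        else:
            copies -= 1  # loss
    return copies


if __name__ == "__main__":
    rng = random.Random(42)
    runs = [simulate_gene_count(0.3, 0.2, 2.0, rng) for _ in range(10000)]
    print("mean copies:", sum(runs) / len(runs))
    print("fraction lost:", runs.count(0) / len(runs))
```

Averaged over many runs, the mean copy number of such a process approaches exp((birth - death) * duration), which gives a quick sanity check on the simulation.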

2.
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.  相似文献   

3.
Rediscoveries of species previously thought to be extinct present a dilemma to conservation biology. On one hand, such instances offer the chance to change the course of events away from one that would have led to extinctions. On the other hand, public support for conservation may wane if scientists are frequently seen to overstate and prematurely declare extinctions. Recent studies have adopted a probabilistic approach to infer extinction, using sightings or collections and statistical models to calculate the chance that a species may still be extant. We conduct the first broad-scale test of such models using a recently compiled national red list and national herbarium collection records, including collections of presumed nationally extinct species made after the red list publication, which constitute “rediscoveries”. There was little evidence that the probabilities calculated by these models were associated with rediscoveries over a 3.5-year period. Current probabilistic models of extinction using sighting records could hence be inadequate for use with most natural history collection data.

4.
The logistic or S-shaped curve of growth is one of the few universal laws in biology. It is certain that there exist specific genes affecting growth curves, but, due to a lack of statistical models, it is unclear how these genes cause phenotypic differentiation in growth and developmental trajectories. In this paper we present a statistical model for detecting major genes responsible for growth trajectories. The model incorporates the pervasive logistic growth curve within a maximum likelihood framework and is thus expected to improve on previous models in both parameter estimation and inference. The power of this model is demonstrated by an example using forest tree data, in which evidence of major genes affecting stem growth processes is successfully detected. The implications of this model and its extensions are discussed.
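
For readers unfamiliar with the logistic curve this abstract builds on, a minimal sketch follows. The three-parameter form a / (1 + b * exp(-r * t)) is a common parameterization and is an assumption here, since the abstract does not state which form the authors fit. The parameter names and values are made up for illustration.

```python
import math


def logistic_growth(t, a, b, r):
    """Three-parameter logistic curve: a / (1 + b * exp(-r * t)).

    a: asymptotic size; b: sets the initial size a / (1 + b); r: relative growth rate.
    These names are illustrative and not taken from the paper.
    """
    return a / (1.0 + b * math.exp(-r * t))


def inflection_time(b, r):
    """Time at which growth is fastest; the curve equals a / 2 at this point."""
    return math.log(b) / r


if __name__ == "__main__":
    # Hypothetical stem-growth parameters, chosen only to show the S shape.
    for t in range(0, 25, 4):
        print(t, round(logistic_growth(t, a=30.0, b=20.0, r=0.5), 2))
```

In a mapping model of the kind described, each genotype at a major gene would presumably carry its own set of curve parameters, so that genotypes differ in whole trajectories rather than in single measurements.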

5.
Clustering of genes into groups sharing common characteristics is a useful exploratory technique for a number of subsequent computational analyses. A wide range of clustering algorithms have been proposed, in particular to analyze gene expression data, but most of them consider genes as independent entities or include relevant information on gene interactions in a suboptimal way. We propose a probabilistic model that has the advantage of accounting for individual data (e.g., expression) and pairwise data (e.g., interaction information coming from biological networks) simultaneously. Our model is based on hidden Markov random field models in which parametric probability distributions account for the distribution of individual data. Data on pairs, possibly reflecting distance or similarity measures between genes, are then included through a graph, where the nodes represent the genes, and the edges are weighted according to the available interaction information. As a probabilistic model, this model has many interesting theoretical features. In addition, preliminary experiments on simulated and real data show promising results and point to the gain from using such an approach. Availability: The software used in this work is written in C++ and is available with other supplementary material at http://mistis.inrialpes.fr/people/forbes/transparentia/supplementary.html.

6.
Fossil taxa are critical to inferences of historical diversity and the origins of modern biodiversity, but realizing their evolutionary significance is contingent on restoring fossil species to their correct position within the tree of life. For most fossil species, morphology is the only source of data for phylogenetic inference; this has traditionally been analysed using parsimony, the predominance of which is currently challenged by the development of probabilistic models that achieve greater phylogenetic accuracy. Here, based on simulated and empirical datasets, we explore the relative efficacy of competing phylogenetic methods in terms of clade support. We characterize clade support using bootstrapping for parsimony and Maximum Likelihood, and intrinsic Bayesian posterior probabilities, collapsing branches that exhibit less than 50% support. Ignoring node support, Bayesian inference is the most accurate method in estimating the tree used to simulate the data. After assessing clade support, Bayesian and Maximum Likelihood exhibit comparable levels of accuracy, and parsimony remains the least accurate method. However, Maximum Likelihood is less precise than Bayesian phylogeny estimation, and Bayesian inference recaptures more correct nodes with higher support compared to all other methods, including Maximum Likelihood. We assess the effects of these findings on empirical phylogenies. Our results indicate probabilistic methods should be favoured over parsimony.  相似文献   

7.
Although the trilling chorus frogs (subclade within Pseudacris: Hylidae) have been important in studies of speciation, continental patterns of genetic diversity within and among species have not been elucidated. As a result, this North American clade has been the subject of substantial taxonomic debate. In this study, we examined the phylogenetic relationships among the trilling Pseudacris and tested previously hypothesized scenarios for speciation using 2.4 kb of mitochondrial 12S and 16S rRNA genes from 253 populations. Bayesian phylogenetic analyses, in combination with published morphological and behavioral data, support recognition of at least nine species, including an undescribed species from the south-central United States. Evidence is presented for substantial geographic subdivision within P. brachyphona (northern and southern clades) and P. feriarum (coastal and inland clades). Discordance between morphology/behavior and molecular data in several individuals suggests occasional hybridization between sympatric species. These results require major revision of range limits for several taxa, in particular, P. maculata, P. triseriata, and P. feriarum. Hypothesis tests using parametric bootstrapping strongly reject previously proposed scenarios for speciation in the group. The tests also support recognition of the geographically restricted taxon P. kalmi as a distinct species. Results of this study provide both a firm phylogenetic basis for future studies of speciation in the trilling Pseudacris and a taxonomic framework for conservation efforts.  相似文献   

8.
High‐throughput (next‐generation) DNA sequencing has removed barriers to data quantity and quality, and it has produced phylogenies with high statistical support. Such data are useful to address phylogenetic congruence among individual genes. Concatenated analyses of unlinked genes often produce well‐resolved phylogenetic trees with bootstrap support on major nodes at or approaching 100%, but they have been criticized for providing incorrect phylogenies for various reasons, including a history of hybridization, introgression, and incomplete lineage sorting. The present study compares next‐generation sequencing results of the same accessions of Daucus with different genomic regions, of which three have been reported before: (i) the entire plastid genome, (ii) 47 mitochondrial genes, and (iii) 94 conserved nuclear orthologs. Here, we report a fourth dataset, (iv) 564 895 nuclear SNPs. There are areas of discordance in all four results using the same accessions analyzed with maximum parsimony, maximum likelihood, and with the nuclear data species trees through a coalescent analysis. The nuclear results show significant areas of discordance that were unexpected, because these studies used the same DNA samples, the nuclear studies were generated from large and high‐quality datasets with the SNPs distributed on all nine linkage groups of Daucus carota, and the results were supported by high bootstrap values. These results raise questions concerning the best data and analytical methods to reconstruct and understand the “truth” of a phylogeny.

9.
10.
The explanatory role of natural selection is one of the long-term debates in evolutionary biology. Nevertheless, consensus has been elusive because of conceptual confusions and the absence of a unified, formal causal model that integrates the different explanatory scopes of natural selection. In this study we attempt to examine two questions: (i) What can the theory of natural selection explain? and (ii) Is there a causal or explanatory model that integrates all natural selection explananda? For the first question, we argue that five explananda have been assigned to the theory of natural selection and that four of them may actually be considered explananda of natural selection. For the second question, we claim that a probabilistic conception of causality and the statistical relevance concept of explanation are both good models for understanding the explanatory role of natural selection. We review the biological and philosophical disputes about the explanatory role of natural selection and formalize some explananda in probabilistic terms using classical results from population genetics. Most of these explananda have been discussed in philosophical terms but some of them have been mixed up and confused. We analyze and set the limits of these problems.

11.
Background modeling and foreground detection are key parts of any computer vision system. These problems have been addressed in the literature with several probabilistic approaches based on mixture models. Here we propose a new kind of probabilistic background model based on probabilistic self-organising maps. This way, the background pixels are modeled with more flexibility. In addition, a statistical correlation measure is used to test the similarity among nearby pixels, so as to enhance the detection performance by providing feedback to the process. Several well-known benchmark videos have been used to assess the relative performance of our proposal with respect to traditional neural and non-neural methods, with favourable results, both qualitatively and quantitatively. A statistical analysis of the differences among methods demonstrates that our method is significantly better than its competitors. This way, a strong alternative to classical methods is presented.

12.
The avian family Cuculidae (cuckoos) is a diverse group of birds that vary considerably in behaviors of interest to behavioral ecologists, e.g., obligate brood parasitism and cooperative breeding. The taxonomy of this group has historically been relatively stable but has not been extensively evaluated using molecular methods. The goal of this study was to evaluate phylogenetic relationships within the ecologically diverse genus Coua and the placement of Coua among major cuckoo lineages. We sequenced 429 bp of cytochrome b (cyt b) and 522 bp of ND2, both mitochondrial genes, for 26 species of cuckoos spanning 13 genera. We also included the enigmatic hoatzin (Opisthocomus hoazin) and used two Tauraco species as outgroups. ND2 exhibited higher rates of DNA sequence and amino acid substitution than cyt b; however, this did not greatly affect the overall levels of phylogenetic resolution and support provided by these two genes. Combined analyses produced two alternative phylogenies, depending on weighting scheme, both of which were fully resolved and were characterized by high bootstrap support. These phylogenies recovered monophyly for all of the traditional cuckoo subfamilies and indicated, with strong support, that the hoatzin is outside of Cuculidae. Within Coua, an arboreal and a terrestrial clade were identified. In contrast, habitat choice of Coua species did not greatly reflect the phylogeny.  相似文献   

13.
Some new approaches to conservation monitoring of British breeding birds   (total citations: 1; self-citations: 0; citations by others: 1)
It is important to monitor bird populations both in their own right and as indicators of the general health of wildlife habitats. The objectives of the British Trust for Ornithology's Integrated Population Monitoring programme relate to breeding bird populations in Britain and Ireland and involve the estimation of demographic parameters as well as assessment of numbers. Current programmes for monitoring bird numbers cover the majority of British species; it would be feasible to monitor most of the rest. A new Breeding Bird Survey has been developed to provide effective coverage of all regions and all major habitats in Britain through random sampling, allowing for the marked geographical variation in volunteer observer density. The final choice of a random sample stratified by observer density (with some professional support in regions with few volunteer observers) was based on comparison with alternative stratifications, using data from a 2-year pilot study to assess the number of species adequately covered under various alternatives. A method of assessing whether or not targets are being achieved at any time has been developed: it involves looking back through the data at intervals of 1-year, 4-year, 16-year and longer spans. It will be possible to refine this by incorporating environmental and density-dependent effects into predictive models. The method is illustrated here using Common Birds Census data. We discuss associated problems of statistical inference and of taking decisions under uncertainty. The data provide evidence for large declines in some species, particularly in farmland; the value of birds as general indicators of habitat health is clear. The results of monitoring can be used to illuminate possible causes of problems and to guide both practical steps to ameliorate the problems and research aimed at better understanding the causes. Examples of such research are discussed.  相似文献   

14.
Codon usage in a sample of 28 genes from the pathogenic yeast Candida albicans has been analysed using multivariate statistical analysis. A major trend among genes, correlated with gene expression level, was identified. We have focussed on the extent and nature of divergence between C.albicans and the closely related yeast Saccharomyces cerevisiae. It was recently suggested that significant differences exist between the subsets of preferred codons in these two species [Brown et al. (1991) Nucleic Acids Res. 19, 4293]. Overall, the genes of C.albicans are more A + T-rich, reflecting the lower genomic G + C content of that species, and presumably resulting from a different pattern of mutational bias. However, in both species highly expressed genes preferentially use the same subset of 'optimal' codons. A suggestion that the low frequency of NCG codons in both yeast species results from selection against the presence of codons that are potentially highly mutable is discounted. Codon usage in C.albicans, as in other unicellular species, can be interpreted as the result of a balance between the processes of mutational bias and translational selection. Codon usage in two related Candida species, C.maltosa and C.tropicalis, is briefly discussed.  相似文献   
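
The codon-usage tabulation underlying an analysis like this one can be sketched in a few lines. The example below only counts codons and reports the G + C fraction at third codon positions. It is not the multivariate analysis used in the paper, and the input sequence is made up rather than taken from any C. albicans gene.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in an in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc3(counts):
    """Fraction of codons whose third position is G or C."""
    total = sum(counts.values())
    gc = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc / total if total else 0.0


if __name__ == "__main__":
    toy = "ATGGCTGCTAAAGAATTGTAA"  # made-up sequence, not a real gene
    counts = codon_counts(toy)
    print(counts.most_common(3))
    print("GC3:", round(gc3(counts), 3))
```

A low third-position G + C fraction across many genes would be consistent with the A + T-rich mutational bias the abstract describes, while differences between highly and weakly expressed genes would point towards translational selection.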

15.
MOTIVATION: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models. RESULTS: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction. AVAILABILITY: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.  相似文献   

16.
Abstract. The use of Generalized Linear Models (GLM) in vegetation analysis has been advocated to accommodate complex species response curves. This paper investigates the potential advantages of using classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non‐linearity and factor interactions, and simple CART‐models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases respectively. The responses of the three species to environmental gradients were varied and often non‐homogeneous or context dependent. We tested the methods for predictive accuracy: CART‐models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and as well in the two remaining cases. CART also showed a superior ability to detect factor interactions. Insight gained from CART‐models then helped develop improved parametric models. Although the probabilistic form of logistic regression results is more adapted to test theories about species responses to environmental gradients, we found that CART‐models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.  相似文献   
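
The recursive partitioning at the heart of CART can be illustrated by its basic step: choosing the one threshold on one predictor that best separates presences from absences. The sketch below uses Gini impurity as the split criterion, which is a common choice but an assumption here because the abstract does not name the criterion used. The elevation values and presence labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`.

    Returns (None, impurity_of_all_labels) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    elevation = [100, 200, 300, 400, 500, 600]  # hypothetical predictor values
    present = [0, 0, 0, 1, 1, 1]                # hypothetical species presence
    print(best_split(elevation, present))
```

A full tree repeats this search over every predictor and then recurses into each side of the chosen split, which is how factor interactions emerge without being specified in advance.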

17.
Aim The study and prediction of species–environment relationships is currently mainly based on species distribution models. These purely correlative models neglect spatial population dynamics and assume that species distributions are in equilibrium with their environment. This causes biased estimates of species niches and handicaps forecasts of range dynamics under environmental change. Here we aim to develop an approach that statistically estimates process‐based models of range dynamics from data on species distributions and permits a more comprehensive quantification of forecast uncertainties. Innovation We present an approach for the statistical estimation of process‐based dynamic range models (DRMs) that integrate Hutchinson's niche concept with spatial population dynamics. In a hierarchical Bayesian framework the environmental response of demographic rates, local population dynamics and dispersal are estimated conditional upon each other while accounting for various sources of uncertainty. The method thus: (1) jointly infers species niches and spatiotemporal population dynamics from occurrence and abundance data, and (2) provides fully probabilistic forecasts of future range dynamics under environmental change. In a simulation study, we investigate the performance of DRMs for a variety of scenarios that differ in both ecological dynamics and the data used for model estimation. Main conclusions Our results demonstrate the importance of considering dynamic aspects in the collection and analysis of biodiversity data. In combination with informative data, the presented framework has the potential to markedly improve the quantification of ecological niches, the process‐based understanding of range dynamics and the forecasting of species responses to environmental change. It thereby strengthens links between biogeography, population biology and theoretical and applied ecology.  相似文献   

18.
Chloroplast phylogeny indicates that bryophytes are monophyletic   (total citations: 3; self-citations: 0; citations by others: 3)
Opinions on the basal relationship of land plants vary considerably and no phylogenetic tree with significant statistical support has been obtained. Here, we report phylogenetic analyses using 51 genes from the entire chloroplast genome sequences of 20 representative green plant species. The analyses, using translated amino acid sequences, indicated that extant bryophytes (mosses, liverworts, and hornworts) form a monophyletic group with high statistical confidence and that extant bryophytes are likely sisters to extant vascular plants, although the support for monophyletic vascular plants was not strong. Analyses at the nucleotide level could not resolve the basal relationship with statistical confidence. Bryophyte monophyly inferred using amino acid sequences has a good statistical foundation and is not rejected statistically by other data sets. We propose bryophyte monophyly as the currently best hypothesis.

19.
The analysis of large datasets describing reproductive isolation between species has been extremely influential in the study of speciation. However, the statistical methods currently used for these data limit the ability to make direct inferences about the factors predicting the evolution of reproductive isolation. As a result, our understanding of iconic patterns and rules of speciation rely on indirect analyses that have clear statistical limitations. Phylogenetic mixed models are commonly used in ecology and evolution, but have not been applied to studies of reproductive isolation. Here I describe a flexible framework using phylogenetic mixed models to analyze data collected at different evolutionary scales, to test both categorical and continuous predictor variables, and to test the effect of multiple predictors on rates and patterns of reproductive isolation simultaneously. I demonstrate the utility of this framework by re‐analyzing four classic datasets, from both animals and plants, and evaluating several hypotheses that could not be tested in the original studies: In the Drosophila and Bufonidae datasets, I found support for more rapid accumulation of reproductive isolation in sympatric species pairs compared to allopatric species pairs. Using Silene and Nolana, I found no evidence supporting the hypothesis that floral differentiation elevates postzygotic reproductive isolation. The faster accumulation of postzygotic isolation in sympatry is likely the result of species coexistence determined by the level of postzygotic isolation between species. In addition, floral trait divergence does not appear to translate into pleiotropic effects on postzygotic reproductive isolation. Overall, these methods can allow researchers to test new hypotheses using a single statistical method, while remedying the statistical limitations of several previous methods.  相似文献   

20.
Estimates of threat form an intrinsic element of World Conservation Union (IUCN) Red List criteria, and in the assignment of species to defined threat categories. However, assignment under the IUCN criteria is demanding in terms of the amount of information that is required. For many species adequate data are lacking. Further, many of the terms and parameters used under IUCN criteria are subjective and open to varying interpretations. During the last decade a number of probabilistic statistical models have been developed which use historical sighting data, such as herbarium and museum collections, to generate objective, quantitative inference of threat and extinction without the requirement for extensive formal survey procedures and where little or no other data exist. In this study these statistical models were applied to herbarium data for the genus Guzmania (Bromeliaceae) from Ecuador. The results suggest that, for species for which collection records are adequate, these methods can be of use in strengthening the IUCN Red List assessment procedure. Further, these methods present a unique means of prioritising threat when few biological data are available.
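
As an illustration of the kind of sighting-record model this abstract refers to, here is a minimal sketch of the test of Solow (1993), which assumes that sightings occur at a constant rate for as long as the species is extant. The abstract does not say which models were applied to the Guzmania data, so treating this one as representative is an assumption. The collection years below are invented.

```python
def solow_p_value(sighting_years, start_year, end_year):
    """P-value for the hypothesis that the species is still extant at end_year.

    With n sightings in the observation period and the last one at time t_n
    (measured from the start of the period of length T), the p-value is
    (t_n / T) ** n. Small values suggest the absence of recent records is
    unlikely to be due to chance alone.
    """
    n = len(sighting_years)
    if n == 0:
        raise ValueError("at least one sighting is required")
    last = max(sighting_years) - start_year
    period = end_year - start_year
    if not 0 < last <= period:
        raise ValueError("sightings must fall inside the observation period")
    return (last / period) ** n


if __name__ == "__main__":
    records = [1910, 1925, 1940, 1950]  # hypothetical herbarium collection years
    print(solow_p_value(records, start_year=1900, end_year=2000))
```

The calculation needs only dated records, which is why approaches of this kind are attractive when formal survey data are unavailable.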
