Similar literature
 20 similar articles found (search time: 31 ms)
1.

Background  

Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance.

2.
How do we quantify patterns (such as responses to local selection) sampled across multiple populations within a single species? Key to this question is the extent to which populations within species represent statistically independent data points in our analysis. Comparative analyses across species and higher taxa have long recognized the need to control for the non-independence of species data that arises through patterns of shared common ancestry among them (phylogenetic non-independence), as have quantitative genetic studies of individuals linked by a pedigree. Analyses across populations lacking pedigree information fall in the middle, and have to deal not only with shared common ancestry, but also with the impact of the exchange of migrants between populations (gene flow). As a result, phenotypes measured in one population are influenced by processes acting on others, and may not be a good guide to either the strength or direction of local selection. Although many studies examine patterns across populations within species, few consider such non-independence. Here, we discuss the sources of non-independence in comparative analysis, and show why the phylogeny-based approaches widely used in cross-species analyses are unlikely to be useful in analyses across populations within species. We outline the approaches (intraspecific contrasts, generalized least squares, generalized linear mixed models and autoregression) that have been used in this context, and explain their specific assumptions. We highlight the power of ‘mixed models’ in many contexts where problems of non-independence arise, and show that these allow incorporation of both shared common ancestry and gene flow. We suggest what can be done when ideal solutions are inaccessible, highlight the need for incorporation of a wider range of population models in intraspecific comparative methods and call for simulation studies of the error rates associated with alternative approaches.

3.
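Research progress on the coevolution of parasites and their hosts   Total citations: 7, self-citations: 0, citations by others: 7
刘汉生  陈智兵  胡朝晖  林小涛 《生态科学》 (Ecological Science) 2003, 22(3): 261-264, 208
This paper reviews research on the coevolution of parasites and their hosts and divides its development into three stages: 1. initial recognition of parasite-host coevolution; 2. exploration of coevolutionary patterns and their underlying mechanisms; 3. development of methods for studying coevolutionary mechanisms. Current research focuses mainly on exploring the biological significance of coevolution in greater depth. The paper also briefly describes and discusses the relevant concepts and methods of coevolution and the development of the discipline.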

4.
5.
Mating traits and mate preferences often show patterns of tight correspondence across populations and species. These patterns of apparent coevolution may result from a genetic association between traits and preferences (i.e. trait–preference genetic covariance). We review the literature on trait–preference covariance to determine its prevalence and potential biological relevance. Of the 43 studies we identified, a surprising 63% detected covariance. We test multiple hypotheses for factors that may influence the likelihood of detecting this covariance. The main predictor was the presence of genetic variation in mate preferences, which is one of the three main conditions required for the establishment of covariance. In fact, 89% of the nine studies where heritability of preference was high detected covariance. Variables pertaining to the experimental methods and type of traits involved in different studies did not greatly influence the detection of trait–preference covariance. Trait–preference genetic covariance appears to be widespread and therefore represents an important and currently underappreciated factor in the coevolution of traits and preferences.

6.
Variation within major histocompatibility complex (MHC) genes is important in recognizing pathogens and initiating an immune response. These genes are relevant in enhancing our understanding of how species cope with rapid environmental changes and concomitant fluctuations in selective pressures such as invasive, infectious diseases. Disease-based models suggest that diversity at MHC is maintained through balancing selection arising from the coevolution of hosts and pathogens. Despite intensive balancing selection, sequence motifs or even identical MHC alleles can be shared across multiple species; three potential mechanisms have been put forth to explain this phenomenon: common ancestry, convergent evolution, and random chance. To understand the processes that maintain MHC similarity across divergent species, we examined the variation at two orthologous MHC-DRB genes in two widespread North American musteloid species: striped skunks (Mephitis mephitis) and raccoons (Procyon lotor). These species are often sympatric and exposed to a similar suite of diseases (e.g., rabies, canine distemper, and parvovirus). Given their exposure to similar selective pressures from pathogens, we postulated that similar DRB alleles may be present in both species. Our results indicated that similar motifs are present within both species, at functionally relevant polymorphic sites. However, based on phylogenetic analyses that included previously published MHC sequences of several closely related carnivores, the respective MHC-DRB alleles do not appear to have been maintained through common ancestry and are unlikely to have arisen through random chance. Instead, the similarities observed between the two mesocarnivore species may be due to evolutionary convergence.

7.
Geography and landscape are important determinants of genetic variation in natural populations, and several ancestry estimation methods have been proposed to investigate population structure using genetic and geographic data simultaneously. Those approaches are often based on computer-intensive stochastic simulations and do not scale with the dimensions of the data sets generated by high-throughput sequencing technologies. There is a growing demand for faster algorithms able to analyse genomewide patterns of population genetic variation in their geographic context. In this study, we present TESS3, a major update of the spatial ancestry estimation program TESS. By combining matrix factorization and spatial statistical methods, TESS3 provides estimates of ancestry coefficients with accuracy comparable to TESS and with run-times much faster than the Bayesian version. In addition, the TESS3 program can be used to perform genome scans for selection, and separate adaptive from nonadaptive genetic variation using ancestral allele frequency differentiation tests. The main features of TESS3 are illustrated using simulated data and analysing genomic data from European lines of the plant species Arabidopsis thaliana.

8.

Background  

Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely-studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied.

9.
A probabilistic graphical model is proposed in order to detect the coevolution between different sites in biological sequences. The model extends the continuous-time Markov process of sequence substitution for single nucleic or amino acids and imposes general constraints regarding simultaneous changes on the substitution rate matrix. Given a multiple sequence alignment for each molecule of interest and a phylogenetic tree, the model can predict potential interactions within or between nucleic acids and proteins. Initial validation of the model is carried out using tRNA and 16S rRNA sequence data. The model accurately identifies the secondary interactions of tRNA as well as several known tertiary interactions. In addition, results on 16S rRNA data indicate this general and simple coevolutionary model outperforms several other parametric and nonparametric methods in predicting secondary interactions. Furthermore, the majority of the putative predictions exhibit either direct contact or proximity of the nucleotide pairs in the 3-dimensional structure of the Thermus thermophilus ribosomal small subunit. The results on RNA data suggest a general model of coevolution might be applied to other types of interactions between protein, DNA, and RNA molecules.

10.

Background  

The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences.

11.
Mismatch string kernels for discriminative protein classification   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. RESULTS: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.
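
As a rough illustration of the idea in this abstract, the sketch below computes a (k, m)-mismatch kernel by brute force: each length-k substring contributes to every pattern within m mismatches of it, and the kernel value is the inner product of the two resulting count vectors. This is not the mismatch tree traversal the abstract describes; the function names, the brute-force enumeration and the toy two-letter alphabet in the usage note are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations, product


def neighborhood(kmer, m, alphabet):
    """Return all strings within Hamming distance m of kmer (brute force)."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def feature_map(seq, k, m, alphabet):
    """Count, for each pattern, the k-mers of seq lying within m mismatches."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for pattern in neighborhood(seq[i:i + k], m, alphabet):
            counts[pattern] += 1
    return counts


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two mismatch feature maps."""
    fx = feature_map(x, k, m, alphabet)
    fy = feature_map(y, k, m, alphabet)
    return sum(count * fy[pattern] for pattern, count in fx.items())
```

With m = 0 this reduces to counting shared exact k-mers. For example, over the hypothetical alphabet "AB", `mismatch_kernel("AB", "BA", 2, 0, "AB")` is 0, while allowing one mismatch lets the two sequences share the patterns "AA" and "BB". The enumeration grows quickly with m and alphabet size, so it is only suitable for small examples.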

12.
13.
Homoplasy and homology: dichotomy or continuum?   Total citations: 4, self-citations: 0, citations by others: 4
Homology is the presence of the same feature in two organisms whose most recent common ancestor also possessed the feature. I discuss the bases on which we can tell that two features being compared share sufficient elements of sameness to allow them to be treated as homologous and therefore to be legitimately compared with one another in a way that informs comparative, evolutionary, and phylogenetic analysis. To do so, I discuss the relationship(s) between homology and homoplasy to conclude that we are dealing neither with a dichotomy between homoplasy as parallelism/convergence and homology as common descent nor with a dichotomy of homoplasy as the interrupted presence of the character in a lineage and homology as the continuous presence of the character. Rather, we are dealing with common descent with varying degrees of modification. Homoplasy and homology are not dichotomies but the extremes of a continuum, reflecting deep or more recent shared ancestry based on shared cellular mechanisms and processes and shared genes and gene pathways and networks. The same genes can be used to initiate the development of homoplastic and homologous structures. Consequently, structures may be lost but their developmental bases retained, providing the potential for homoplasy. It should not be surprising that similar features persist when a feature is present in the nearest common ancestor (homology). Neither should it be surprising to find that different environments or selective pressures can trigger the reappearance of similar features in organisms that do not share a recent common ancestor (homoplasy).

14.
Unexpected divergence and molecular coevolution in yeast plasmids   Total citations: 2, self-citations: 0, citations by others: 2
Four closely related species of yeast possess multicopy nuclear plasmids whose shared molecular architecture demonstrates a common ancestor, despite their lack of discernible DNA sequence homology. Each plasmid encodes three proteins which have equivalent essential functions in plasmid maintenance. These three groups of proteins show markedly different degrees of conservation, so that although we have successfully aligned sequences for two groups, members of the third group have diverged to such an extent that they cannot be aligned. All the proteins are sufficiently different that they function only in conjunction with their encoding plasmid. These proteins have therefore conserved their functional interactions with the relevant DNA sequences of their particular plasmids, despite lack of amino acid sequence conservation. The maintenance of function in the face of DNA sequence divergence is analogous to the coevolution of ribosomal DNA promoters and RNA polymerase I, and suggests that molecular drive may be an important force in the evolution of these plasmids. This view is reinforced by the inconsistent phylogenetic relationships determined from the two alignment sets, and by the contradiction that the two plasmids known to be the most closely related taxonomically and by their host interchangeability are suggested to be the most distant by their sequences.

15.
The number of distinct functional classes of single-stranded RNAs (ssRNAs) and the number of sequences representing them are substantial and continue to increase. Organizing this data in an evolutionary context is essential, yet traditional comparative sequence analyses require that homologous sites can be identified. This prevents comparative analysis between sequences of different functional classes that share no site-to-site sequence similarity. Analysis within a single evolutionary lineage also limits evolutionary inference because shared ancestry confounds properties of molecular structure and function that are historically contingent with those that are imposed for biophysical reasons. Here, we apply a method of comparative analysis to ssRNAs that is not restricted to homologous sequences, and therefore enables comparison between distantly related or unrelated sequences, minimizing the effects of shared ancestry. This method is based on statistical similarities in nucleotide base composition among different functional classes of ssRNAs. In order to denote base composition unambiguously, we have calculated the fraction G+A and G+U content, in addition to the more commonly used fraction G+C content. These three parameters define RNA composition space, which we have visualized using interactive graphics software. We have examined the distribution of nucleotide composition from 15 distinct functional classes of ssRNAs from organisms spanning the universal phylogenetic tree and artificial ribozymes evolved in vitro. Surprisingly, these distributions are biased consistently in G+A and G+U content, both within and between functional classes, regardless of the more variable G+C content. Additionally, an analysis of the base composition of secondary structural elements indicates that paired and unpaired nucleotides, known to have different evolutionary rates, also have significantly different compositional biases. These universal compositional biases observed among ssRNAs sharing little or no sequence similarity suggest, contrary to current understanding, that base composition biases constitute a convergent adaptation among a wide variety of molecular functions.
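
The three composition parameters named in this abstract can be sketched as follows. Because the four base fractions sum to 1, the G+C, G+A and G+U fractions together determine all four of them, which is presumably what "unambiguously" refers to. The function names, the treatment of T as U, and the decision to ignore any other symbols are assumptions for this sketch and are not taken from the paper.

```python
def composition_point(seq):
    """Return the (G+C, G+A, G+U) fractions of an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style T as U
    counts = {base: seq.count(base) for base in "ACGU"}
    total = sum(counts.values())  # symbols other than A, C, G, U are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    g = counts["G"]
    return (
        (g + counts["C"]) / total,
        (g + counts["A"]) / total,
        (g + counts["U"]) / total,
    )


def base_fractions(gc, ga, gu):
    """Recover the (G, C, A, U) fractions from a point in composition space."""
    # gc + ga + gu = 3G + C + A + U = 2G + 1, because G + C + A + U = 1
    g = (gc + ga + gu - 1) / 2
    return (g, gc - g, ga - g, gu - g)
```

For example, `composition_point("GGCA")` gives `(0.75, 0.75, 0.5)`, and passing those three values to `base_fractions` recovers the original fractions of G, C, A and U.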

16.
It has become clear that hybridization between species is much more common than previously recognized. As a result, we now know that the genomes of many modern species, including our own, are a patchwork of regions derived from past hybridization events. Increasingly, researchers are interested in disentangling which regions of the genome originated from each parental species using local ancestry inference methods. Due to the diverse effects of admixture, this interest is shared across disparate fields, from human genetics to research in ecology and evolutionary biology. However, local ancestry inference methods are sensitive to a range of biological and technical parameters which can impact accuracy. Here we present paired simulation and ancestry inference pipelines, mixnmatch and ancestryinfer, to help researchers plan and execute local ancestry inference studies. mixnmatch can simulate arbitrarily complex demographic histories in the parental and hybrid populations, selection on hybrids, and technical variables such as coverage and contamination. ancestryinfer takes as input sequencing reads from simulated or real individuals, and implements an efficient local ancestry inference pipeline. We perform a series of simulations with mixnmatch to pinpoint factors that influence accuracy in local ancestry inference and highlight useful features of the two pipelines. mixnmatch is a powerful tool for simulations of hybridization while ancestryinfer facilitates local ancestry inference on real or simulated data.

17.
Synopsis: Research in all fields of biology increasingly uses phylogenetic systematics to interpret biological data in an evolutionary context. It is becoming widely accepted that comparative studies of the correlation of biological features, such as ecomorphological studies, must frame their analyses within the context of a phylogenetic hierarchy rather than treating each taxonomic unit as an independent replicate. Recent methods for the interpretation of ecological and functional data in the framework of a phylogeny can reveal the degree to which ecomorphological characters are correlated with one another, and are congruent with hierarchical cladistic groups. An example of the ecomorphology of labrid fishes is used here to illustrate the application of several of these methods. The structural design and mechanics of the jaws of labrids are tested for ecomorphological associations with the natural diets of these fishes. Methods for analysis of the correlated evolution of both discrete and continuous quantitative characters within a phylogeny are practiced on a single ecomorphological data set. Techniques used include character coding, character mapping, phylogenetic autocorrelation, independent contrasts, and squared change parsimony. These approaches to diverse biological data allow the study of ecomorphology to account for patterns of phylogenetic ancestry. Biomechanics or functional morphology also plays a vital role in the determination of ecomorphological relationships by clarifying the mechanisms by which morphologies can perform behaviors important to the organism's ecology. The synthesis of systematics with biomechanics is an example of interdisciplinary study in which information exchange can elucidate patterns of evolution in ecomorphology.

18.
Residue coevolution has recently emerged as an important concept, especially in the context of protein structures. While a multitude of different functions for quantifying it have been proposed, not much is known about their relative strengths and weaknesses. Also, subtle algorithmic details have discouraged implementing and comparing them. We addressed this issue by developing an integrated online system that enables comparative analyses with a comprehensive set of commonly used scoring functions, including Statistical Coupling Analysis (SCA), Explicit Likelihood of Subset Variation (ELSC), mutual information and correlation-based methods. A set of data preprocessing options is provided for improving the sensitivity and specificity of coevolution signal detection, including sequence weighting, residue grouping and the filtering of sequences, sites and site pairs. A total of more than 100 scoring variations are available. The system also provides facilities for studying the relationship between coevolution scores and inter-residue distances from a crystal structure if provided, which may help in understanding protein structures. AVAILABILITY: The system is available at http://coevolution.gersteinlab.org. The source code and JavaDoc API can also be downloaded from the web site.
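
Of the scoring functions listed in this abstract, mutual information is the simplest to sketch. The example below computes it, in bits, for two alignment columns given as equal-length strings with one character per sequence. It is a minimal illustration under stated assumptions, not the system's implementation: the function name is hypothetical, gaps are treated as ordinary symbols, and none of the sequence weighting, residue grouping or filtering options mentioned above are applied.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_i)
    if n == 0:
        raise ValueError("columns must not be empty")
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) compares the observed pair frequency with the
        # frequency expected if the two columns varied independently
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi
```

Two columns that vary in lockstep, such as "AACC" and "GGTT", score 1 bit, whereas "AACC" against "GTGT" scores 0 because each residue pairing is exactly as frequent as independence predicts. A fully conserved column also scores 0 against any partner, which is one reason raw mutual information is usually combined with corrections in practice.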

19.
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate the ancestral information in admixed populations and used to adjust for population stratification in genetic association tests. We evaluate the performances of three different methods: maximum likelihood estimation, ADMIXMAP and Structure through various simulated data sets and real data from Latino subjects participating in a genetic study of asthma. All three methods provide similar information on the accuracy of ancestral estimates and control type I error rate at an approximately similar rate. The most important factor in determining accuracy of the ancestry estimate and in minimizing type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain estimates of ancestry whose correlation coefficients with the true individual ancestral proportions exceed 0.9. In addition, after accounting for the ancestry information in association tests, the excess of type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.

20.
Detecting protein-protein interactions and assigning proteins to functional complexes are key challenges of modern biology. The rise of genomics has led to evidence that correlated patterns of presence/absence and/or fusing of proteins in any organism suggest these proteins interact. Unfortunately, methods based on such data work best with divergent genomes, whereas major sequencing efforts in vertebrates, for example, are yielding alignments of the same set of proteins sampled from the same set of taxa (species). Using vertebrate mitochondrial genomes to illustrate a novel method, we associate proteins based on vectors of their evolutionary tree edge (branch or internode) lengths. This approach is based on the expectation that molecular coevolution is greatest between proteins that interact in some way. Mitochondrial DNA-encoded proteins are associated into groups largely consistent with the complexes they come from. This association is apparently not due to the tree structure or mutation processes, leaving coevolution as the best explanation. We show that it is important that the tree used to derive the edge-length vector is estimated accurately in terms of both topology and edge lengths. Although more complex substitution models reduce systematic error, they also inflate stochastic error. This makes the use of less complex substitution models preferable in some circumstances. We describe a method to estimate correlations of pairwise evolutionary distances, which adjusts for non-independent correlations due to shared evolutionary history. Associations of proteins based on their edge-length vectors are visualized and assessed using a variety of hierarchical clustering and multidimensional scaling methods. New formulae for estimating the fit of data to model, including the average percent standard deviation of distances on least squares trees, are presented. Use of edge-length vectors is compared and contrasted with correlated distance methods, correlated rates methods, and site-specific evidence of coevolution.
