Similar literature
 20 similar records found.
1.
The covarion (or site-specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide, amino acid, or codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among-site rate variation (ASRV) models.

2.
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.

3.
The general Markov model (GMM) of nucleotide substitution does not assume the evolutionary process to be stationary, reversible, or homogeneous. The GMM can be simplified by assuming the evolutionary process to be stationary. A stationary GMM is appropriate for analyses of phylogenetic data sets that are compositionally homogeneous; a data set is considered to be compositionally homogeneous if a statistical test does not detect significant differences in the marginal distributions of the sequences. Though the general time-reversible (GTR) model assumes stationarity, it also assumes reversibility and homogeneity. We propose two new stationary and nonhomogeneous models: one constrains the GMM to be reversible, whereas the other does not. The two models, coupled with the GTR model, comprise a set of nested models that can be used to test the assumptions of reversibility and homogeneity for stationary processes. The two models are extended to incorporate invariable sites and used to analyze a seven-taxon hominoid data set that displays compositional homogeneity. We show that within the class of stationary models, a nonhomogeneous model fits the hominoid data better than the GTR model. We note that if one considers a wider set of models that are not constrained to be stationary, then an even better fit can be obtained for the hominoid data. However, the methods for reducing model complexity from an extremely large set of nonstationary models are yet to be developed.

4.
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracy and computational cost.

5.
6.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
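
As an illustration of the two quantities named in this abstract, the following minimal sketch computes the p-distance of two aligned sequences and the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). The function names and the choice to skip columns containing gaps or ambiguous characters are assumptions made for this example, not details taken from the paper.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only columns where both sequences carry A, C, G or T are compared.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, and the gap widens as p approaches the saturation value of 0.75, which is why the uncorrected p-distance underestimates long branches.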

7.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have a significantly better treelikeness compared to those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate the part of the noise often associated with some codon positions, particularly the third position, which is known to evolve at a fast rate. Simulation results show that fast distance-based tree reconstruction algorithms on distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.
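
The averaging step described in this abstract can be sketched as a normalized weighted mean of the three per-position distance matrices. The abstract does not state how the weights are derived from the estimated rates, so the `weights` argument below is a hypothetical input supplied by the caller, and the function name is likewise an assumption for this example.

```python
def combine_codon_distances(matrices, weights):
    """Weighted average of per-codon-position distance matrices.

    matrices: list of square distance matrices (lists of lists), one per codon position.
    weights:  one non-negative weight per matrix (hypothetical; how they follow from
              the estimated evolutionary rates is not specified in the abstract).
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(matrices[0])
    if any(len(m) != n or any(len(row) != n for row in m) for m in matrices):
        raise ValueError("all matrices must be square and of the same size")
    # Entry-wise weighted mean, normalized so the weights effectively sum to one.
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]
```

Setting the weight of one position to zero removes its contribution entirely, which is one simple way to mimic the down-weighting of a noisy third codon position mentioned above.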

8.
Most research on the biological effects of Pleistocene glaciation and refugia has been undertaken in the northern hemisphere and focuses on lowland taxa. Using single-strand conformation polymorphism (SSCP) analysis and sequencing of mitochondrial cytochrome oxidase I, we explored the intraspecific phylogeography of a flightless orthopteran (the alpine scree weta, Deinacrida connectens) that is adapted to the alpine zone of South Island, New Zealand. We found that several mountain ranges and regions had their own reciprocally monophyletic, deeply differentiated lineages. Corrected genetic distance among lineages was 8.4% (Kimura 2-parameter [K2P]) / 13% (GTR + I + Gamma), whereas within-lineage distances were only 2.8% (K2P) / 3.2% (GTR + I + Gamma). We propose a model to explain this phylogeographical structure, which links the radiation of D. connectens to Pliocene mountain building, and maintenance of this structure through the combined effects of mountain-top isolation during Pleistocene interglacials and ice barriers to dispersal during glacials.

9.
Biophysical Journal, 2022, 121(16): 3023-3033
Collagen fibrils are the major constituents of the extracellular matrix, which provides structural support to vertebrate connective tissues. It is widely assumed that the superstructure of collagen fibrils is encoded in the primary sequences of the molecular building blocks. However, the interplay between large-scale architecture and small-scale molecular interactions makes the ab initio prediction of collagen structure challenging. Here, we propose a model that allows us to predict the periodic structure of collagen fibers and the axial offset between the molecules, purely on the basis of simple predictive rules for the interaction between amino acid residues. With our model, we identify the sequence-dependent collagen fiber geometries with the lowest free energy and validate the predicted geometries against the available experimental data. We propose a procedure for searching for optimal staggering distances. Finally, we build a classification algorithm and use it to scan 11 data sets of vertebrate fibrillar collagens, and predict the periodicity of the resulting assemblies. We analyze the experimentally observed variance of the optimal stagger distances across species, and find that these distances, and the resulting fibrillar phenotypes, are evolutionarily well preserved. Moreover, we observe that the energy minimum at the optimal stagger distance is broad in all cases, suggesting a further evolutionary adaptation designed to improve the assembly kinetics. Our periodicity predictions are in good agreement with the experimental data on collagen molecular staggering, not only for all collagen types analyzed but also for synthetic peptides. We argue that, with our model, it becomes possible to design tailor-made, periodic collagen structures, thereby enabling the design of novel biomimetic materials based on collagen-mimetic trimers.

10.
One direction in exploring similarities among biological sequences (such as DNA, RNA, and proteins) is to associate with such systems ordered sets of sequence invariants. These invariants represent selected properties of mathematical objects, such as matrices, that one can associate with biological sequences. In this article, we explore properties of the recently introduced Line Distance matrices, and in particular we consider properties of their eigenvalues. We prove that Line Distance matrices of size n have one positive and n - 1 negative eigenvalues. Visual representation of Cauchy's interlacing property for Line Distance matrices is considered. Matlab programs for line distance matrices and examples are available on the following website: www.fmf.uni-lj.si/~jaklicg/ldmatrix.html.

11.
The development of accurate computational models of biological processes is fundamental to computational systems biology. These models are usually represented by mathematical expressions that rely heavily on the system parameters. The measurement of these parameters is often difficult. Therefore, they are commonly estimated by fitting the predicted model to the experimental data using optimization methods. The complexity and nonlinearity of the biological processes pose a significant challenge, however, to the development of accurate and fast optimization methods. We introduce a new hybrid optimization method incorporating the Firefly Algorithm and the evolutionary operation of the Differential Evolution method. The proposed method improves solutions by neighbourhood search using evolutionary procedures. Testing our method on models for the arginine catabolism and the negative feedback loop of the p53 signalling pathway, we found that it estimated the parameters with high accuracy and within a reasonable computation time compared to well-known approaches, including Particle Swarm Optimization, Nelder-Mead, and Firefly Algorithm. We have also verified the reliability of the parameters estimated by the method using an a posteriori practical identifiability test.

12.
The term complexity means several things to biologists. When qualifying morphological phenotype, on the one hand, it is used to signify the sheer complicatedness of living systems, especially as a result of the multicomponent aspect of biological form. On the other hand, it has been used to represent the intricate nature of the connections between constituents that make up form: a more process-based explanation. In the context of evolutionary arguments, complexity has been defined, in a quantifiable fashion, as the amount of information that an informatic template, such as a sequence of nucleotides or amino acids, stores about its environment. In this perspective, we begin with a brief review of the history of complexity theory. We then introduce a developmental and an evolutionary understanding of what it means for biological systems to be complex. We propose that the complexity of living systems can be understood through two interdependent structural properties: multiscalarity of interconstituent mechanisms and excitability of the biological materials. The answer to whether a system becomes more or less complex over time depends on the potential for its constituents to interact in novel ways and combinations to give rise to new structures and functions, as well as on the evolution of excitable properties that would facilitate the exploration of interconstituent organization in the context of their microenvironments and macroenvironments.

13.
Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study we performed analytical and theoretical studies of limiting distances under the covarion model and four-taxon tree simulations to investigate the extent to which the covarion process impacts on phylogenetic estimation. In particular, we assessed the limits of an RAS model-based maximum likelihood method to recover the phylogenies when the sequence data were simulated under the covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long branch repel bias occurs.

14.
15.
A key issue troubling bacterial taxonomy and systematics is the lack of a biological species definition. Criteria to be used for defining bacterial species on genetic and biological bases should be able to reveal clear-cut boundaries among clusters of bacteria. To date, DNA–DNA re-association assays and ribosomal RNA sequence comparison have been useful in determining relative evolutionary distances among bacteria but the data are continuous and thus cannot define bacterial clusters as taxonomic units to be called species. Using Salmonella as models, we have looked for definite genetic and biologic uniqueness of clusters of bacteria. Based on our findings that each Salmonella lineage has a unique genome structure shared by strains of the same lineage but not overlapping with strains of other Salmonella lineages, we conclude that this is a result of genetic isolation following divergence of the bacteria. We propose that there should be genetic boundaries between different species of bacteria at the genomic level, which awaits further genomic information for validation.

16.
In this paper, we propose a communication model of evolution and investigate its information-theoretic bounds. The process of evolution is modeled as the retransmission of information over a protein communication channel, where the transmitted message is the organism's proteome encoded in the DNA. We compute the capacity and the rate distortion functions of the protein communication system for the three domains of life: Archaea, Bacteria, and Eukaryotes. The tradeoff between the transmission rate and the distortion in noisy protein communication channels is analyzed. As expected, comparison between the optimal transmission rate and the channel capacity indicates that the biological fidelity does not reach the Shannon optimal distortion. However, the relationship between the channel capacity and rate distortion achieved for different biological domains provides tremendous insight into the dynamics of the evolutionary processes of the three domains of life. We rely on these results to provide a model of genome sequence evolution based on the two major evolutionary driving forces: mutations and unequal crossovers.

17.
Molecular networks represent the backbone of molecular activity within the cell. Recent studies have taken a comparative approach toward interpreting these networks, contrasting networks of different species and molecular types, and under varying conditions. In this review, we survey the field of comparative biological network analysis and describe its applications to elucidate cellular machinery and to predict protein function and interaction. We highlight the open problems in the field as well as propose some initial mathematical formulations for addressing them. Many of the methodological and conceptual advances that were important for sequence comparison will likely also be important at the network level, including improved search algorithms, techniques for multiple alignment, evolutionary models for similarity scoring and better integration with public databases.

18.
We propose an epidemic model for the transmission of hepatitis B virus that distinguishes the different infection phases and includes a hospitalized class. We formulate the model and discuss its basic mathematical properties, e.g. existence, positivity, and biological feasibility. Exploiting the next generation matrix approach, we find the basic reproduction number of the model. We perform sensitivity analysis to illustrate the effect of various parameters on the transmission of the disease. We investigate stability of the equilibria of the model in terms of the basic reproduction number. Conditions for the stability of the proposed model are obtained using various approaches. Finally, we perform numerical simulations to discuss sensitivity analysis and to support our analytical work.

19.
This paper proposes a new methodology for the automated design of cell models for systems and synthetic biology. Our modelling framework is based on P systems, a discrete, stochastic and modular formal modelling language. The automated design of biological models comprising the optimization of the model structure and its stochastic kinetic constants is performed using an evolutionary algorithm. The evolutionary algorithm evolves model structures by combining different modules taken from a predefined module library and then it fine-tunes the associated stochastic kinetic constants. We investigate four alternative objective functions for the fitness calculation within the evolutionary algorithm: (1) equally weighted sum method, (2) normalization method, (3) randomly weighted sum method, and (4) equally weighted product method. The effectiveness of the methodology is tested on four case studies of increasing complexity including negative and positive autoregulation as well as two gene networks implementing a pulse generator and a bandwidth detector. We provide a systematic analysis of the evolutionary algorithm’s results as well as of the resulting evolved cell models.

20.
A number of metrics have been developed for estimating phylogenetic signal in data and for evaluating correlated evolution, inferring broad-scale evolutionary and ecological processes. Here, we propose an approach called the phylogenetic signal-representation (PSR) curve, built upon phylogenetic eigenvector regression (PVR). In PVR, selected eigenvectors extracted from a phylogenetic distance matrix are used to model interspecific variation. In the PSR curve, sequential PVR models are fitted after successively increasing the number of eigenvectors and plotting their R² against the accumulated eigenvalues. We used simulations to show that a linear PSR curve is expected under Brownian motion and that its shape changes under alternative evolutionary models. The PSR area, expressing deviations from Brownian motion, is strongly correlated (r = 0.873; P < 0.01) with Blomberg's K statistic, so nonlinear PSR curves reveal whether traits are evolving at a slower or faster rate than expected under Brownian motion. The PSR area is also correlated with phylogenetic half-life under an Ornstein-Uhlenbeck process, suggesting that both methods describe the shape of the relationship between interspecific variation and time since divergence among species. The PSR curve provides an elegant exploratory method to understand deviations from Brownian motion, in terms of acceleration or deceleration of evolutionary rates occurring at large or small phylogenetic distances.
