1.
He YJ Huynh TN Jansson J Sung WK 《Journal of bioinformatics and computational biology》2006,4(1):59-74
Constructing a phylogenetic tree or phylogenetic network to describe the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is to merge a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when, in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set C and none of the rooted triplets in another given set F. Although the problem is NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
2.
Inferring functional relationships of proteins from local sequence and spatial surface patterns. Cited by: 2 (self-citations: 0, citations by others: 2)
We describe a novel approach for inferring functional relationships of proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationships that might be directly related to protein function. We first exhaustively identify and analytically measure all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of the patterns of residues forming pockets and voids is then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance in the form of E- and p-values is then estimated for each of the three types of similarity measurements. Our method is fully automated and can be used without input of query patterns. It does not assume any prior knowledge of the functional residues of a protein, and can detect similarity based on surface patterns both small and large. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationships with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins of different fold structures. We envision that this method can be used for discovering novel functional relationships of protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries into the evolutionary origins of structural elements important for protein function.
3.
4.
Leventhal GE Kouyos R Stadler T Wyl Vv Yerly S Böni J Cellerai C Klimkait T Günthard HF Bonhoeffer S 《PLoS computational biology》2012,8(3):e1002413
Contact structure is believed to have a large impact on epidemic spreading, and consequently the use of networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question of whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network, and we illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.
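As an illustration of what a "measure of tree imbalance" can look like, the sketch below computes the Sackin index, the sum of leaf depths, for a rooted tree. This is only one commonly used imbalance statistic; the abstract does not say which measures the authors used, and the nested-tuple tree encoding and the name `sackin_index` are assumptions made for this example.

```python
def sackin_index(tree, depth=0):
    """Sum of leaf depths of a rooted tree given as nested tuples.

    A leaf is any non-tuple value (for example a taxon name); an internal
    node is a tuple of child subtrees. This encoding is hypothetical and
    chosen only to keep the example short.
    """
    if not isinstance(tree, tuple):
        return depth  # a leaf contributes its own depth
    return sum(sackin_index(child, depth + 1) for child in tree)


# Two four-leaf trees: a fully balanced one and a maximally unbalanced one.
balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(sackin_index(balanced))     # 8
print(sackin_index(caterpillar))  # 9
```

For a fixed number of leaves, a larger value indicates a more unbalanced tree, which is the direction of the effect reported for the Swiss HIV phylogeny.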
5.
Gertz J Elfond G Shustrova A Weisinger M Pellegrini M Cokus S Rothschild B 《Bioinformatics (Oxford, England)》2003,19(16):2039-2045
Finding the interacting pairs of proteins between two different protein families whose members are known to interact is an important problem in molecular biology. We developed and tested an algorithm that finds optimal matches between two families of proteins by comparing their distance matrices. A distance matrix provides a measure of the sequence similarity of proteins within a family. Since the protein sets of interest may have dozens of proteins each, the use of an efficient approximate solution is necessary. Therefore, the approach we have developed consists of a Metropolis Monte Carlo optimization algorithm which explores the search space of possible matches between two distance matrices. We demonstrate that by using this algorithm we are able to accurately match chemokines and chemokine receptors as well as the TGF-beta family of ligands and their receptors.
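The general idea of searching over matches between two distance matrices with a Metropolis acceptance rule can be sketched as follows. This is a minimal illustration and not the authors' implementation: the squared-difference cost, the pair-swap proposal, and the parameters `steps`, `temperature` and `seed` are all assumptions made for the example.

```python
import math
import random


def mismatch(d_a, d_b, perm):
    """Sum of squared differences between d_a and d_b reordered by perm."""
    n = len(perm)
    return sum(
        (d_a[i][j] - d_b[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def metropolis_match(d_a, d_b, steps=20000, temperature=0.05, seed=0):
    """Search for a pairing perm[i] of family-A protein i with a family-B protein.

    Each step proposes swapping two assignments and accepts the proposal
    with the Metropolis rule. The best pairing seen so far is returned.
    """
    rng = random.Random(seed)
    n = len(d_a)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = mismatch(d_a, d_b, perm)
    best_perm, best_cost = perm[:], cost
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose a swap
        new_cost = mismatch(d_a, d_b, perm)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            cost = new_cost
            if cost < best_cost:
                best_perm, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best_perm, best_cost
```

Because the search is stochastic, it is not guaranteed to find the optimal pairing; in practice one would run it from several seeds and compare the returned costs.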
6.
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD), the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
7.
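Phylogenetic analysis of Acridoidea based on nuclear ribosomal small-subunit sequences. Cited by: 6 (self-citations: 3, citations by others: 6)
A molecular systematic study of the superfamily Acridoidea was carried out using complete ribosomal SSU rDNA sequences. Complete SSU rDNA sequences (mean length 1.844 bp) were determined for 8 grasshopper species, and homologous SSU rDNA sequences of 6 ingroup species and 2 outgroup species were selected from GenBank for sequence analysis. Molecular phylogenetic trees were constructed with the Clustal, MEGA and PHYLIP software (Neighbor-Joining, NJ, and Minimum Evolution methods). The results show that: (1) Acridoidea is a monophyletic group; (2) Chrotogonidae and Pyrgomorphidea are closely related and are the most primitive groups of Acridoidea; (3) Arcypteridae and Gomphoceridae are closely related; (4) Oedipodidae is the most derived group; (5) SSU rDNA sequences are highly conserved, and the rate of transition substitutions is greater than or close to the rate of transversion substitutions; (6) in the phylogenetic trees, the superfamilies separate first, and most taxa from different genera of the same family cluster together with high confidence, indicating that SSU rDNA sequences are suitable for phylogenetic analysis of Acridoidea.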
8.
《Molecular phylogenetics and evolution》2013,66(1):215-222
The pelicans are a charismatic group of large water birds whose evolutionary relationships have long been debated. Here we use DNA sequence data from both mitochondrial and nuclear genes to derive a robust phylogeny of all the extant species. Our data reject the widespread notion that pelicans can be divided into white- and brown-plumaged groups. Instead, we find that, in contrast to all previous evolutionary hypotheses, the species fall into three well-supported clades: an Old World clade of the Dalmatian, Spot-billed, Pink-backed and Australian Pelicans; a New World clade of the American White, Brown and Peruvian Pelicans; and a monospecific clade consisting solely of the Great White Pelican, weakly grouped with the Old World clade. We discuss possible evolutionary scenarios giving rise to this diversity.
9.
10.
Phylogenies of highly genetically variable viruses such as HIV-1 are potentially informative of epidemiological dynamics. Several studies have demonstrated the presence of clusters of highly related HIV-1 sequences, particularly among recently HIV-infected individuals, which have been used to argue for a high transmission rate during acute infection. Using a large set of HIV-1 subtype B pol sequences collected from men who have sex with men, we demonstrate that viruses from recent infections tend to be phylogenetically clustered at a greater rate than viruses from patients with chronic infection ('excess clustering') and also tend to cluster with other recent HIV infections rather than chronic, established infections ('excess co-clustering'), consistent with previous reports. To determine the role that a higher infectivity during acute infection may play in excess clustering and co-clustering, we developed a simple model of HIV infection that incorporates an early period of intensified transmission and explicitly considers the dynamics of phylogenetic clusters alongside the dynamics of acute and chronic infected cases. We explored the potential for clustering statistics to be used for inference of acute-stage transmission rates and found that no single statistic explains much of the variance in the parameters controlling acute-stage transmission rates. We demonstrate that high transmission rates during the acute stage are not the main cause of excess clustering of viruses from patients with early/acute infection compared to chronic infection, which may simply reflect the shorter time since transmission in acute infection. Higher transmission during acute infection can result in excess co-clustering of sequences, while the extent of clustering observed is most sensitive to the fraction of infections sampled.
11.
Using DNA sequence data from pathogens to infer transmission networks has traditionally been done in the context of epidemics and outbreaks. Sequence data could analogously be applied to cases of ubiquitous commensal bacteria; however, instead of inferring chains of transmission to track the spread of a pathogen, sequence data for bacteria circulating in an endemic equilibrium could be used to infer information about host contact networks. Here we show, using simulated data, that multilocus DNA sequence data, based on multilocus sequence typing (MLST) schemes, from isolates of commensal bacteria can be used to infer both local and global properties of the contact networks of the populations being sampled. Specifically, for MLST data simulated from small-world networks, the small-world parameter controlling the degree of structure in the contact network can be robustly estimated. Moreover, we show that pairwise distances in the network (degrees of separation) correlate with genetic distances between isolates, so that how far apart two individuals are in the network can be inferred from MLST analysis of their commensal bacteria. This result has important consequences, and we show an example from epidemiology: how this result could be used to test for infectious origins of diseases of unknown etiology.
12.
Silvano Presciuttini Chiara Toni Elena Tempestini Simonetta Verdiani Lucia Casarino Isabella Spinetti Francesco De Stefano Ranieri Domenici Joan E Bailey-Wilson 《BMC genetics》2002,3(1):23-11
Background
The traditional exact method for inferring relationships between individuals from genetic data is not easily applicable in all situations that may be encountered in several fields of applied genetics. This study describes an approach that gives affordable results and is easily applicable; it is based on the probabilities that two individuals share 0, 1 or both alleles at a locus identical by state.
13.
MOTIVATION: Time series expression experiments have emerged as a popular method for studying a wide range of biological systems under a variety of conditions. One advantage of such data is the ability to infer regulatory relationships using time lag analysis. However, such analysis in a single experiment may result in many false positives due to the small number of time points and the large number of genes. Extending these methods to simultaneously analyze several time series datasets is challenging since under different experimental conditions biological systems may behave faster or slower making it hard to rely on the actual duration of the experiment. RESULTS: We present a new computational model and an associated algorithm to address the problem of inferring time-lagged regulatory relationships from multiple time series expression experiments with varying (unknown) time-scales. Our proposed algorithm uses a set of known interacting pairs to compute a temporal transformation between every two datasets. Using this temporal transformation we search for new interacting pairs. As we show, our method achieves a much lower false-positive rate compared to previous methods that use time series expression data for pairwise regulatory relationship discovery. Some of the new predictions made by our method can be verified using other high throughput data sources and functional annotation databases. AVAILABILITY: Matlab implementation is available from the supporting website: http://www.cs.cmu.edu/~yanxins/regulation_inference/index.html.
14.
We present a mathematical method for inferring the dynamics of gene expression from time series of reporter protein assays and cell populations. We show that estimating temporal expression dynamics from direct visual inspection of reporter protein data is unreliable when the half-life of the protein is comparable to the time scale of the expression dynamics. Our method is simple and general because it is designed only to reconstruct the pattern of protein synthesis, without assuming any specific regulatory mechanisms. It can be applied to a wide range of cell types, patterns of expression, and reporter systems, and is implemented in publicly available spreadsheets. We show that our method is robust to several possible types of error, and argue that uncertainty about the decay kinetics of reporter proteins is the limiting factor in reconstructing the temporal pattern of gene expression dynamics from reporter protein assays. With improved estimates of reporter protein decay rates, our approach could allow for detailed reconstruction of gene expression dynamics from commonly used reporter protein systems.
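To show why the reporter's decay rate matters so much, here is a minimal sketch that recovers a synthesis rate from a reporter time series. It assumes simple first-order decay, dP/dt = s(t) - kP, so that s(t) = dP/dt + kP, and it ignores cell population changes. The abstract does not give the authors' equations, so this model, the finite-difference scheme, and the name `synthesis_rate` are all assumptions made for illustration.

```python
def synthesis_rate(times, protein, decay_rate):
    """Estimate synthesis rate on each interval between measurements.

    Assumes dP/dt = s(t) - decay_rate * P (first-order decay; a
    hypothetical model, not taken from the paper). The derivative is
    approximated by a finite difference and P by the interval midpoint.
    """
    rates = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        slope = (protein[i + 1] - protein[i]) / dt
        midpoint = 0.5 * (protein[i] + protein[i + 1])
        rates.append(slope + decay_rate * midpoint)
    return rates


# A flat reporter signal still implies ongoing synthesis that balances decay.
print(synthesis_rate([0.0, 1.0, 2.0], [10.0, 10.0, 10.0], decay_rate=0.1))
# [1.0, 1.0]
```

Under this model the estimate scales directly with `decay_rate` whenever the signal is flat, which is one way to see how uncertainty in decay kinetics can dominate the reconstruction.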
15.
Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly identify the root of the tree. The outgroup and molecular clock criteria were best able to identify the root of the tree, whereas the nonreversible model was able to identify the root only when the substitution process was highly nonreversible. We also examined the performance of the criteria for a tree of four species for which the topology and root position are well supported. Results of the analyses of these data are consistent with the simulation results.
16.
Ancient phylogenetic relationships. Cited by: 10 (self-citations: 0, citations by others: 10)
Traditional views on deep evolutionary events have been seriously challenged over the last few years, following the identification of major pitfalls affecting molecular phylogeny reconstruction. Here we describe the principally encountered artifacts, notably long branch attraction, and their causes (i.e., difference in evolutionary rates, mutational saturation, compositional biases). Additional difficulties due to phenomena of biological nature (i.e., lateral gene transfer, recombination, hidden paralogy) are also discussed. Moreover, contrary to common beliefs, we show that the use of rare genomic events can also be misleading and should be treated with the same caution as standard molecular phylogeny. The universal tree of life, as described in most textbooks, is partly affected by tree reconstruction artifacts, e.g. (i) the bacterial rooting of the universal tree of life; (ii) the early emergence of amitochondriate lineages in eukaryotic phylogenies; and (iii) the position of hyperthermophilic taxa in bacterial phylogenies. We present an alternative view of this tree, based on recent evidence obtained from reanalyses of ancient data sets and from novel analyses of large combinations of genes.
17.
Background
Phylogenetic methods are philosophically grounded, and so can be philosophically biased in ways that limit explanatory power. This constitutes an important methodologic dimension not often taken into account. Here we address this dimension in the context of concatenation approaches to phylogeny.
18.
19.
20.
Spatial interactions are key determinants in the dynamics of many epidemiological and ecological systems; therefore it is important to use spatio-temporal models to estimate essential parameters. However, spatially-explicit data sets are rarely available; moreover, fitting spatially-explicit models to such data can be technically demanding and computationally intensive. Thus non-spatial models are often used to estimate parameters from temporal data. We introduce a method for fitting models to temporal data in order to estimate parameters which characterise spatial epidemics. The method uses semi-spatial models and pair approximation to take explicit account of spatial clustering of disease without requiring spatial data. The approach is demonstrated for data from experiments with plant populations invaded by a common soilborne fungus, Rhizoctonia solani. Model inferences concerning the number of sources of disease and primary and secondary infections are tested against independent measures from spatio-temporal data. The applicability of the method to a wide range of host-pathogen systems is discussed.