
1.

On the impossibility of reconstructing ancestral data and phylogenies.

Elchanan Mossel 《Journal of computational biology》2003,10(5):669-676

We prove that it is impossible to reconstruct ancestral data at the root of "deep" phylogenetic trees with high mutation rates. Moreover, we prove that it is impossible to reconstruct the topology of "deep" trees with high mutation rates from a number of characters smaller than a low-degree polynomial in the number of leaves. Our impossibility results hold for all reconstruction methods. The proofs apply tools from information theory and percolation theory.

2.

BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies

Ke Yuan Thomas Sakoparnig Florian Markowetz Niko Beerenwinkel 《Genome biology》2015,16(1)

Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0592-6) contains supplementary material, which is available to authorized users.

3.

An effective data mining technique for reconstructing gene regulatory networks from time series expression data

Ma PC Chan KC 《Journal of bioinformatics and computational biology》2007,5(3):651-668

Recent development in DNA microarray technologies has made the reconstruction of gene regulatory networks (GRNs) feasible. To infer the overall structure of a GRN, there is a need to find out how the expression of each gene can be affected by the others. Many existing approaches to reconstructing GRNs are developed to generate hypotheses about the presence or absence of interactions between genes so that laboratory experiments can be performed afterwards for verification. Since they are not intended to be used to predict if a gene in an unseen sample has any interactions with other genes, statistical verification of the reliability of the discovered interactions can be difficult. Furthermore, since the temporal ordering of the data is not taken into consideration, the directionality of regulation cannot be established using these existing techniques. To tackle these problems, we propose a data mining technique here. This technique makes use of a probabilistic inference approach to uncover interesting dependency relationships in noisy, high-dimensional time series expression data. It is not only able to determine if a gene is dependent on another but also whether or not it is activated or inhibited. In addition, it can predict how a gene would be affected by other genes even in unseen samples. For performance evaluation, the proposed technique has been tested with real expression data. Experimental results show that it can be very effective. The discovered dependency relationships can reveal gene regulatory relationships that could be used to infer the structures of GRNs.

4.

Using jackknife to assess the quality of gene order phylogenies

Jian Shi Yiwei Zhang Haiwei Luo Jijun Tang 《BMC bioinformatics》2010,11(1):168

Background

In recent years, gene order data has attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. If gene orders are viewed as one character with a large number of states, traditional bootstrap procedures cannot be applied. Researchers began to use a jackknife resampling method to assess the quality of gene order phylogenies.

5.

Inferring phylogenies from RAD sequence data

Rubin BE Ree RH Moreau CS 《PloS one》2012,7(4):e33394

Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). 
RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
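
The second and third workflow steps described in this abstract (filtering clusters by taxonomic coverage, then concatenating them into a supermatrix) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `filter_by_coverage` and `concatenate`, the `min_taxa` parameter, the `?` missing-data symbol, and the toy loci are all assumptions made for illustration, and each cluster is assumed to be already aligned to a common length.

```python
def filter_by_coverage(clusters, min_taxa):
    """Keep only loci whose cluster contains sequences from at least min_taxa taxa.

    clusters maps a locus name to {taxon: aligned sequence} (hypothetical layout).
    """
    return {locus: seqs for locus, seqs in clusters.items() if len(seqs) >= min_taxa}


def concatenate(clusters, taxa, missing="?"):
    """Concatenate loci into one sequence per taxon, padding absent loci with `missing`."""
    matrix = {taxon: [] for taxon in taxa}
    for locus in sorted(clusters):
        seqs = clusters[locus]
        length = len(next(iter(seqs.values())))  # all sequences in a locus share one length
        for taxon in taxa:
            matrix[taxon].append(seqs.get(taxon, missing * length))
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}


# Toy input: three loci with different taxonomic coverage.
clusters = {
    "locus1": {"A": "ACGT", "B": "ACGA", "C": "ACGG"},
    "locus2": {"A": "TT", "B": "TA"},
    "locus3": {"C": "GGG"},
}
kept = filter_by_coverage(clusters, min_taxa=2)
supermatrix = concatenate(kept, taxa=["A", "B", "C"])
```

Lowering `min_taxa` keeps more loci but increases the share of `?` padding, which mirrors the trade-off the abstract reports between supermatrix size and the proportion of missing data.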

6.

Bootstrapping phylogenies inferred from rearrangement data

Y Lin V Rajan BM Moret 《Algorithms for molecular biology : AMB》2012,7(1):21-10

ABSTRACT: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. 
Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
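
For context, the classic sequence-based bootstrap that this abstract uses as its point of comparison resamples alignment columns with replacement. The sketch below shows only that resampling step, not the authors' new method for rearrangement data; the function name `bootstrap_alignment`, the toy alignment, and the fixed seed are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate by resampling alignment columns with replacement.

    alignment maps taxon -> aligned sequence; all sequences must have equal length.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGTAC"}
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

In a full analysis a tree would be inferred from each replicate and branch support computed as the fraction of replicate trees containing that branch. The resampling requires many homologous columns, which is why, as the abstract notes, it cannot be applied directly when the whole genome is a single character.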

7.

Inferring polyploid phylogenies from multiply-labeled gene trees

Martin Lott Andreas Spillner Katharina T Huber Anna Petri Bengt Oxelman Vincent Moulton 《BMC evolutionary biology》2009,9(1):216

Background

Gene trees that arise in the context of reconstructing the evolutionary history of polyploid species are often multiply-labeled, that is, the same leaf label can occur several times in a single tree. This property considerably complicates the task of forming a consensus of a collection of such trees compared to usual phylogenetic trees.

8.

OrthoParaMap: Distinguishing orthologs from paralogs by integrating comparative genome data and gene phylogenies

Steven B Cannon Nevin D Young 《BMC bioinformatics》2003,4(1):35

Background

In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana – and probably most plant genomes.

9.

Shortest triplet clustering: reconstructing large phylogenies using representative sets

Le Sy Vinh Arndt von Haeseler 《BMC bioinformatics》2005,6(1):92

Background

Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics.

10.

Analysis of recursive gene selection approaches from microarray data (cited by: 1; self-citations: 0; citations by others: 1)

Li F Yang Y 《Bioinformatics (Oxford, England)》2005,21(19):3741-3747

MOTIVATION: Finding a small subset of most predictive genes from microarray for disease prediction is a challenging problem. Support vector machines (SVMs) have been found to be successful with a recursive procedure in selecting important genes for cancer prediction. However, it is not well understood how much of the success depends on the choice of the specific classifier and how much on the recursive procedure. We answer this question by examining multiple classifiers [SVM, ridge regression (RR) and Rocchio] with feature selection in recursive and non-recursive settings on three DNA microarray datasets (ALL-AML Leukemia data, Breast Cancer data and GCM data). RESULTS: We found recursive RR most effective. On the AML-ALL dataset, it achieved zero error rate on the test set using only three genes (selected from over 7000), which is more encouraging than the best published result (zero error rate using 8 genes by recursive SVM). On the Breast Cancer dataset and the two largest categories of the GCM dataset, the results achieved by recursive RR are also very encouraging. A further analysis of the experimental results shows that different classifiers penalize redundant features to different extents and this property plays an important role in the recursive feature selection process. RR classifier tends to penalize redundant features to a much larger extent than the SVM does. This may be the reason why recursive RR has a better performance in selecting genes.
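
The recursive procedure discussed in this abstract can be sketched as a loop that refits a ridge regression and drops the least useful feature each round. The abstract does not state the ranking criterion, so the sketch assumes the common choice of eliminating the feature with the smallest absolute weight; the function names, the `lam` penalty, and the toy data are likewise hypothetical. Pure Python is used so the example is self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_weights(X, y, lam):
    """Ridge regression weights: solve (X^T X + lam * I) w = X^T y."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve(gram, xty)


def recursive_ridge_selection(X, y, n_keep, lam=1.0):
    """Refit ridge regression repeatedly, removing the smallest-|weight| feature each round."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w = ridge_weights(sub, y, lam)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Toy data: feature 0 determines the label; features 1 and 2 are small noise.
X = [[1.0, 0.1, -0.2], [-1.0, 0.2, 0.1], [1.0, -0.1, 0.3], [-1.0, -0.2, -0.1]]
y = [1.0, -1.0, 1.0, -1.0]
selected = recursive_ridge_selection(X, y, n_keep=1)
```

The non-recursive alternative would rank all features once from a single fit; refitting after every removal lets the weights of the remaining features adjust, which is where the classifier's treatment of redundant features comes into play.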

11.

Computational complexity of inferring phylogenies from chromosome inversion data

W H Day D Sankoff 《Journal of theoretical biology》1987,124(2):213-218

In systematics, parsimony methods construct phylogenies, or evolutionary trees, in which characters evolve with the least evolutionary change. The chromosome inversion, or polymorphism, parsimony criterion is used when each character of a population may exhibit homozygous or heterozygous states, but when the heterozygous state must evolve uniquely. Variations of the criterion concern whether or not the ancestral states of characters are specified. We establish that problems of inferring phylogenies by these criteria are NP-complete and thus are so difficult computationally that efficient optimal algorithms for them are unlikely to exist.

12.

A statistical test of phylogenies estimated from sequence data (cited by: 4; self-citations: 0; citations by others: 4)

W H Li 《Molecular biology and evolution》1989,6(4):424-435

A simple approach to testing the significance of the branching order, estimated from protein or DNA sequence data, of three taxa is proposed. The branching order is inferred by the transformed-distance method, under the assumption that one or two outgroups are available, and the branch lengths are estimated by the least-squares method. The inferred branching order is considered significant if the estimated internodal distance is significantly greater than zero. To test this, a formula for the variance of the internodal distance has been developed. The statistical test proposed has been checked by computer simulation. The same test also applies to the case of four taxa with no outgroup, if one considers an unrooted tree. Formulas for the variances of internodal distances have also been developed for the case of five taxa. Conditions are given under which it is more efficient to add the sequence of a fifth taxon than to do 25% more nucleotide sequencing in each of the original four. A method is presented for combining analyses of disparate data to get a single P value. Finally, the test, applied to the human-chimpanzee-gorilla problem, shows that the issue is not yet resolved.

13.

A simple graphic method for reconstructing phylogenetic trees from molecular data

F Tajima 《Molecular biology and evolution》1990,7(6):578-588

A simple graphic method is proposed for reconstructing phylogenetic trees from molecular data. This method is similar to the unweighted pair-group method with arithmetic mean, but the process of computation of average distances and reconstruction of new matrices, required in the latter method, is eliminated from this new method, so that one can reconstruct a phylogenetic tree without using a computer, unless the number of operational taxonomic units is very large. Furthermore, this method allows a phylogenetic tree to have multifurcating branches whenever there is ambiguity with bifurcation.
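
To make the comparison concrete, here is a minimal Python sketch of the standard unweighted pair-group method with arithmetic mean (UPGMA) that the abstract refers to, including the averaging and matrix-rebuilding steps the graphic method is said to avoid. It is not an implementation of the graphic method itself; the function name `upgma`, the nested-tuple tree representation, and the toy distance matrix are assumptions for illustration.

```python
def upgma(names, matrix):
    """Build a rooted tree (nested tuples) by UPGMA from a symmetric distance matrix."""
    clusters = [(name, 1) for name in names]  # (subtree, number of leaves)
    dist = [row[:] for row in matrix]
    while len(clusters) > 1:
        n = len(clusters)
        # Find the closest pair of clusters.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda ab: dist[ab[0]][ab[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the merged cluster to each remaining one.
        merged_row = [(size_i * dist[i][k] + size_j * dist[j][k]) / (size_i + size_j)
                      for k in keep]
        # Rebuild the matrix with the merged cluster appended as the last row and column.
        dist = [[dist[a][b] for b in keep] + [merged_row[x]] for x, a in enumerate(keep)]
        dist.append(merged_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]


names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(names, matrix)
```

On this toy matrix A and B are joined first, then C, then D, giving the nested tuple `("D", ("C", ("A", "B")))`. This sketch always produces a bifurcating tree and breaks ties arbitrarily, whereas the abstract notes that the graphic method can return multifurcations when the branching order is ambiguous.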

14.

New coupled model used inversely for reconstructing past terrestrial carbon storage from pollen data: validation of model using modern data

HAIBIN WU JOËL GUIOT CHANGHUI PENG ZHENGTANG GUO 《Global Change Biology》2009,15(1):82-96

The knowledge of potential impacts of climate change on terrestrial vegetation is crucial to understand long-term global carbon cycle development. Discrepancy in data has long existed between past carbon storage reconstructions since the Last Glacial Maximum by way of pollen, carbon isotopes, and general circulation model (GCM) analysis. This may be due to the fact that these methods do not synthetically take into account significant differences in climate distribution between modern and past conditions, as well as the effects of atmospheric CO₂ concentrations on vegetation. In this study, a new method to estimate past biospheric carbon stocks is reported, utilizing a new integrated ecosystem model (PCM) built on a physiological process vegetation model (BIOME4) coupled with a process-based biospheric carbon model (DEMETER). The PCM was constrained to fit pollen data to obtain realistic estimates. It was estimated that the probability distribution of climatic parameters, as simulated by BIOME4 in an inverse process, was compatible with pollen data while DEMETER successfully simulated carbon storage values with corresponding outputs of BIOME4. The carbon model was validated with present-day observations of vegetation biomes and soil carbon, and the inversion scheme was tested against 1491 surface pollen spectra sample sites procured in Africa and Eurasia. Results show that this method can successfully simulate biomes and related climates at most selected pollen sites, providing a coefficient of determination (R) of 0.83–0.97 between the observed and reconstructed climates, while also showing a consensus with an R-value of 0.90–0.96 between the simulated biome average terrestrial carbon variables and the available observations. The results demonstrate the reliability and feasibility of the climate reconstruction method and its potential efficiency in reconstructing past terrestrial carbon storage.

15.

Clustering approaches to identifying gene expression patterns from DNA microarray data

Do JH Choi DK 《Molecules and cells》2008,25(2):279-288


16.

Improving animal phylogenies with genomic data

Telford MJ Copley RR 《Trends in genetics : TIG》2011,27(5):186-195

Since the first animal genomes were completely sequenced ten years ago, evolutionary biologists have attempted to use the encoded information to reconstruct different aspects of the earliest stages of animal evolution. One of the most important uses of genome sequences is to understand relationships between animal phyla. Despite the wealth of data available, ranging from primary sequence data to gene and genome structures, our lack of understanding of the modes of evolution of genomic characters means that using these data is fraught with potential difficulties, leading to errors in phylogeny reconstruction. Improved understanding of how different character types evolve, the use of this knowledge to develop more accurate models of evolution, and denser taxonomic sampling, are now minimizing the sources of error. The wealth of genomic data now being produced promises that a well-resolved tree of the animal phyla will be available in the near future.

17.

A Bayesian compressed-sensing approach for reconstructing neural connectivity from subsampled anatomical data

Mishchenko Y Paninski L 《Journal of computational neuroscience》2012,33(2):371-388

In recent years, the problem of reconstructing the connectivity in large neural circuits ("connectomics") has re-emerged as one of the main objectives of neuroscience. Classically, reconstructions of neural connectivity have been approached anatomically, using electron or light microscopy and histological tracing methods. This paper describes a statistical approach for connectivity reconstruction that relies on relatively easy-to-obtain measurements using fluorescent probes such as synaptic markers, cytoplasmic dyes, transsynaptic tracers, or activity-dependent dyes. We describe the possible design of these experiments and develop a Bayesian framework for extracting synaptic neural connectivity from such data. We show that the statistical reconstruction problem can be formulated naturally as a tractable L1-regularized quadratic optimization. As a concrete example, we consider a realistic hypothetical connectivity reconstruction experiment in C. elegans, a popular neuroscience model where a complete wiring diagram has been previously obtained based on long-term electron microscopy work. We show that the new statistical approach could lead to an orders of magnitude reduction in experimental effort in reconstructing the connectivity in this circuit. We further demonstrate that the spatial heterogeneity and biological variability in the connectivity matrix (not just the "average" connectivity) can also be estimated using the same method.
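
An L1-regularized quadratic optimization of the kind named in this abstract has the generic form: minimize 0.5 * ||A x - b||^2 + lam * ||x||_1. The abstract does not give the paper's actual objective or solver, so the sketch below is only a generic illustration using iterative soft-thresholding (ISTA); the names `ista` and `soft_threshold`, the step size, and the toy problem are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within [-threshold, threshold] become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, b, lam, step, n_iter=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by iterative soft-thresholding.

    step should not exceed 1 / (largest eigenvalue of A^T A) for convergence.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[r][j] * residual[r] for r in range(n_rows)) for j in range(n_cols)]
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x


# Toy problem with an identity design matrix, where the solution is known in closed form.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
x_hat = ista(A, b, lam=1.0, step=1.0)
```

With the identity matrix the minimizer is simply the soft-thresholded observation vector, here `[2.0, 0.0, -1.0]`: the small middle entry is driven exactly to zero, which is the sparsity-inducing behaviour that makes the L1 penalty attractive when most candidate connections are absent.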

18.

Comparative evaluation of gene set analysis approaches for RNA-Seq data

Yasir Rahmatallah Frank Emmert-Streib Galina Glazko 《BMC bioinformatics》2014,15(1)


19.

ORIOGEN: order restricted inference for ordered gene expression data

Peddada S Harris S Zajd J Harvey E 《Bioinformatics (Oxford, England)》2005,21(20):3933-3934

SUMMARY: ORIOGEN is a user-friendly Java-based software package for selecting and clustering genes according to their profiles across various treatment groups. In particular, ORIOGEN is useful for analyzing data obtained from time-course or dose-response type experiments. AVAILABILITY: The ORIOGEN software can be downloaded freely from http://dir.niehs.nih.gov/dirbb/oriogen/index.cfm CONTACT: peddada@niehs.nih.gov (for statistical questions) and oriogen@constellagroup.com (for software support) SUPPLEMENTARY INFORMATION: ORIOGEN has a full set of help files. Also, an example input file is provided with the download.

20.

Reconstructing phylogenies from allozyme data: comparing method performance with congruence

JOHN J. WIENS 《Biological journal of the Linnean Society. Linnean Society of London》2000,70(4):613-632

Allozyme data are widely used to infer the phylogenies of populations and closely-related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods, and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.