Similar literature
 Found 20 similar documents (search time: 26 ms)
1.
Distance-based reconstruction of tree models for oncogenesis.   Total citations: 4 (self-citations: 0; citations by others: 4)
Comparative genomic hybridization (CGH) is a laboratory method to measure gains and losses in the copy number of chromosomal regions in tumor cells. It is hypothesized that certain DNA gains and losses are related to cancer progression and that the patterns of these changes are relevant to the clinical consequences of the cancer. It is therefore of interest to develop models which predict the occurrence of these events, as well as techniques for learning such models from CGH data. We continue our study of the mathematical foundations for inferring a model of tumor progression from a CGH data set that we started in Desper et al. (1999). In that paper, we proposed a class of probabilistic tree models and showed that an algorithm based on maximum-weight branching in a graph correctly infers the topology of the tree, under plausible assumptions. In this paper, we extend that work in the direction of the so-called distance-based trees, in which events are leaves of the tree, in the style of models common in phylogenetics. Then we show how to reconstruct the distance-based trees using tree-fitting algorithms developed by researchers in phylogenetics. The main advantages of the distance-based models are that 1) they represent information about co-occurrences of all pairs of events, instead of just some pairs, 2) they allow quantitative predictions about which events occur early in tumor progression, and 3) they bring into play the extensive methodology and software developed in the context of phylogenetics. We illustrate the distance-based tree method and how it complements the branching tree method, with a CGH data set for renal cancer.

2.
MOTIVATION: Accurate time series for biological processes are difficult to estimate due to problems of synchronization, temporal sampling and rate heterogeneity. Methods are needed that can utilize multi-dimensional data, such as those resulting from DNA microarray experiments, in order to reconstruct time series from unordered or poorly ordered sets of observations. RESULTS: We present a set of algorithms for estimating temporal orderings from unordered sets of sample elements. The techniques we describe are based on modifications of a minimum-spanning tree calculated from a weighted, undirected graph. We demonstrate the efficacy of our approach by applying these techniques to an artificial data set as well as several gene expression data sets derived from DNA microarray experiments. In addition to estimating orderings, the techniques we describe also provide useful heuristics for assessing relevant properties of sample datasets such as noise and sampling intensity, and we show how a data structure called a PQ-tree can be used to represent uncertainty in a reconstructed ordering. AVAILABILITY: Academic implementations of the ordering algorithms are available as source code (in the programming language Python) on our web site, along with documentation on their use. The artificial 'jelly roll' data set upon which the algorithm was tested is also available from this web site. The publicly available gene expression data may be found at http://genome-www.stanford.edu/cellcycle/ and http://caulobacter.stanford.edu/CellCycle/.
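The idea of deriving an ordering from a minimum spanning tree can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says the techniques are "based on modifications of a minimum-spanning tree", so the choices here (Euclidean distances as edge weights, Prim's algorithm, and reading the ordering off the longest path in the tree) are assumptions, and the function names are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    if n == 0:
        return adjacency
    # best[i] = (distance from vertex i to the growing tree, tree vertex achieving it)
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        nxt = min(best, key=lambda i: best[i][0])
        _, parent = best.pop(nxt)
        adjacency[parent].append(nxt)
        adjacency[nxt].append(parent)
        for i in best:
            d = math.dist(points[nxt], points[i])
            if d < best[i][0]:
                best[i] = (d, nxt)
    return adjacency


def farthest_path(points, adjacency, start):
    """Path along tree edges from start to the vertex farthest from it."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + math.dist(points[u], points[v])
                parent[v] = u
                stack.append(v)
    end = max(dist, key=dist.get)
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return path[::-1]


def estimate_ordering(points):
    """Assumed heuristic: order samples along the longest path of the MST."""
    tree = minimum_spanning_tree(points)
    one_end = farthest_path(points, tree, 0)[-1]   # first sweep finds one endpoint
    return farthest_path(points, tree, one_end)    # second sweep yields the path


# Shuffled samples lying roughly along a line; indices are sample identifiers.
samples = [(0.0, 0.0), (3.0, 0.2), (1.0, 0.1), (4.0, 0.0), (2.0, 0.1)]
ordering = estimate_ordering(samples)
```

Samples that fall on side branches of the tree are left out of such a path, and the direction of time cannot be recovered from distances alone, which is consistent with the abstract's point that uncertainty in the ordering needs its own representation.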

3.
This study targeted the development of a novel microarray tool to allow rapid determination of the expression levels of 58 different tyrosine kinase (tk) genes in small tumor samples. The goals were to define a reference probe for multi-sample comparison and to investigate the variability and reproducibility of the image acquisition and RT-PCR procedures. The small number of tk genes on our arrays enabled us to define a reference probe by artificially mixing all genes on the arrays. Such a probe provided a contrast reference for comparative hybridization of control and sample DNA and enabled cross-comparison of more than two samples against one another. Comparison of signals generated from multiple scans eliminated the concern of photobleaching and scanner-intrinsic noise. Tests performed with breast, thyroid, and prostate cancer samples yielded distinctive patterns and suggest the feasibility of our approach. Repeated experiments indicated reproducibility of such arrays. Up- or downregulated genes identified by this rapid screening are now being investigated with techniques such as in situ hybridization.

4.
Comparative genome hybridization (CGH) is a laboratory method to measure gains and losses of chromosomal regions in tumor cells. It is believed that DNA gains and losses in tumor cells do not occur entirely at random, but partly through some flow of causality. Models that relate tumor progression to the occurrence of DNA gains and losses could be very useful in hunting cancer genes and in cancer diagnosis. We lay some mathematical foundations for inferring a model of tumor progression from a CGH data set. We consider a class of tree models that are more general than a path model that has been developed for colorectal cancer. We derive a tree model inference algorithm based on the idea of a maximum-weight branching in a graph, and we show that under plausible assumptions our algorithm infers the correct tree. We have implemented our methods in software, and we illustrate with a CGH data set for renal cancer.

5.
The hiatus observed in the progression of cancer after diagnosis and treatment in a large proportion of patients has led to the notion that a state of cancer dormancy must exist during tumor progression. However, research on this stage of cancer has been limited due to the lack of appropriate models and clinical correlates. Fortunately, the last decade has seen the development of new cancer dormancy models, whole animal and intravital imaging techniques and the molecular characterization of minimal residual disease. These studies enabled researchers to reveal intriguing mechanisms and molecular determinants that define tumor dormancy. It is imperative to understand the basic mechanisms of dormancy, as this will accelerate the development of new markers of progression and novel therapeutic opportunities to induce dormancy and/or eradicate dormant disease. This issue of Cell Cycle includes a “Spotlight on Cancer Dormancy” highlighting major contributions to the field of cancer dormancy from basic and clinical studies. We anticipate that this will initiate a forum of discussion on the problem of cancer dormancy and stimulate investigators to study this rather unexplored but undeniably relevant clinical stage of cancer progression.

6.
Acar E, Plopper GE, Yener B. PLoS ONE, 2012, 7(3): e32227
The structure/function relationship is fundamental to our understanding of biological systems at all levels, and drives most, if not all, techniques for detecting, diagnosing, and treating disease. However, at the tissue level of biological complexity we encounter a gap in the structure/function relationship: having accumulated an extraordinary amount of detailed information about biological tissues at the cellular and subcellular level, we cannot assemble it in a way that explains the correspondingly complex biological functions these structures perform. To help close this information gap we define here several quantitative temperospatial features that link tissue structure to its corresponding biological function. Both histological images of human tissue samples and fluorescence images of three-dimensional cultures of human cells are used to compare the accuracy of in vitro culture models with their corresponding human tissues. To the best of our knowledge, there is no prior work on a quantitative comparison of histology and in vitro samples. Features are calculated from graph theoretical representations of tissue structures and the data are analyzed in the form of matrices and higher-order tensors using matrix and tensor factorization methods, with a goal of differentiating between cancerous and healthy states of brain, breast, and bone tissues. We also show that our techniques can differentiate between the structural organization of native tissues and their corresponding in vitro engineered cell culture models.

7.
Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0592-6) contains supplementary material, which is available to authorized users.

8.
The presence of extra centrioles, termed centrosome amplification, is a hallmark of cancer. The distribution of centriole numbers within a cancer cell population appears to be at an equilibrium maintained by centriole overproduction and selection, reminiscent of mutation-selection balance. It is unknown to date if the interaction between centriole overproduction and selection can quantitatively explain the intra- and inter-population heterogeneity in centriole numbers. Here, we define mutation-selection-like models and employ a model selection approach to infer patterns of centriole overproduction and selection in a diverse panel of human cell lines. Surprisingly, we infer strong and uniform selection against any number of extra centrioles in most cell lines. Finally, we assess the accuracy and precision of our inference method and find that it increases non-linearly as a function of the number of sampled cells. We discuss the biological implications of our results and how our methodology can inform future experiments.

9.
In this paper, we are interested in the problem of approximating trees by trees with a particular self-nested structure. Self-nested trees are such that all their subtrees of a given height are isomorphic. We show that these trees present remarkable compression properties, with high compression rates. In order to measure how far a tree is from being a self-nested tree, we then study how to quantify the degree of self-nestedness of any tree. For this, we define a measure of the self-nestedness of a tree by constructing a self-nested tree that minimizes the distance of the original tree to the set of self-nested trees that embed the initial tree. We show that this measure can be computed in polynomial time and depict the corresponding algorithm. The distance to this nearest embedding self-nested tree (NEST) is then used to define compression coefficients that reflect the compressibility of a tree. To illustrate this approach, we then apply these notions to the analysis of plant branching structures. Based on a database of simulated theoretical plants in which different levels of noise have been introduced, we evaluate the method and show that the NESTs of such branching structures restore partly or completely the original, noiseless, branching structures. The whole approach is then applied to the analysis of a real plant (a rice panicle) whose topological structure was completely measured. We show that the NEST of this plant may be interpreted in biological terms and may be used to reveal important aspects of the plant growth.
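The definition of a self-nested tree given in this abstract (all subtrees of a given height are isomorphic) can be checked directly. The sketch below only tests the property; it does not compute the NEST or the compression coefficients. It assumes unordered rooted trees encoded as nested Python lists (a leaf is an empty list), which is a representation chosen here for illustration, and the function names are hypothetical.

```python
def canonical_form_and_height(tree, classes):
    """Return (canonical form, height) of an unordered rooted tree.

    The tree is a nested list of child subtrees. Two subtrees are isomorphic
    exactly when their canonical forms are equal. Every subtree's canonical
    form is recorded in classes[height].
    """
    child_results = [canonical_form_and_height(child, classes) for child in tree]
    # Sorting the children's forms makes the result independent of child order.
    form = tuple(sorted(f for f, _ in child_results))
    height = 1 + max((h for _, h in child_results), default=-1)  # leaf has height 0
    classes.setdefault(height, set()).add(form)
    return form, height


def is_self_nested(tree):
    """True if all subtrees of the same height are isomorphic."""
    classes = {}
    canonical_form_and_height(tree, classes)
    return all(len(forms) == 1 for forms in classes.values())


balanced = [[[], []], [[], []]]   # two identical height-1 subtrees
lopsided = [[[]], [[], []]]       # height-1 subtrees with 1 and 2 leaves
mixed = [[], [[]]]                # children of different heights are allowed
```

Because a self-nested tree has at most one isomorphism class per height, it can be stored as one small record per height instead of one per node, which gives an intuition for the compression properties the abstract mentions.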

10.
Tumors contain multiple subpopulations of genetically distinct cancer cells. Reconstructing their evolutionary history can improve our understanding of how cancers develop and respond to treatment. Subclonal reconstruction methods cluster mutations into groups that co-occur within the same subpopulations, estimate the frequency of cells belonging to each subpopulation, and infer the ancestral relationships among the subpopulations by constructing a clone tree. However, often multiple clone trees are consistent with the data and current methods do not efficiently capture this uncertainty; nor can these methods scale to clone trees with a large number of subclonal populations. Here, we formalize the notion of a partially-defined clone tree (partial clone tree for short) that defines a subset of the pairwise ancestral relationships in a clone tree, thereby implicitly representing the set of all clone trees that have these defined pairwise relationships. Also, we introduce a special partial clone tree, the Maximally-Constrained Ancestral Reconstruction (MAR), which summarizes all clone trees fitting the input data equally well. Finally, we extend commonly used clone tree validity conditions to apply to partial clone trees and describe SubMARine, a polynomial-time algorithm producing the subMAR, which approximates the MAR and guarantees that its defined relationships are a subset of those present in the MAR. We also extend SubMARine to work with subclonal copy number aberrations and define equivalence constraints for this purpose. Further, we extend SubMARine to permit noise in the estimates of the subclonal frequencies while retaining its validity conditions and guarantees.
In contrast to other clone tree reconstruction methods, SubMARine runs in time and space that scale polynomially in the number of subclones. We show through extensive noise-free simulation, a large lung cancer dataset and a prostate cancer dataset that the subMAR equals the MAR in all cases where only a single clone tree exists and that it is a perfect match to the MAR in most of the other cases. Notably, SubMARine runs in less than 70 seconds on a single thread with less than one Gb of memory on all datasets presented in this paper, including ones with 50 nodes in a clone tree. On the real-world data, SubMARine almost perfectly recovers the previously reported trees and identifies minor errors made in the expert-driven reconstructions of those trees. The freely-available open-source code implementing SubMARine can be downloaded at https://github.com/morrislab/submarine.
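To make "clone tree validity conditions" concrete, here is a minimal sketch of one condition widely used in subclonal reconstruction: in every sample, the frequencies of a subclone's children may not add up to more than the subclone's own frequency. The abstract does not list its conditions, so treating this sum condition as one of them is an assumption. The code is not taken from SubMARine; the function name, the `tolerance` parameter, and the toy frequencies are all hypothetical.

```python
def satisfies_sum_condition(parent_of, frequencies, tolerance=0.0):
    """Check a candidate clone tree against the (assumed) sum condition.

    parent_of:   dict mapping each subclone to its parent (None for the root).
    frequencies: dict mapping each subclone to a list of per-sample frequencies.
    tolerance:   slack allowed for noisy frequency estimates (illustrative only).
    """
    n_samples = len(next(iter(frequencies.values())))
    child_totals = {clone: [0.0] * n_samples for clone in frequencies}
    for clone, parent in parent_of.items():
        if parent is None:
            continue
        for s in range(n_samples):
            child_totals[parent][s] += frequencies[clone][s]
    return all(
        child_totals[clone][s] <= frequencies[clone][s] + tolerance
        for clone in frequencies
        for s in range(n_samples)
    )


# Toy frequencies for three subclones in two samples (made-up numbers).
freqs = {
    "germline": [1.0, 1.0],
    "A": [0.8, 0.6],
    "B": [0.5, 0.1],
    "C": [0.2, 0.4],
}
branching = {"germline": None, "A": "germline", "B": "A", "C": "A"}
chain = {"germline": None, "A": "germline", "B": "A", "C": "B"}

branching_ok = satisfies_sum_condition(branching, freqs)
chain_ok = satisfies_sum_condition(chain, freqs)
```

In this toy case the chain is ruled out because C is more frequent than B in the second sample, so C cannot descend from B. When several candidate trees pass such checks, the data alone do not single one out, which is the ambiguity a partial clone tree is meant to summarize.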

11.
Much of the recent philosophical debate on causation and causal explanation in the biological and biomedical sciences has focused on the notion of mechanism. Mechanisms, their nature and epistemic roles have been tackled by a range of so-called neo-mechanistic theories, and widely discussed. Without denying the merits of this approach, our paper aims to show how lately it has failed to give proper credit to processes, which are central to the field, especially of contemporary molecular biology. Processes can be summed up in the notion of ‘pathway’, which is far from being just equivalent to that of ‘mechanism’ and has a profound epistemological and explanatory relevance. It is argued that an adequate consideration of pathways impels some rethinking of scientific explanation in molecular biology, namely its functional and contextual features. A number of examples are given to suggest that the focus of philosophical attention in this disciplinary field should shift from the notion of mechanism to the notion of pathway.

12.
13.
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

14.
The identification of genetic and epigenetic alterations from primary tumor cells has become a common method to identify genes critical to the development and progression of cancer. We seek to identify those genetic and epigenetic aberrations that have the most impact on gene function within the tumor. First, we perform a bioinformatic analysis of copy number variation (CNV) and DNA methylation covering the genetic landscape of ovarian cancer tumor cells. We separately examined CNV and DNA methylation for 42 primary serous ovarian cancer samples using MOMA-ROMA assays and 379 tumor samples analyzed by The Cancer Genome Atlas. We have identified 346 genes with significant deletions or amplifications among the tumor samples. Utilizing associated gene expression data we predict 156 genes with altered copy number and correlated changes in expression. Among these genes CCNE1, POP4, UQCRB, PHF20L1 and C19orf2 were identified within both data sets. We were specifically interested in copy number variation as our base genomic property in the prediction of tumor suppressors and oncogenes in the altered ovarian tumor. We therefore identify changes in DNA methylation and expression for all amplified and deleted genes. We statistically define tumor suppressor and oncogenic features for these modalities and perform a correlation analysis with expression. We predicted 611 potential oncogene and tumor suppressor candidates by integrating these data types. Genes with a strong correlation for methylation dependent expression changes exhibited at varying copy number aberrations include CDCA8, ATAD2, CDKN2A, RAB25, AURKA, BOP1 and EIF2C3. We provide copy number variation and DNA methylation analysis for over 11,500 individual genes covering the genetic landscape of ovarian cancer tumors. We show the extent of genomic and epigenetic alterations for known tumor suppressors and oncogenes and also use these defined features to identify potential ovarian cancer gene candidates.

15.
Many of the steps in phylogenetic reconstruction can be confounded by “rogue” taxa—taxa that cannot be placed with assurance anywhere within the tree, indeed, whose location within the tree varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper, we provide a novel framework to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem, the relative information criterion, that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use this heuristic in a series of experiments, with both pathological examples from the literature and a collection of large biological data sets. As the presence of rogue taxa in a set of bootstrap replicates can lead to deceivingly poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm; applying this procedure to our biological data sets caused a large number of edges to move from “unsupported” to “supported” status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa. We also discuss the implementation issues encountered while integrating our algorithm into RAxML v7.2.7, particularly those dealing with scaling up the analyses. This integration enables practitioners to benefit from our algorithm in the analysis of very large data sets (up to 2,500 taxa and 10,000 trees, although we present the results of even larger analyses).

16.
Reconstructing a tree of life by inferring evolutionary history is an important focus of evolutionary biology. Phylogenetic reconstructions also provide useful information for a range of scientific disciplines such as botany, zoology, phylogeography, archaeology and biological anthropology. Until the development of protein and DNA sequencing techniques in the 1960s and 1970s, phylogenetic reconstructions were based on fossil records and comparative morphological/physiological analyses. Since then, progress in molecular phylogenetics has compensated for some of the shortcomings of phenotype-based comparisons. Comparisons at the molecular level increase the accuracy of phylogenetic inference because there is no environmental influence on DNA/peptide sequences and evaluation of sequence similarity is not subjective. While the number of morphological/physiological characters that are sufficiently conserved for phylogenetic inference is limited, molecular data provide a large number of datapoints and enable comparisons from diverse taxa. Over the last 20 years, developments in molecular phylogenetics have greatly contributed to our understanding of plant evolutionary relationships. Regions in the plant nuclear and organellar genomes that are optimal for phylogenetic inference have been determined and recent advances in DNA sequencing techniques have enabled comparisons at the whole genome level. Sequences from the nuclear and organellar genomes of thousands of plant species are readily available in public databases, enabling researchers without access to molecular biology tools to investigate phylogenetic relationships by sequence comparisons using the appropriate nucleotide substitution models and tree building algorithms. In the present review, the statistical models and algorithms used to reconstruct phylogenetic trees are introduced and advances in the exploration and utilization of plant genomes for molecular phylogenetic analyses are discussed.

17.
A growing number of inconsistencies have accumulated within the genetically deterministic paradigm of the origin of cancer. Among them the most important are the nonspecific nature of cancer mutations and the non-cell-autonomous factors of cancer initiation and progression. Epigenetic aspects of cancer and cancer systems biology represent novel approaches to cancer aetiology and converge in the notion that cancer is characterized by a nonspecific progressive destabilization of multiple molecular pathways. The coherent behaviour of certain cellular subsystems has been theoretically predicted for a long time to have a general role in coordinating biological processes. However, it has only recently gained major scientific interest when it was measured on photosynthetic complexes at physiological temperatures and confirmed to have a direct effect over the dynamics of the energy transfer. Several theoretical and experimental considerations suggest that cancer might be associated with the absence or impairment of the proper coherent dynamics in certain biological structures, most notably in the microtubules. We review those models and suggest that impaired coherence might largely contribute to the progressive destabilization of the molecular and gene regulatory networks, thus connecting different non-genetic aspects of cancer.

18.
Single-cell RNA and protein concentrations dynamically fluctuate because of stochastic ("noisy") regulation. Consequently, biological signaling and genetic networks not only translate stimuli with functional response but also random fluctuations. Intuitively, this feature manifests as the accumulation of fluctuations from the network source to the target. Taking advantage of the fact that noise propagates directionally, we developed a method for causation prediction that does not require time-lagged observations and therefore can be applied to data generated by destructive assays such as immunohistochemistry. Our method for causation prediction, "Inference of Network Directionality Using Covariance Elements (INDUCE)," exploits the theoretical relationship between a change in the strength of a causal interaction and the associated changes in the single cell measured entries of the covariance matrix of protein concentrations. We validated our method for causation prediction in two experimental systems where causation is well established: in an E. coli synthetic gene network, and in MEK to ERK signaling in mammalian cells. We report the first analysis of covariance elements documenting noise propagation from a kinase to a phosphorylated substrate in an endogenous mammalian signaling network.

19.
It has been claimed that blending processes such as trade and exchange have always been more important in the evolution of cultural similarities and differences among human populations than the branching process of population fissioning. In this paper, we report the results of a novel comparative study designed to shed light on this claim. We fitted the bifurcating tree model that biologists use to represent the relationships of species to 21 biological data sets that have been used to reconstruct the relationships of species and/or higher level taxa and to 21 cultural data sets. We then compared the average fit between the biological data sets and the model with the average fit between the cultural data sets and the model. Given that the biological data sets can be confidently assumed to have been structured by speciation, which is a branching process, our assumption was that, if cultural evolution is dominated by blending processes, the fit between the bifurcating tree model and the cultural data sets should be significantly worse than the fit between the bifurcating tree model and the biological data sets. Conversely, if cultural evolution is dominated by branching processes, the fit between the bifurcating tree model and the cultural data sets should be no worse than the fit between the bifurcating tree model and the biological data sets. We found that the average fit between the cultural data sets and the bifurcating tree model was not significantly different from the fit between the biological data sets and the bifurcating tree model. This indicates that the cultural data sets are not less tree-like than are the biological data sets. As such, our analysis does not support the suggestion that blending processes have always been more important than branching processes in cultural evolution. 
We conclude from this that, rather than deciding how cultural evolution has proceeded a priori, researchers need to ascertain which model or combination of models is relevant in a particular case and why.

20.
Mathematical techniques have provided tools to quantify the stability of rhythmic movements of humans and machines as well as mathematical models. One archetypal example is the use of Floquet multipliers: assuming periodic motion to be a limit-cycle of a nonlinear oscillator, local stability has been assessed by evaluating the rate of convergence to the limit-cycle. However, the accuracy of the assessment in experiments is questionable: Floquet multipliers provide a measure of orbital stability for deterministic systems, but various components of biological systems and machines involve inevitable noise. In this study, we show that the conventional estimate of orbital stability, which depends on regression, has bias in the presence of noise. We quantify the bias, and devise a new method to estimate orbital stability more accurately. Compared with previous methods, our method substantially reduces the bias, providing acceptable estimates of orbital stability with an order-of-magnitude fewer cycles.
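The regression-based estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes a scalar return map in which the deviation from the limit cycle shrinks by a fixed factor each cycle, and it estimates that factor (standing in for a Floquet multiplier) by least squares. The noise model, the function names, and all parameter values are assumptions made for illustration. The abstract does not describe the authors' setup, and their improved estimator is not reproduced here.

```python
import random


def estimate_multiplier(deviations):
    """Least-squares slope through the origin of d[n+1] regressed on d[n]."""
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(a * a for a in deviations[:-1])
    return numerator / denominator


def simulate_return_map(multiplier, n_cycles, process_sd, measurement_sd, seed=0):
    """Observed per-cycle deviations of a scalar linear return map with noise.

    process_sd perturbs the state itself each cycle; measurement_sd only
    corrupts what is observed. Both noise sources are assumptions of this sketch.
    """
    rng = random.Random(seed)
    state = 1.0
    observed = []
    for _ in range(n_cycles):
        observed.append(state + rng.gauss(0.0, measurement_sd))
        state = multiplier * state + rng.gauss(0.0, process_sd)
    return observed


# Without noise, the regression recovers the multiplier.
clean = estimate_multiplier(simulate_return_map(0.5, 20, 0.0, 0.0))

# With measurement noise in the regressor, the slope is pulled toward zero.
noisy = estimate_multiplier(simulate_return_map(0.5, 5000, 0.1, 0.1))
```

In this toy model the bias comes from the classical errors-in-variables effect: noise in the measured deviation inflates the denominator of the slope. Whether this matches the mechanism analyzed in the paper cannot be determined from the abstract.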
