Similar literature
 A total of 20 similar articles were found (search time: 15 ms)
1.
ABSTRACT: A phylogenetic network N has vertices corresponding to species and arcs corresponding to direct genetic inheritance from the species at the tail to the species at the head. Measurements of DNA are often made on species in the leaf set, and one seeks to infer properties of the network, possibly including the graph itself. In the case of phylogenetic trees, distances between extant species are frequently used to infer the phylogenetic trees by methods such as neighbor-joining. This paper proposes a "tree-average" distance for networks more general than trees. The notion requires a "weight" on each arc measuring the genetic change along the arc. For each displayed tree the distance between two leaves is the sum of the weights along the path joining them. At a hybrid vertex, each character is inherited from one of its parents. We will assume that for each hybrid there is a probability that the inheritance of a character is from a specified parent. Assume that the inheritance events at different hybrids are independent. Then for each displayed tree there will be a probability that the inheritance of a given character follows the tree; this probability may be interpreted as the probability of the tree. The "tree-average" distance between the leaves is defined to be the expected value of their distance in the displayed trees. For a class of rooted networks that includes rooted trees, it is shown that the weights and the probabilities at each hybrid vertex can be calculated given the network and the tree-average distances between the leaves. Hence these weights and probabilities are uniquely determined. The hypotheses on the networks include that hybrid vertices have indegree exactly 2 and that vertices that are not leaves have a tree-child.
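The definition in this abstract can be illustrated with a minimal sketch, assuming a small network given as a dictionary of weighted arcs and a table of parent probabilities for each hybrid vertex. The function names, the data layout, and the brute-force enumeration of displayed trees are illustrative assumptions, not the paper's method.

```python
from itertools import product


def path_length(parent, x, y):
    """Length of the path joining x and y in a rooted tree.

    parent maps each non-root vertex to (its parent, weight of the arc).
    """
    # Record the distance from x to each of its ancestors.
    up = {x: 0.0}
    node, d = x, 0.0
    while node in parent:
        node, w = parent[node]
        d += w
        up[node] = d
    # Climb from y until an ancestor of x is reached.
    node, d = y, 0.0
    while node not in up:
        node, w = parent[node]
        d += w
    return d + up[node]


def tree_average_distance(arcs, hybrid_probs, x, y):
    """Expected x-y distance over all displayed trees (hypothetical helper).

    arcs: {(tail, head): weight}
    hybrid_probs: {hybrid: {parent: probability of inheriting from it}}
    """
    hybrids = sorted(hybrid_probs)
    total = 0.0
    # One displayed tree per choice of a single parent at every hybrid.
    for choice in product(*(sorted(hybrid_probs[h]) for h in hybrids)):
        chosen = dict(zip(hybrids, choice))
        prob = 1.0
        for h, p in chosen.items():
            prob *= hybrid_probs[h][p]  # independence across hybrids
        parent = {}
        for (tail, head), w in arcs.items():
            if head in chosen and chosen[head] != tail:
                continue  # drop the hybrid arc that was not chosen
            parent[head] = (tail, w)
        total += prob * path_length(parent, x, y)
    return total


# Toy network: hybrid h inherits from a with probability 0.25, from b with 0.75.
arcs = {("r", "a"): 1, ("r", "b"): 1, ("a", "x"): 1, ("b", "y"): 1,
        ("a", "h"): 2, ("b", "h"): 4, ("h", "z"): 1}
hybrid_probs = {"h": {"a": 0.25, "b": 0.75}}
print(tree_average_distance(arcs, hybrid_probs, "x", "z"))
```

In the toy network the leaves x and z are at distance 4 in the tree that keeps the arc from a and at distance 8 in the tree that keeps the arc from b, so the sketch averages those two values with weights 0.25 and 0.75. Enumeration is exponential in the number of hybrids and is only meant to make the definition concrete.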

2.
Summary: The mixture model is a method of choice for modeling heterogeneous random graphs, because it contains most of the known structures of heterogeneity: hubs, hierarchical structures, or community structure. One of the weaknesses of mixture models on random graphs is that, at the present time, there is no computationally feasible estimation method that is completely satisfying from a theoretical point of view. Moreover, mixture models assume that each vertex pertains to one group, so there is no place for vertices being at intermediate positions. The model proposed in this article is a grade of membership model for heterogeneous random graphs, which assumes that each vertex is a mixture of extremal hypothetical vertices. The connectivity properties of each vertex are deduced from those of the extreme vertices. In this new model, the weights in each vertex's vector are fixed continuous parameters. A model with a vector of parameters for each vertex is tractable because the number of observations is proportional to the square of the number of vertices of the network. The parameters are estimated by maximum likelihood. The model is used to elucidate some of the processes shaping the heterogeneous structure of a well‐resolved network of host/parasite interactions.

3.
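Building on sequential selective extraction and whole-mount electron microscopy for visualizing the cytoskeleton, and using immunogold labeling together with two-dimensional electrophoretic analysis of protein components, we studied the intermediate filament-lamina-nuclear skeleton (nuclear matrix) structural system of BHK-21 cells and its principal protein components. In BHK-21 cells, the intermediate filaments, lamina, and nuclear skeleton form a structurally interconnected network that spans the nucleus and the cytoplasm. Single intermediate filaments are 10 nm in diameter and are labeled well by anti-vimentin antibody-gold particles; biochemical analysis likewise shows that the main component of BHK-21 intermediate filaments is vimentin, with a molecular weight of 55 kD and an isoelectric point of 5.6. The intermediate filament network is distributed in a polarized manner in the cytoplasm and is closely connected to the lamina. The lamina of BHK-21 cells can be labeled by gold particles conjugated to a monoclonal antibody against lamins A and C. Two-dimensional electrophoresis shows that the lamina contains three protein components, lamins A, B, and C, with molecular weights of 68 kD, 70 kD, and 62 kD, respectively; lamins A and C both have isoelectric points of 6.9 to 7.2, whereas lamin B is more acidic, with an isoelectric point of 5.8. The nuclear skeleton filament network of BHK-21 cells can also be visualized clearly; its protein composition is more complex, two-dimensional electrophoresis patterns regularly show multiple distinct spots, and it very likely contains actin. Gold particles conjugated to a monoclonal antibody against the 298 kD nuclear matrix protein label the nuclear skeleton filaments precisely.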

4.
Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results to minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a "common" polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data or the same method applied to different datasets results in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2,3,4,...,kmax clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations, is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy. 
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a "European R clade" containing the haplogroups H, V, H/V, J, T, and U and a "Eurasian N subclade" including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in nonnearest locations in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1-20%. From a comparison of our haplogroups to two chimp and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 +/- 14,000 years before present.

5.
Böcker and Dress (Adv Math 138:105–125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
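The forward direction described in this abstract, from a colored tree to its ternary map, can be sketched as follows. This is a minimal illustration assuming the tree is given as an adjacency dictionary; the function names are hypothetical, and the reconstruction algorithm of the paper is not shown.

```python
from collections import deque
from itertools import combinations


def path(adj, s, t):
    """Vertices on the unique s-t path of a tree, found by breadth-first search."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            break
        for w in adj[v]:
            if w not in prev:
                prev[w] = v
                queue.append(w)
    out = []
    v = t
    while v is not None:
        out.append(v)
        v = prev[v]
    return out


def median(adj, a, b, c):
    """The single vertex lying on all three pairwise paths among a, b, c."""
    common = set(path(adj, a, b)) & set(path(adj, a, c)) & set(path(adj, b, c))
    (m,) = common  # a tree has exactly one median per leaf triple
    return m


def ternary_map(adj, color):
    """Map every triple of distinct leaves to the color of its median vertex."""
    leaves = sorted(v for v in adj if len(adj[v]) == 1)
    return {frozenset(t): color[median(adj, *t)]
            for t in combinations(leaves, 3)}


# Toy tree: leaves a, b hang off interior vertex u; leaves c, d hang off v.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
color = {"u": "red", "v": "blue"}  # proper: adjacent interior vertices differ
print(ternary_map(adj, color))
```

In the toy tree, any triple containing both a and b has median u, and any triple containing both c and d has median v.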

6.
Phylogenetic relationships may be represented by rooted acyclic directed graphs in which each vertex, corresponding to a taxon, possesses a genome. Assume the characters are all binary. A homoplasy occurs if a particular character changes its state more than once in the graph. A vertex is “regular” if it has only one parent and “hybrid” if it has more than one parent. A “regular path” is a directed path such that all vertices after the first are regular. Assume that the network is given and that the genomes are known for all leaves and for the root. Assume that all homoplasies occur only at hybrid vertices and each character has at most one homoplasy. Assume that from each vertex there is a regular path leading to a leaf. In this idealized setting, with other mild assumptions, it is proved that the genome at each vertex is uniquely determined. Hence, for each character the vertex at which a homoplasy occurs in the character is uniquely determined. Without the assumption on regular paths, an example shows that the genomes and homoplasies need not be uniquely determined.

7.
We study distorted metrics on binary trees in the context of phylogenetic reconstruction. Given a binary tree T on n leaves with a path metric d, consider the pairwise distances {d(u,v)} between leaves. It is well known that these determine the tree and the d length of all edges. Here, we consider distortions $\bar{d}$ of d such that, for all leaves u and v, it holds that $|d(u,v)-\bar{d}(u,v)|$ [...] such that the true tree T may be obtained from that forest by adding $\alpha-1$ edges and $\alpha-1 \le 2^{-\Omega(M/g)} n$. Our distorted metric result implies a reconstruction algorithm of phylogenetic forests with a small number of trees from sequences of length logarithmic in the number of species. The reconstruction algorithm is applicable for the general Markov model. Both the distorted metric result and its applications to phylogeny are almost tight.

8.

Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that at the same time are cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.


9.
A phylogenetic network is a rooted acyclic digraph with vertices corresponding to taxa. Let X denote a set of vertices containing the root, the leaves, and all vertices of outdegree 1. Regard X as the set of vertices on which measurements such as DNA can be made. A vertex is called normal if it has one parent, and hybrid if it has more than one parent. The network is called normal if it has no redundant arcs and also from every vertex there is a directed path to a member of X such that all vertices after the first are normal. This paper studies properties of normal networks. Under a simple model of inheritance that allows homoplasies only at hybrid vertices, there is essentially unique determination of the genomes at all vertices by the genomes at members of X if and only if the network is normal. This model is a limiting case of more standard models of inheritance when the substitution rate is sufficiently low. Various mathematical properties of normal networks are described. These properties include that the number of vertices grows at most quadratically with the number of leaves and that the number of hybrid vertices grows at most linearly with the number of leaves.

10.
MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes.

11.
From the DNA sequences for N taxa, the (generally unknown) phylogenetic tree T that gave rise to them is to be reconstructed. Various methods give rise, for each quartet J consisting of exactly four taxa, to a predicted tree L(J) based only on the sequences in J, and these are then used to reconstruct T. The author defines an "error-correcting map" (Ec), which replaces each L(J) with a new tree, Ec(L)(J), which has been corrected using other trees, L(K), in the list L. The "quartet distance" between two trees is defined as the number of quartets J on which the two trees differ, and two distinct trees are shown to always have quartet distance of at least N - 3. If L has quartet distance at most (N - 4)/2 from T, then Ec(L) will coincide with the correct list for T; and this result cannot be improved. In general, Ec can correct many more errors in L. Iteration of the map Ec may produce still more accurate lists. Simulations are reported which often show improvement even when the quartet distance considerably exceeds (N - 4)/2. Moreover, the Buneman tree for Ec(L) is shown to refine the Buneman tree for L, so that strongly supported edges for L remain strongly supported for Ec(L). Simulations show that if methods such as the C-tree or hypercleaning are applied to Ec(L), the resulting trees often have more resolution than when the methods are applied only to L.
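The quartet distance defined in this abstract can be made concrete with a small sketch. It assumes each tree is given as a list of undirected edges and reads off each quartet topology from hop counts with the four-point condition; the function names are hypothetical and the error-correcting map Ec itself is not implemented.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def quartet_topology(dist, a, b, c, d):
    """Split induced on {a, b, c, d}, or None if the quartet is unresolved.

    Four-point condition: the pairing with the strictly smallest sum of
    within-pair distances is the split displayed by the tree.
    """
    sums = [
        (dist[a][b] + dist[c][d], frozenset([frozenset([a, b]), frozenset([c, d])])),
        (dist[a][c] + dist[b][d], frozenset([frozenset([a, c]), frozenset([b, d])])),
        (dist[a][d] + dist[b][c], frozenset([frozenset([a, d]), frozenset([b, c])])),
    ]
    smallest = min(s for s, _ in sums)
    best = [split for s, split in sums if s == smallest]
    return best[0] if len(best) == 1 else None


def quartet_distance(edges1, edges2, leaves):
    """Number of four-leaf subsets on which the two trees differ."""
    adj1, adj2 = adjacency(edges1), adjacency(edges2)
    d1 = {x: hop_distances(adj1, x) for x in leaves}
    d2 = {x: hop_distances(adj2, x) for x in leaves}
    return sum(quartet_topology(d1, *q) != quartet_topology(d2, *q)
               for q in combinations(sorted(leaves), 4))


# Two binary trees on N = 5 leaves that differ by swapping leaves b and c.
t1 = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("v", "w"), ("d", "w"), ("e", "w")]
t2 = [("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("v", "w"), ("d", "w"), ("e", "w")]
print(quartet_distance(t1, t2, "abcde"))
```

The two toy trees disagree only on the quartets {a, b, c, d} and {a, b, c, e}, which matches the lower bound of N - 3 stated in the abstract for N = 5.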

12.
Citrus sudden death (CSD) is a new disease of sweet orange and mandarin trees grafted on Rangpur lime and Citrus volkameriana rootstocks. It was first seen in Brazil in 1999, and has since been detected in more than four million trees. The CSD causal agent is unknown and the current hypothesis involves a virus similar to Citrus tristeza virus or a new virus named Citrus sudden death-associated virus. CSD symptoms include generalized foliar discoloration, defoliation and root death, and, in most cases, it can cause tree death. One of the unique characteristics of CSD is the presence of a yellow stain in the rootstock bark near the bud union. This region also undergoes profound anatomical changes. In this study, we analyse the metabolic disorder caused by CSD in the bark of sweet orange grafted on Rangpur lime by nuclear magnetic resonance (NMR) spectroscopy and imaging. The imaging results show the presence of a large amount of non-functional phloem in the rootstock bark of affected plants. The spectroscopic analysis shows a high content of triacylglyceride and sucrose, which may be related to phloem blockage close to the bud union. We also propose that, without knowing the causal CSD agent, the determination of oil content in rootstock bark by low-resolution NMR can be used as a complementary method for CSD diagnosis, screening about 300 samples per hour.

13.
14.

Background

Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.

Results

In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores.

Conclusion

The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
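For orientation, the classical Fitch algorithm on a rooted binary tree, which the abstract takes as its starting point, can be sketched as below. This covers only the tree case with unit substitution costs; the extension to reticulate vertices proposed by the authors is not reproduced, and the function and argument names are assumptions.

```python
def fitch_score(children, root, leaf_state):
    """Minimum number of state changes for one character on a binary tree.

    children: {internal vertex: (left child, right child)}
    leaf_state: {leaf: observed character state}
    """
    score = 0

    def visit(v):
        nonlocal score
        if v not in children:  # a leaf carries its observed state
            return {leaf_state[v]}
        left, right = (visit(c) for c in children[v])
        if left & right:  # the children agree on at least one state
            return left & right
        score += 1  # otherwise one substitution is needed below v
        return left | right

    visit(root)
    return score


# Toy tree ((a, b), (c, d)) with one nucleotide character.
children = {"root": ("p", "q"), "p": ("a", "b"), "q": ("c", "d")}
print(fitch_score(children, "root", {"a": "A", "b": "A", "c": "C", "d": "C"}))
```

With states A, A, C, C on the leaves a, b, c, d, the two cherries are each uniform and a single change on one of the root arcs explains the data.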

15.
Language as an evolving word web.   Total citations: 4 (self-citations: 0, citations by others: 4)
Human language may be described as a complex network of linked words. In such a treatment, each distinct word in language is a vertex of this web, and interacting words in sentences are connected by edges. The empirical distribution of the number of connections of words in this network is of a peculiar form that includes two pronounced power-law regions. Here we propose a theory of the evolution of language, which treats language as a self-organizing network of interacting words. In the framework of this concept, we completely describe the observed word web structure without any fitting. We show that the two regimes in the distribution naturally emerge from the evolutionary dynamics of the word web. It follows from our theory that the size of the core part of language, the 'kernel lexicon', does not vary as language evolves.

16.
17.
Given a distance matrix M that specifies the pairwise evolutionary distances between n species, the phylogenetic tree reconstruction problem asks for an edge-weighted phylogenetic tree that satisfies M, if one exists. We study some extensions of this problem to rooted phylogenetic networks. Our main result is an O(n^2 log n)-time algorithm for determining whether there is an ultrametric galled network that satisfies M, and if so, constructing one. In fact, if such an ultrametric galled network exists, our algorithm is guaranteed to construct one containing the minimum possible number of nodes with more than one parent (hybrid nodes). We also prove that finding a largest possible submatrix M' of M such that there exists an ultrametric galled network that satisfies M' is NP-hard. Furthermore, we show that given an incomplete distance matrix (i.e. where some matrix entries are missing), it is also NP-hard to determine whether there exists an ultrametric galled network which satisfies it.
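As background for the tree case that this abstract extends, a distance matrix is satisfied by an ultrametric tree exactly when it passes the three-point condition. The sketch below checks that condition by brute force in cubic time; it is not the galled-network algorithm of the paper, and the function name is an assumption.

```python
from itertools import combinations


def is_ultrametric(m):
    """Three-point condition on a square distance matrix.

    For every triple of species, the two largest of the three pairwise
    distances must be equal. Also requires symmetry and a zero diagonal.
    """
    n = len(m)
    for i in range(n):
        if m[i][i] != 0:
            return False
        for j in range(i + 1, n):
            if m[i][j] != m[j][i]:
                return False
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted((m[i][j], m[i][k], m[j][k]))
        if middle != largest:
            return False
    return True


print(is_ultrametric([[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
print(is_ultrametric([[0, 2, 6], [2, 0, 5], [6, 5, 0]]))
```

The first matrix passes because its two largest entries coincide (6 and 6); the second fails because they differ (6 and 5), which is the kind of input for which a network rather than a tree might be sought.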

18.
Conclusion: Most of the previous inconsistencies reported in the early works on CSD from brain can be readily explained by the presence of two CSD activities in a brain extract in vitro. Their respective nature is now fully elucidated. On the one hand, the so-called CSD II activity is indeed a side activity expressed by GAD in vitro that is unlikely to play a physiologically relevant role in the biosynthesis of taurine in vivo. However, it must be recalled that it represents the major contribution to the total brain CSD activity when measured in vitro. On the other hand, the so-called CSD I activity appears to be identical to liver CSD according to all biochemical evidence available to date. It is the most likely enzyme involved in taurine biosynthesis in vivo, and accordingly it represents a putative marker of taurine-producing cells in the brain. It must be noted that, in the absence of specific inhibitors, direct experimental evidence to support this hypothesis is still lacking. Taking into account all the data gathered in this review, the CSD I and CSD II designation, which referred only to a chromatography elution order, has become obsolete and must therefore be abandoned. CSD II, as an activity expressed by GAD in vitro, must be called GAD-associated CSD activity, i.e. GAD/CSD, while CSD I, as the brain enzyme identical to the liver enzyme, must be named solely CSD. According to our present immunocytochemical findings, this latter enzyme was found in the cerebellum not in nerve cells but in glial cells. These results provide the cellular basis for a role for taurine in relation to glial function, possibly as a general-purpose regulator manufactured and released by glial cells. Special issue dedicated to Dr. Alan N. Davison.

19.
Vogl C, Xu S. Genetics, 2000, 155(3): 1439-1447
In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via the Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.

20.
Supervised reconstruction of biological networks with local models   Total citations: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Inference and reconstruction of biological networks from heterogeneous data is currently an active research subject with several important applications in systems biology. The problem has been attacked from many different points of view with varying degrees of success. In particular, predicting new edges with a reasonable false discovery rate is highly demanded for practical applications, but remains extremely challenging due to the sparsity of the networks of interest. RESULTS: While most previous approaches based on the partial knowledge of the network to be inferred build global models to predict new edges over the network, we introduce here a novel method which predicts whether there is an edge from a newly added vertex to each of the vertices of a known network using local models. This involves learning individually a certain subnetwork associated with each vertex of the known network, then using the discovered classification rule associated with only that vertex to predict the edge to the new vertex. Excellent experimental results are shown in the case of metabolic and protein-protein interaction network reconstruction from a variety of genomic data. AVAILABILITY: An implementation of the proposed algorithm is available upon request from the authors.
