Similar articles
 Found 20 similar articles; search time 31 ms
1.
Böcker and Dress (Adv Math 138:105–125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of distinct leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
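
The forward direction of this construction (tree plus coloring to ternary map) can be sketched in a few lines. This is only an illustration: the adjacency-dict representation, the function names, and the toy tree are assumptions made here, not the authors' notation or algorithm. The median of three leaves is found as the single vertex shared by the three pairwise paths.

```python
from collections import deque

def path(adj, s, t):
    """Return the unique s-t path in a tree given as an adjacency dict."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            break
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    out = []
    v = t
    while v is not None:  # walk back from t to s
        out.append(v)
        v = parent[v]
    return out[::-1]

def median(adj, a, b, c):
    """The median of three leaves is the one vertex on all three pairwise paths."""
    common = set(path(adj, a, b)) & set(path(adj, a, c)) & set(path(adj, b, c))
    (m,) = common
    return m

def ternary_map(adj, color, a, b, c):
    """Map a triple of distinct leaves to the color of its median vertex."""
    return color[median(adj, a, b, c)]

# Hypothetical toy tree: leaves x1..x4, interior vertices u and v.
adj = {
    "x1": ["u"], "x2": ["u"], "x3": ["v"], "x4": ["v"],
    "u": ["x1", "x2", "v"], "v": ["u", "x3", "x4"],
}
color = {"u": "red", "v": "blue"}  # proper: adjacent interior vertices differ

print(ternary_map(adj, color, "x1", "x2", "x3"))
print(ternary_map(adj, color, "x1", "x3", "x4"))
```

The reverse direction, reconstructing the tree from a ternary map that satisfies the 4- and 5-point conditions, is the substance of the paper and is not attempted here.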

2.
Bayesian inference of phylogeny is unique among phylogenetic reconstruction methods in that it produces a posterior distribution of trees rather than a point estimate of the best tree. The most common way to summarize this distribution is to report the majority-rule consensus tree annotated with the marginal posterior probabilities of each partition. Reporting a single tree discards information contained in the full underlying distribution and reduces the Bayesian analysis to simply another method for finding a point estimate of the tree. Even when a point estimate of the phylogeny is desired, the majority-rule consensus tree is only one possible method, and there may be others that are more appropriate for the given data set and application. We present a method for summarizing the distribution of trees that is based on identifying agreement subtrees that are frequently present in the posterior distribution. This method provides fully resolved binary trees for subsets of taxa with high marginal posterior probability on the entire tree and includes additional information about the spread of the distribution.

3.
ABSTRACT: A phylogenetic network N has vertices corresponding to species and arcs corresponding to direct genetic inheritance from the species at the tail to the species at the head. Measurements of DNA are often made on species in the leaf set, and one seeks to infer properties of the network, possibly including the graph itself. In the case of phylogenetic trees, distances between extant species are frequently used to infer the phylogenetic trees by methods such as neighbor-joining. This paper proposes a "tree-average" distance for networks more general than trees. The notion requires a "weight" on each arc measuring the genetic change along the arc. For each displayed tree the distance between two leaves is the sum of the weights along the path joining them. At a hybrid vertex, each character is inherited from one of its parents. We will assume that for each hybrid there is a probability that the inheritance of a character is from a specified parent. Assume that the inheritance events at different hybrids are independent. Then for each displayed tree there will be a probability that the inheritance of a given character follows the tree; this probability may be interpreted as the probability of the tree. The "tree-average" distance between the leaves is defined to be the expected value of their distance in the displayed trees. For a class of rooted networks that includes rooted trees, it is shown that the weights and the probabilities at each hybrid vertex can be calculated given the network and the tree-average distances between the leaves. Hence these weights and probabilities are uniquely determined. The hypotheses on the networks include that hybrid vertices have indegree exactly 2 and that vertices that are not leaves have a tree-child.
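
The definition of the tree-average distance can be made concrete with a small sketch. Everything below (the dictionary layout, the function names, the toy network with a single hybrid vertex h, and its weights and probabilities) is a hypothetical illustration of the definition as stated in the abstract, not the paper's own code. Each displayed tree is obtained by keeping one parent arc per hybrid; its probability is the product of the chosen inheritance probabilities.

```python
from itertools import product

def tree_distance(edges, s, t):
    """Sum of arc weights on the s-t path of a tree given as {(tail, head): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(s, None, 0.0)]
    while stack:
        node, prev, dist = stack.pop()
        if node == t:
            return dist
        for nxt, w in adj.get(node, []):
            if nxt != prev:  # a tree has no cycles, so never step back
                stack.append((nxt, node, dist + w))
    raise ValueError("no path between the given vertices")

def tree_average_distance(tree_arcs, hybrid_arcs, s, t):
    """Expected s-t distance over displayed trees.

    hybrid_arcs has the assumed shape {hybrid: {parent: (weight, probability)}}.
    """
    hybrids = list(hybrid_arcs)
    total = 0.0
    for choice in product(*(hybrid_arcs[h].items() for h in hybrids)):
        edges = dict(tree_arcs)
        prob = 1.0
        for h, (parent, (w, p)) in zip(hybrids, choice):
            edges[(parent, h)] = w  # keep exactly one parent arc of this hybrid
            prob *= p               # inheritance events are independent
        total += prob * tree_distance(edges, s, t)
    return total

# Hypothetical network: root r, leaves x, y, z, hybrid h with parents a and b.
tree_arcs = {("r", "a"): 1.0, ("r", "b"): 2.0, ("a", "x"): 1.0,
             ("b", "y"): 1.0, ("h", "z"): 1.0}
hybrid_arcs = {"h": {"a": (0.5, 0.3), "b": (1.5, 0.7)}}

print(tree_average_distance(tree_arcs, hybrid_arcs, "x", "z"))
```

In this toy network the x-z distance is 2.5 in the tree where h inherits from a and 6.5 in the tree where h inherits from b, so the tree-average distance is 0.3 × 2.5 + 0.7 × 6.5 = 5.3. The enumeration is exponential in the number of hybrids and is meant only to mirror the definition.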

4.
Ekholm A, McDonald JW, Smith PW. Biometrics, 2000, 56(3): 712-718
Models for a multivariate binary response are parameterized by univariate marginal probabilities and dependence ratios of all orders. The w-order dependence ratio is the joint success probability of w binary responses divided by the joint success probability assuming independence. This parameterization supports likelihood-based inference for both regression parameters, relating marginal probabilities to explanatory variables, and association model parameters, relating dependence ratios to simple and meaningful mechanisms. Five types of association models are proposed, where responses are (1) independent given a necessary factor for the possibility of a success, (2) independent given a latent binary factor, (3) independent given a latent beta distributed variable, (4) follow a Markov chain, and (5) follow one of two first-order Markov chains depending on the realization of a binary latent factor. These models are illustrated by reanalyzing three data sets, foremost a set of binary time series on auranofin therapy against arthritis. Likelihood-based approaches are contrasted with approaches based on generalized estimating equations. Association models specified by dependence ratios are contrasted with other models for a multivariate binary response that are specified by odds ratios or correlation coefficients.
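
As a rough illustration of the definition of the dependence ratio, the sketch below computes a plain empirical version from observed binary vectors: the observed joint success proportion divided by the product of the observed marginal proportions. This moment-style estimate and the function name are assumptions made here for illustration; the paper itself works with likelihood-based inference.

```python
def dependence_ratio(rows, indices):
    """Empirical dependence ratio for the responses at the given positions.

    rows: sequence of 0/1 tuples, one per subject.
    indices: positions of the w responses whose joint success is considered.
    """
    n = len(rows)
    joint = sum(all(r[i] for i in indices) for r in rows) / n
    independent = 1.0
    for i in indices:
        independent *= sum(r[i] for r in rows) / n  # marginal success proportion
    return joint / independent

# Perfectly associated pair of responses: ratio above 1.
print(dependence_ratio([(1, 1), (1, 1), (0, 0), (0, 0)], (0, 1)))
# Responses that look independent: ratio equal to 1.
print(dependence_ratio([(1, 1), (1, 0), (0, 1), (0, 0)], (0, 1)))
```

A ratio of 1 corresponds to independence; values above 1 indicate that joint successes occur more often than independence would predict.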

5.
In this article, a general procedure is presented for testing for equality of k independent binary response probabilities against any given ordered alternative. The proposed methodology is based on an estimation procedure developed in Hwang and Peddada (1994, Annals of Statistics 22, 67-93) and can be used for a very broad class of order restrictions. The procedure is illustrated through application to two data sets that correspond to three commonly encountered order restrictions: simple tree order, simple order, and down turn order.

6.
Neutral macroevolutionary models, such as the Yule model, give rise to a probability distribution on the set of discrete rooted binary trees over a given leaf set. Such models can provide a signal as to the approximate location of the root when only the unrooted phylogenetic tree is known, and this signal becomes relatively more significant as the number of leaves grows. In this short note, we show that among models that treat all taxa equally, and are sampling consistent (i.e. the distribution on trees is not affected by taxa yet to be included), all such models, except one (the so-called PDA model), convey some information as to the location of the ancestral root in an unrooted tree.

7.
This article considers global tests of differences between paired vectors of binomial probabilities, based on data from two dependent multivariate binary samples. Difference is defined as either an inhomogeneity in the marginal distributions or asymmetry in the joint distribution. For detecting the first type of difference, we propose a multivariate extension of McNemar's test and show that it is a generalized score test under a generalized estimating equations (GEE) approach. Univariate features such as the relationship between the Wald and score tests and the dropout of pairs with the same response carry over to the multivariate case and the test does not depend on the working correlation assumption among the components of the multivariate response. For sparse or imbalanced data, such as occurs when the number of variables is large or the proportions are close to zero, the test is best implemented using a bootstrap, and if this is computationally too complex, a permutation distribution. We apply the test to safety data for a drug, in which two doses are evaluated by comparing multiple responses by the same subjects to each one of them.
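
For orientation, the classical univariate McNemar statistic that this work extends can be sketched as follows. The code covers only the standard single-response case (without continuity correction); the function name and the toy data are assumptions for illustration, and the multivariate extension proposed in the article is not reproduced. It also shows the "dropout" property mentioned above: pairs with the same response in both conditions do not affect the statistic.

```python
def mcnemar_statistic(pairs):
    """Classical McNemar chi-square statistic (1 d.f.) for paired 0/1 responses."""
    b = sum(1 for x, y in pairs if x == 1 and y == 0)  # success only under condition 1
    c = sum(1 for x, y in pairs if x == 0 and y == 1)  # success only under condition 2
    if b + c == 0:
        return 0.0  # no discordant pairs, no evidence of a difference
    return (b - c) ** 2 / (b + c)

# Hypothetical data: 8 + 2 discordant pairs and 10 concordant pairs.
discordant = [(1, 0)] * 8 + [(0, 1)] * 2
concordant = [(1, 1)] * 5 + [(0, 0)] * 5

print(mcnemar_statistic(discordant + concordant))
print(mcnemar_statistic(discordant))  # same value: concordant pairs drop out
```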

8.
In the flora of French Guiana we find considerable within-plant variation in leaf form. We observed entire, two-lobed, and three-lobed leaves within five separate levels (tiers) of the canopy of a single individual of Pourouma tomentosa subsp. maroniensis. Five branches from each of the five tiers of the tree were collected around the axis of the trunk. From these branches five secondary branchlets were selected and all leaves excised with information recorded as to nodal position, number of leaf nodes, and fertility status of the main branch. This design produced 1015 leaves representing about 20 m² of foliar area and about 2.4 kg of blade dry weight. Our objectives were to determine if statistically significant patterns exist for leaf variation and to suggest improvements for future, general collections. The four lower tiers had 62% entire, 10% 2-lobed, and 28% 3-lobed leaves, in contrast to the top tier with 38% entire, 11% 2-lobed, and 51% 3-lobed leaves. The top tier had no fertile branches. In the lower tiers, fertile branches produced 68% entire leaves whereas nonfertile branches produced only 46% entire leaves. In the top tier, lobed leaves made up 73% of surface area, while in the lower tiers, lobed leaves made up only 48% of total surface area. We selected a random subset of 75 leaves from the 1015, for morphometric analysis using two-way ANOVA (tier×leaf type). The boundaries of leaf images were digitized and rendered into Fourier coefficients, yielding leaf surface area and two variables that quantify aspects of shape: dissection index and leaf complexity. The Fourier coefficients were averaged by tier and by leaf type to reconstruct synthetic, average leaf images. Logistic regression was used to predict the position of leaves on the tree and to provide visualization of the relationships between leaf position on the tree and leaf morphological variables.
Within the tree crown, leaf surface area and leaf specific mass (LSM) increase with height, although leaf shape does not change with height. LSM does not vary with leaf form; and sun leaves are larger than shade leaves on this tree. We conducted computer sampling experiments based on exact randomization to simulate the process of obtaining all leaf shapes present in an individual tree when making field collections of varying numbers of duplicates. This also points out the importance of noting the presence of within-tree variation in leaf form on herbarium labels. Failure to recognize leaf variation can lead to incorrect delimitation of species as well as cause overestimates of the number of species in diversity studies.

9.
The Robinson-Foulds (RF) distance is by far the most widely used measure of dissimilarity between trees. Although the distribution of these distances has been investigated for 20 years, an algorithm that is explicitly polynomial time has yet to be described for computing the distribution for trees around a given tree. In this paper, we derive a polynomial-time algorithm for this distribution. We show how the distribution can be approximated by a Poisson distribution determined by the proportion of leaves that lie in “cherries” of the given tree. We also describe how our results can be used to derive normalization constants that are required in a recently proposed maximum likelihood approach to supertree construction.
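
As background for readers unfamiliar with the measure, the RF distance between two trees on the same leaf set can be computed as the number of nontrivial splits found in exactly one of the two trees. The sketch below assumes trees written as nested tuples of string leaf labels (a representation chosen here for brevity) and treats them as unrooted. It computes a single distance only and does not implement the paper's algorithm for the distribution of distances.

```python
def leaves(tree):
    """Leaf labels below a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Nontrivial splits of the unrooted tree, each stored as the side
    that does not contain a fixed reference leaf."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side if ref not in side else all_leaves - side)
        for child in node:
            visit(child)

    visit(tree)
    return found

def rf_distance(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))

t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
print(rf_distance(t1, t2))
```

Here both trees share the split separating d and e from the rest, and each has one split the other lacks, so the distance is 2. Recomputing leaf sets at every node keeps the code short at the cost of quadratic work on large trees.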

10.

Background

Eelgrass is a cosmopolitan seagrass species that provides important ecological services in coastal and near-shore environments. Despite its relevance, loss of eelgrass habitats is noted worldwide. Restoration by replanting plays an important role, and accurate measurements of the standing crop and productivity of transplants are important for evaluating restoration of the ecological functions of natural populations. Traditional assessments are destructive, and although they do not harm natural populations, in transplants the destruction of shoots might cause undesirable alterations. Non-destructive assessments of the aforementioned variables are obtained through allometric proxies expressed in terms of measurements of the lengths or areas of leaves. Digital imagery could produce measurements of leaf attributes without the removal of shoots, but sediment attachments, damage inflicted by drag forces, or humidity contents induce noise effects, reducing precision. Available techniques for dealing with noise caused by humidity contents on leaves use the concepts of adjacency, vicinity, connectivity and tolerance of similarity between pixels. Selection of an interval of tolerance of similarity for efficient measurements requires extended computational routines with tied statistical inferences, making concomitant tasks complicated and time consuming. The present approach proposes a simplified and cost-effective alternative, and also a general tool aimed to deal with any sort of noise modifying eelgrass leaf images. Moreover, this selection criterion relies on only a single statistic: the calculation of the maximum value of the Concordance Correlation Coefficient for reproducibility of observed areas of leaves through proxies obtained from digital images.

Results

Available data reveals that the present method delivers simplified, consistent estimations of areas of eelgrass leaves taken from noisy digital images. Moreover, the proposed procedure is robust because both the optimal interval of tolerance of similarity and the reproducibility of observed leaf areas through digital image surrogates were independent of sample size.

Conclusion

The present method provides simplified, unbiased and non-destructive measurements of eelgrass leaf area. These measurements, in conjunction with allometric methods, can predict the dynamics of eelgrass biomass and leaf growth through indirect techniques, reducing the destructive effect of sampling, which is fundamental to the evaluation of eelgrass restoration projects, thereby contributing to the conservation of this important seagrass species.
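
The selection criterion described in the Background can be sketched with the standard (Lin's) Concordance Correlation Coefficient. The sketch assumes that this is the coefficient meant, uses population (1/n) moments, and invents the function names and the toy numbers; the image-processing step that would produce the proxy areas for each candidate tolerance is not shown.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

def best_tolerance(observed, proxies_by_tolerance):
    """Pick the candidate tolerance whose proxy areas best reproduce the observed areas."""
    return max(
        proxies_by_tolerance,
        key=lambda tol: concordance_correlation(observed, proxies_by_tolerance[tol]),
    )

# Hypothetical leaf areas and image-derived proxies for two candidate tolerances.
observed = [1.0, 2.0, 3.0]
proxies_by_tolerance = {0.1: [2.0, 3.0, 4.0], 0.2: [1.0, 2.0, 3.0]}

print(best_tolerance(observed, proxies_by_tolerance))
```

Unlike the ordinary correlation coefficient, the concordance coefficient penalizes a constant offset between the two series, which is why the shifted proxies under tolerance 0.1 score lower than the exact match under 0.2.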

11.
Estimates of leaf size and asymmetry for individual trees are often obtained using sample sizes that are too small to take into account the possibility that size and asymmetry may be affected by the position of the leaf on the tree. This issue was addressed by exploring variation in leaf size and asymmetry within an individual of Alder (Alnus glutinosa). We found differences between branches for leaf size and for signed asymmetry but not for unsigned asymmetry. We also found that the size of a leaf was not correlated with its position on a branch and that the asymmetry of a leaf was not correlated with either its position on a branch or with the asymmetry of its neighbour. Repeated subsampling of a sample of 870 leaves showed that a subsample size approaching 500 leaves was required for consistently reliable estimates of the standard deviation of unsigned asymmetry. Smaller subsamples were required for consistently reliable estimates of mean unsigned asymmetry and of the mean and standard deviation of leaf size, but subsamples of less than 100 leaves provided consistently reliable estimates only of mean leaf size. For this species, reliable estimates of an individual's level of asymmetry are obtained only if several hundred leaves are sampled over several branches, but it is not necessary to sample the same sequence of leaves from each branch.

12.
For assessment of genetic association between single-nucleotide polymorphisms (SNPs) and disease status, the logistic-regression model or generalized linear model is typically employed. However, testing for deviation from Hardy-Weinberg proportion in a patient group could be another approach for genetic-association studies. The Hardy-Weinberg proportion is one of the most important principles in population genetics. Deviation from Hardy-Weinberg proportion among cases (patients) could provide additional evidence for the association between SNPs and diseases. To develop a more powerful statistical test for genetic-association studies, we combined evidence about deviation from Hardy-Weinberg proportion in case subjects and standard regression approaches that use case and control subjects. In this paper, we propose two approaches for combining such information: the mean-based tail-strength measure and the median-based tail-strength measure. These measures integrate logistic regression and Hardy-Weinberg-proportion tests for the study of the association between a binary disease outcome and an SNP on the basis of case- and control-subject data. For both mean-based and median-based tail-strength measures, we derived exact formulas to compute p values. We also developed an approach for obtaining empirical p values with the use of a resampling procedure. Results from simulation studies and real-disease studies demonstrate that the proposed approach is more powerful than the traditional logistic-regression model. The type I error probabilities of our approach were also well controlled.
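
One ingredient of this approach, a test for deviation from Hardy-Weinberg proportion in the case group, can be illustrated with the textbook Pearson chi-square statistic for a biallelic SNP. The function name and the genotype counts below are hypothetical, and the tail-strength measures that combine this evidence with logistic regression are not reproduced.

```python
def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic (1 d.f.) for deviation from
    Hardy-Weinberg proportions, given genotype counts at one SNP."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip genotype classes with zero expectation (monomorphic SNP).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

print(hardy_weinberg_chi2(25, 50, 25))  # counts match the expected proportions
print(hardy_weinberg_chi2(50, 0, 50))   # no heterozygotes at all: strong deviation
```

With an allele frequency of 0.5 and 100 subjects the expected counts are 25, 50 and 25, so the first call gives 0 and the second gives 25 + 50 + 25 = 100.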

13.
The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of m taxa using nucleotide sequence data. Models for the respective probabilities of the 4^m possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically non-zero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another. Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free Z-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.

14.
We present a method to infer a straight-line tree branch system from a given set of leaf positions and average branching angles. Among an extensive set of possible branch systems constructed in the process, we choose the one featuring the shortest total length, following an optimality hypothesis by Leopold (1971). The approach is illustrated using empirical low-order skeletons from European beech. Our method further allows one to assess, for a given species or individual tree, to what extent its branching pattern accords with Leopold's hypothesis, which we argue to be the case for beech. While still computationally intensive when there are many leaves, the method can furthermore be used to complement existing tree structure reconstruction methods that otherwise require a rudimentary skeleton as manual input.

15.
Fujisawa H, Izumi S. Biometrics, 2000, 56(3): 706-711
Repeated binary responses provide efficient information for two purposes: (1) estimating two misclassification (false-positive and false-negative error) probabilities and (2) testing the hypothesis that either is zero in a reliability study. We focus on the assessment of reliability of a diagnostic test when there is no gold standard. This paper uses a latent class model and illustrates some of its properties. In addition, application to data containing variation among individuals is considered. We apply this model to the serological data on the MNSs blood group of atomic bomb survivors and their children. The results provide valuable information for examining measurement reliability.

16.
  • 1 Invasive species pose significant threats to native and managed ecosystems. However, it may not always be possible to perform rigorous, long‐term studies on invaders to determine the factors that influence their population dynamics, particularly when time and resources are limited. We applied a novel approach to determine factors associated with mortality in larvae of the sawfly Profenusa thomsoni Konow, a leafminer of birch, and a relatively recent invader of urban and rural birch forests in Alaska. Classification tree analysis was applied to reveal relationships between qualitative and quantitative predictor variables and categorical response variables in a large data set of larval mortality observations.
  • 2 We determined the state (living or dead) of sawfly larvae in samples of individual leaves. Each leaf was scored for variables reflecting the intensity of intra‐specific competition and leaf quality for leafminers; year of collection and degree‐days accumulated were recorded for each sample. We explored the association of these variables with larval state using classification tree analysis.
  • 3 Leafminer mortality was best explained by a combination of competition and resource exhaustion and our analysis revealed a possible advantage to group feeding in young larvae that may explain previously observed patterns of resource overexploitation in this species. Dead larvae were disproportionately found in smaller leaves, which highlights the potential effect of competition on mortality and suggests that smaller‐leaved species of birch will be better able to resist leafminer damage.
  • 4 We show that classification tree analysis may be useful in situations where urgency and/or limited resources prohibit traditional life‐table studies.

17.
Gene expression in autumn leaves (total citations: 36; self-citations: 0; citations by others: 36)
Two cDNA libraries were prepared, one from leaves of a field-grown aspen (Populus tremula) tree, harvested just before any visible sign of leaf senescence in the autumn, and one from young but fully expanded leaves of greenhouse-grown aspen (Populus tremula x tremuloides). Expressed sequence tags (ESTs; 5,128 and 4,841, respectively) were obtained from the two libraries. A semiautomatic method of annotation and functional classification of the ESTs, according to a modified Munich Institute of Protein Sequences classification scheme, was developed, utilizing information from three different databases. The patterns of gene expression in the two libraries were strikingly different. In the autumn leaf library, ESTs encoding metallothionein, early light-inducible proteins, and cysteine proteases were most abundant. Clones encoding other proteases and proteins involved in respiration and breakdown of lipids and pigments, as well as stress-related genes, were also well represented. We identified homologs to many known senescence-associated genes, as well as seven different genes encoding cysteine proteases, two encoding aspartic proteases, five encoding metallothioneins, and 35 additional genes that were up-regulated in autumn leaves. We also indirectly estimated the rate of plastid protein synthesis in the autumn leaves to be less than 10% of that in young leaves.

18.
Graphs obtained from a binary leaf labeled ("phylogenetic") tree by adding an edge so as to introduce a cycle provide a useful representation of hybrid evolution in molecular evolutionary biology. This class of graphs (which we call "unicyclic networks") also has some attractive combinatorial properties, which we present. We characterize when a set of binary phylogenetic trees is displayed by a unicyclic network in terms of tree rearrangement operations. This leads to a triple-wise compatibility theorem and a simple, fast algorithm to determine 1-cycle compatibility. We also use generating function techniques to provide closed-form expressions that enumerate unicyclic networks with specified or unspecified cycle length, and we provide an extension to enumerate a class of multicyclic networks.

19.
Modeling the joint distribution of a binary trait (disease) within families is a tedious challenge, owing to the lack of a general statistical model with desirable properties such as the multivariate Gaussian model for a quantitative trait. Models have been proposed that either assume the existence of an underlying liability variable, the reality of which cannot be checked, or provide estimates of aggregation parameters that are dependent on the ordering of family members and on family size. We describe how a class of copula models for the analysis of exchangeable categorical data can be incorporated into a familial framework. In this class of models, the joint distribution of binary outcomes is characterized by a function of the given marginals. This function, referred to as a "copula," depends on an aggregation parameter that is weakly dependent on the marginal distributions. We propose to decompose a nuclear family into two sets of equicorrelated data (parents and offspring), each of which is characterized by an aggregation parameter (α_FM and α_SS, respectively). The marginal probabilities are modeled through a logistic representation. The advantage of this model is that it provides estimates of the aggregation parameters that are independent of family size and does not require any arbitrary ordering of sibs. It can be incorporated easily into segregation or combined segregation-linkage analysis and does not require extensive computer time. As an illustration, we applied this model to a combined segregation-linkage analysis of levels of plasma angiotensin I-converting enzyme (ACE) dichotomized into two classes according to the median. The conclusions of this analysis were very similar to those we had reported in an earlier familial analysis of quantitative ACE levels.

20.
BACKGROUND: We describe Support Vector Machine (SVM) applications to classification and clustering of channel current data. SVMs are variational-calculus based methods that are constrained to have structural risk minimization (SRM), i.e., they provide noise tolerant solutions for pattern recognition. The SVM approach encapsulates a significant amount of model-fitting information in the choice of its kernel. In work thus far, novel, information-theoretic, kernels have been successfully employed for notably better performance over standard kernels. Currently there are two approaches for implementing multiclass SVMs. One is called external multi-class that arranges several binary classifiers as a decision tree such that they perform a single-class decision making function, with each leaf corresponding to a unique class. The second approach, namely internal-multiclass, involves solving a single optimization problem corresponding to the entire data set (with multiple hyperplanes). RESULTS: Each SVM approach encapsulates a significant amount of model-fitting information in its choice of kernel. In work thus far, novel, information-theoretic, kernels were successfully employed for notably better performance over standard kernels. Two SVM approaches to multiclass discrimination are described: (1) internal multiclass (with a single optimization), and (2) external multiclass (using an optimized decision tree). We describe benefits of the internal-SVM approach, along with further refinements to the internal-multiclass SVM algorithms that offer significant improvement in training time without sacrificing accuracy. In situations where the data isn't clearly separable, making for poor discrimination, signal clustering is used to provide robust and useful information--to this end, novel, SVM-based clustering methods are also described. As with the classification, there are Internal and External SVM Clustering algorithms, both of which are briefly described.
