Similar articles
20 similar articles found
1.
Phylogenetic reconstruction from DNA or amino acid sequences relies heavily on suitable distance measures. A number of new distance measures (asynchronous, LogDet, and paralinear distances) that possess the desired property of tree additivity under fairly general models of sequence evolution have been proposed recently, but they are not well understood from a mechanistic point of view. We review them here in a unifying framework: the substitution process in continuous time. The emerging interpretation also clarifies the relationships among these distance measures. We also tackle situations with site-to-site variation of substitution rates, which is well known to cause non-additive distances and inconsistent branch lengths. For homogeneous, stationary, time-reversible models, this may be repaired provided that the distribution of rates is known. In contrast, we show that, for non-stationary models, different tree topologies may produce identical joint distributions of letters in pairs of sequences, given the same distribution of rates. This precludes the existence of any tree-additive pairwise distance measure.
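To make one of these measures concrete, here is a minimal Python sketch of the standard paralinear (LogDet-type) distance for two aligned DNA sequences, d = -1/4 [ln det J - 1/2 ln(det D_x det D_y)], where J is the 4x4 divergence matrix of joint base frequencies and D_x, D_y are the diagonal matrices of base composition. This is the textbook form, not code from the reviewed paper; the function names are hypothetical, and the sketch assumes ungapped sequences over ACGT with all four bases present and ignores rate variation across sites.

```python
import math
from collections import Counter

BASES = "ACGT"

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def paralinear_distance(x, y):
    """Paralinear distance between two aligned sequences over ACGT."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned (equal length)")
    n = len(x)
    pairs = Counter(zip(x, y))
    # Divergence matrix J: frequency of base i in x aligned with base j in y.
    J = [[pairs[(i, j)] / n for j in BASES] for i in BASES]
    fx = [sum(row) for row in J]                             # composition of x
    fy = [sum(J[i][j] for i in range(4)) for j in range(4)]  # composition of y
    det_j = det(J)
    if det_j <= 0.0 or min(fx) == 0.0 or min(fy) == 0.0:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * (math.log(det_j) - 0.5 * math.log(math.prod(fx) * math.prod(fy)))
```

For identical sequences J equals D_x = D_y, so the distance is zero; it is symmetric in x and y because det J equals det of its transpose.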

2.
Wellek S. Biometrics, 2004, 60(3): 694-703
The classical χ² procedure for the assessment of genetic equilibrium is tailored to establishing lack of fit, rather than goodness of fit, of an observed genotype distribution to a model satisfying the Hardy-Weinberg law; the same is true of the exact competitors to the large-sample procedure that have been proposed in the biostatistical literature since the late 1930s. In this contribution, the methodology of statistical equivalence testing is adopted to construct tests for problems in which approximate compatibility of the sampled genotype distribution with Hardy-Weinberg equilibrium (HWE) is the alternative hypothesis one aims to establish. The result of such a construction depends strongly on the choice of the distance measure used to define an indifference zone containing those genotype distributions whose degree of disequilibrium is considered irrelevant. The first measure proposed here is the Euclidean distance of the true parameter vector from that of a genotype distribution with identical allele frequencies in strict HWE. The second measure is based on the (scalar) parameter of the distribution first introduced into the present context by Stevens (1938, Annals of Eugenics 8, 377-383). The first approach leads to a nonconditional test (which can nevertheless be carried out in a numerically exact way), the second to an exact conditional test shown to be uniformly most powerful unbiased (UMPU) for the associated pair of hypotheses. Both tests are compared in terms of the exact power attained against the specific alternatives under which HWE is strictly satisfied.
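The first distance measure can be illustrated with a plug-in estimate: compute the allele frequency p from the genotype counts, form the HWE genotype vector (p², 2pq, q²) with the same allele frequency, and take the Euclidean distance to the observed genotype frequencies. This minimal sketch only computes that distance for a biallelic locus; it does not implement the equivalence test itself, and the function name is hypothetical.

```python
import math

def hwe_euclidean_distance(n_AA, n_Aa, n_aa):
    """Euclidean distance between observed genotype frequencies and the
    HWE genotype frequencies sharing the same allele frequency."""
    n = n_AA + n_Aa + n_aa
    observed = (n_AA / n, n_Aa / n, n_aa / n)
    p = observed[0] + observed[1] / 2.0   # frequency of allele A
    q = 1.0 - p
    expected = (p * p, 2.0 * p * q, q * q)
    return math.dist(observed, expected)

print(hwe_euclidean_distance(50, 0, 50))  # complete heterozygote deficit
```

A sample in exact HWE proportions gives distance 0; an equivalence test would then ask whether the true distance lies inside a chosen tolerance.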

3.
The coancestry coefficient, also known as the population structure parameter, is of great interest in population genetics. It can be thought of as the intraclass correlation of pairs of alleles within populations, and it can serve as a measure of genetic distance between populations. For a general class of evolutionary models, it determines the distribution of allele frequencies among populations. Under more restrictive models, it can be regarded as the probability of identity by descent of any pair of alleles at a locus within a random-mating population. In this paper we review estimation procedures that use the method of moments or are maximum likelihood under the assumption of normally distributed allele frequencies. We then consider the problem of testing hypotheses about this parameter. In addition to parametric and non-parametric bootstrap tests, we present a test whose statistic is asymptotically chi-square distributed. This test reduces to the contingency-table test for equal sample sizes across populations. Our new test appears to be more powerful than previous tests, especially for loci with multiple alleles. We apply our methods to HapMap SNP data to confirm that the coancestry coefficient for humans is strictly positive.
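As a rough illustration of the moment idea, the parameter can be viewed as the between-population variance of an allele frequency scaled by its binomial maximum, var(p) / (p̄(1 - p̄)). The sketch below is this naive variance-ratio form for one biallelic locus, assuming population allele frequencies are known exactly; it omits the sample-size corrections that the moment estimators reviewed in the paper include, and the function name is hypothetical.

```python
def naive_theta(freqs):
    """Variance-ratio estimate var(p) / (p_bar * (1 - p_bar)) for one biallelic locus.

    freqs: frequency of the same allele in each population.
    Assumes frequencies are known without sampling error (no bias correction).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    if not 0.0 < p_bar < 1.0:
        raise ValueError("allele must be polymorphic across the pooled populations")
    variance = sum((p - p_bar) ** 2 for p in freqs) / k
    return variance / (p_bar * (1.0 - p_bar))

print(naive_theta([0.2, 0.5, 0.8]))
```

Identical frequencies give 0 (no structure); populations fixed for different alleles give 1.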

4.
When single-molecule fluorescence localization techniques are pushed to their lower limits in attempts to measure ever-shorter distances, measurement errors become important to understand. Here we describe the non-Gaussian distribution of measured distances that is the key to proper interpretation of distance measurements. We test it on single-molecule high-resolution colocalization data for a known distance, 10 nm, and find that it gives the correct result, whereas interpretation of the same data with a Gaussian distribution gives a result that is systematically too large.
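Why a Gaussian fit overestimates short distances can be seen in a small simulation. Assuming isotropic 2D Gaussian localization error with standard deviation σ on the separation vector, the measured distance r for a true distance μ follows a Rice distribution, p(r) = (r/σ²) exp(-(r² + μ²)/(2σ²)) I₀(rμ/σ²). The sketch below evaluates that density (with a power series for I₀) and simulates measurements; it is a generic illustration under that error model, not the authors' fitting code, and the function names and parameter values are hypothetical.

```python
import math
import random

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series sum (x/2)^(2k) / (k!)^2."""
    total = term = 1.0
    half_sq = (x / 2.0) ** 2
    for k in range(1, terms):
        term *= half_sq / (k * k)
        total += term
    return total

def measured_distance_pdf(r, mu, sigma):
    """Rice density of the measured 2D distance r given true distance mu."""
    s2 = sigma * sigma
    return (r / s2) * math.exp(-(r * r + mu * mu) / (2.0 * s2)) * bessel_i0(r * mu / s2)

def simulate_measured_distances(mu, sigma, n, seed=0):
    """Add independent Gaussian errors to each coordinate of a separation of length mu."""
    rng = random.Random(seed)
    return [math.hypot(mu + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]

samples = simulate_measured_distances(10.0, 5.0, 20000)
print(sum(samples) / len(samples))  # noticeably above the true 10 nm
```

The mean of the simulated distances exceeds the true distance, which is the systematic overestimate a Gaussian interpretation inherits; fitting the Rice density instead recovers μ.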

5.
Do local abundances of British birds change with proximity to range edge? (Cited 5 times; self-citations: 1; citations by others: 4)
Aim: Species generally vary in the density they attain at different sites, prompting the question of whether this variation is systematic across their range. We investigate this question using data on the abundance and distribution of thirty-two species of passerine birds across Britain derived from censuses organized by the British Trust for Ornithology.
Methods: Analysis is complicated by the issue of quantifying the distance of any particular census location from the edge of the range of a species when the study area encompasses only part of its entire distribution. No measure of this quantity is a priori the correct one, and so we use a variety of different measures which make differing assumptions about how abundances might be structured across species ranges.
Results: None of the measures used reveal any consistent relationships between the density attained by species at census sites and the spatial positions of those sites. Only thirteen species show significant relationships with any of the measures, and no more than seven species with any single measure.
Main conclusion: In summary, there is no convincing evidence that passerine bird densities are usually lower towards range edges in Britain. We discuss possible reasons for these findings.

6.
The copula of a bivariate distribution, constructed by making marginal transformations of each component, captures all the information in the bivariate distribution about the dependence between two variables. For frailty models for bivariate data the choice of a family of distributions for the random frailty corresponds to the choice of a parametric family for the copula. A class of tests of the hypothesis that the copula is in a given parametric family, with unspecified association parameter, based on bivariate right censored data is proposed. These tests are based on first making marginal Kaplan-Meier transformations of the data and then comparing a non-parametric estimate of the copula to an estimate based on the assumed family of models. A number of options are available for choosing the scale and the distance measure for this comparison. Significance levels of the test are found by a modified bootstrap procedure. The procedure is used to check the appropriateness of a gamma or a positive stable frailty model in a set of survival data on Danish twins.
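The first step, a marginal Kaplan-Meier transformation, can be sketched for one margin: estimate the survival curve from right-censored times, then map each observation t to F̂(t) = 1 - Ŝ(t). This is a minimal sketch of the standard product-limit estimator; how the paper handles ties, censored points and the final transformation scale may differ, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. events[i] = 1 for an observed failure, 0 if censored.
    Returns [(t, S(t))] at each distinct failure time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve

def km_transform(times, events):
    """Map each observed time to the estimated marginal CDF 1 - S(t)."""
    curve = kaplan_meier(times, events)

    def surv(t):
        s = 1.0
        for ti, si in curve:
            if ti > t:
                break
            s = si
        return s

    return [1.0 - surv(t) for t in times]

print(km_transform([1, 2, 3, 4], [1, 0, 1, 1]))
```

Applying this to each member of a twin pair yields pseudo-observations on the unit square, to which a nonparametric copula estimate can then be compared.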

7.
1. A simple device called a 'pocometer' (POlar COordinate METER) was developed to measure the three-dimensional structure of plants. It consists of a tape measure to measure distance and two protractors to measure zenith angle and azimuth angle.
2. The pocometer can determine the locations of points within a few metres with a resolution of less than 1 cm. The location of any point on a plant can be measured in 10 to 30 s, depending on the ease of pulling the tape measure to the point of interest.
3. A system to use data obtained with the pocometer to calculate plant light capture was developed. The degree of shading at any point on a plant is estimated by checking obstruction by other plant parts of the view toward the sky at that point.
4. Photon flux density (PFD) on leaf surfaces was estimated for Aucuba japonica, a broad-leaved evergreen shrub, using the above system. The estimated PFDs for individual leaves of a plant corresponded to the sensor-measured PFDs with correlation coefficients of 0.67 to 0.92.
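Each pocometer reading (tape length, zenith angle, azimuth angle) is a point in spherical coordinates centred on the device, so converting to Cartesian coordinates is a direct trigonometric step. The sketch below assumes the zenith angle is measured from the vertical, the azimuth counterclockwise from a fixed horizontal x-axis, and angles are recorded in degrees; the abstract does not state the device's axis conventions, so these and the function name are assumptions.

```python
import math

def pocometer_to_xyz(distance, zenith_deg, azimuth_deg):
    """Convert a tape length and two protractor angles to (x, y, z).

    Assumed conventions: zenith from the vertical (z-axis), azimuth from the x-axis,
    origin at the pocometer.
    """
    theta = math.radians(zenith_deg)
    phi = math.radians(azimuth_deg)
    horizontal = distance * math.sin(theta)
    return (horizontal * math.cos(phi), horizontal * math.sin(phi), distance * math.cos(theta))

print(pocometer_to_xyz(1.5, 60.0, 45.0))
```

The returned point always lies at the measured tape length from the origin, so resolution in position is limited by the tape reading and by angular error scaled by distance.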

8.
A method is presented for modeling the relationship of polychotomous health ratings with predictors such as area characteristics, the distance to a source of environmental contamination, or exposure to environmental pollutants. The model combines elements of grouped regression and multilevel analysis. The statistical model describes the entire response distribution as a function of the predictors, so that any measure that summarizes this distribution can be calculated from the model. With the model, polychotomous health ratings can be used directly, and there is no need to dichotomize such variables a priori, which would lead to a loss of information. We describe how, according to the model, various measures describing the response distribution are related to the exposure, and we present confidence and tolerance intervals for these relationships. Specific attention is given to the incorporation of random factors in the model. The example application concerns annoyance from transportation noise. Exposure-response relationships obtained with the described modeling method are presented for aircraft, road traffic, and railway noise.

9.
The impacts of sediment contaminants can be evaluated by different lines of evidence, including toxicity tests and ecological community studies. Responses from 10 different toxicity assays/tests were combined to arrive at a "site score." We employed a relatively simple summary measure, pooled P-values, with which we quantify a potential decrement in response at a contaminated site relative to nominally clean reference sites. The response-specific P-values were defined relative to a "null" distribution of responses at reference sites, and were then pooled using standard meta-analytic methods. Ecological community data were evaluated using an analogous strategy. A distribution of the distances of the reference sites from the centroid of the reference sites was obtained. The distance of each test site from the centroid of the reference sites was then calculated, and the proportion of reference distances that exceed the test-site distance was used to define an empirical P-value for that test site. A plot of the toxicity P-value versus the community P-value was used to identify sites showing both altered community structure and toxicity, that is, by weight of evidence. This approach provides a useful strategy for examining multiple lines of evidence that should be accessible to the broader scientific community. Using a large collection of reference sites to define P-values empirically is appealing because parametric distributional assumptions are avoided, although this comes at the cost of assuming that the reference sites provide an appropriate comparison group for the test sites.
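Both steps can be sketched in a few lines: the community P-value is the share of reference-site distances to the reference centroid that exceed the test site's distance, and several P-values can be pooled with Fisher's method, X = -2 Σ ln pᵢ, which is chi-square with 2k degrees of freedom under the null. The abstract only says "standard meta-analytic methods", so Fisher's method is one assumed choice, as is the use of plain Euclidean distance on community vectors; function names are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of equal-length community vectors."""
    k = len(points)
    return [sum(p[j] for p in points) / k for j in range(len(points[0]))]

def empirical_p_value(reference_sites, test_site):
    """Share of reference distances to the reference centroid that exceed the
    test site's distance to that centroid (Euclidean distance assumed)."""
    c = centroid(reference_sites)
    reference_distances = [math.dist(p, c) for p in reference_sites]
    test_distance = math.dist(test_site, c)
    exceed = sum(1 for d in reference_distances if d > test_distance)
    return exceed / len(reference_distances)

def fisher_pooled_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0."""
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square survival for even df 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

reference = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(empirical_p_value(reference, (1.5, 1.0)), fisher_pooled_p([0.04, 0.20, 0.10]))
```

Note that an empirical P-value of exactly 0 (test site beyond every reference site) cannot enter Fisher's method; a common workaround is (count + 1) / (n + 1), which is not stated in the abstract.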

10.
Distance-based algorithms are a common technique in the construction of phylogenetic trees from taxonomic sequence data. The first step in the implementation of these algorithms is the calculation of a pairwise distance matrix to give a measure of the evolutionary change between any pair of the extant taxa. A standard technique is to use the log det formula to construct pairwise distances from aligned sequence data. We review a distance measure valid for the most general models, and show how the log det formula can be used as an estimator thereof. We then show that the foundation upon which the log det formula is constructed can be generalized to produce a previously unknown estimator which improves the consistency of the distance matrices constructed from the log det formula. This distance estimator provides a consistent technique for constructing quartets from phylogenetic sequence data under the assumption of the most general Markov model of sequence evolution.
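Once a pairwise distance matrix is available (for example from the log det formula), a quartet topology can be chosen with the four-point condition: for distances that are additive on a tree, the true split ab|cd has the smallest sum d(a,b) + d(c,d) among the three possible pairings. This sketch applies that generic rule to a symmetric dictionary of distances; it does not implement the generalized estimator proposed in the paper, and the function names are hypothetical.

```python
def symmetric(pairs):
    """Build a nested dict d[x][y] from {(x, y): value}, filling both directions."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist

def infer_quartet(dist, taxa):
    """Return the split ((a, b), (c, d)) minimising dist[a][b] + dist[c][d]."""
    a, b, c, d = taxa
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(candidates,
               key=lambda s: dist[s[0][0]][s[0][1]] + dist[s[1][0]][s[1][1]])

# Additive distances from the tree ((A,B),(C,D)): pendant edges 1, internal edge 2.
m = symmetric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4,
               ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4})
print(infer_quartet(m, ("A", "B", "C", "D")))
```

With estimated rather than exact distances, the rule is only as reliable as the distance estimator, which is why the consistency of the log det estimate matters.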

11.
The allelic association, or linkage disequilibrium (LD), between two loci is a parameter of fundamental interest in modern population genetics for evolutionary inference and association mapping studies. Among the many measures available, the optimal measure of allelic association ρ has a strong basis in evolutionary theory and is modeled on the physical distance along the chromosome with the Malécot equation for isolation by distance. Moreover, ρ is equal to the absolute value of D', the standardized measure of gametic disequilibrium. Here we studied the statistical properties of the sample estimator of ρ. We derived its asymptotic probability distribution and showed that it is neither asymptotically normal nor unbiased when ρ = 0 or when allelic frequencies are equal at both loci, in contrast to previous claims. This asymptotic study leads us to propose a new test for the absence of linkage disequilibrium. We compared it to Pearson's χ² test for independence in a contingency table and showed by simulation that the relative power of these two tests depends on the sign of D'. The new test slightly outperformed the χ² test when D', polarized with respect to major alleles, is negative. Finally, we derived the asymptotic bias and information of the ρ estimator that are due to experimental sampling, and showed by simulation that its bias is large in small samples. We then discuss the consequences of these findings for applications of the ρ measure, in particular the construction of LD unit maps, which call for a revised statistical treatment.
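Since ρ equals |D'|, a plug-in estimate follows directly from haplotype and allele frequencies: D = p_AB - p_A p_B, and D' = D / D_max, where D_max = min(p_A(1 - p_B), (1 - p_A)p_B) if D ≥ 0 and min(p_A p_B, (1 - p_A)(1 - p_B)) otherwise. This is the standard definition; the sketch assumes frequencies are already estimated (e.g. from phased haplotypes) and does not address the sampling bias the paper studies. Function names are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Standardized disequilibrium D' from haplotype frequency p_AB and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    return d / d_max if d_max > 0 else 0.0

def rho_hat(p_ab, p_a, p_b):
    """Plug-in estimate of rho = |D'|."""
    return abs(d_prime(p_ab, p_a, p_b))

print(d_prime(0.0, 0.5, 0.5), rho_hat(0.0, 0.5, 0.5))
```

Because ρ discards the sign of D', two populations with opposite-signed associations have the same ρ, which is relevant to the sign-dependent power difference reported above.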

12.
Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
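The Robinson-Foulds instability is easy to reproduce. Represent an unrooted tree by its set of non-trivial splits (each stored canonically as the side not containing a fixed reference leaf); the RF distance is the size of the symmetric difference of the two split sets. For caterpillar trees this sketch shows that moving one leaf from one end to near the other end changes every split, reaching the maximum 2(n - 3). This is the standard RF definition, not the paper's matching distance, and the helper names are hypothetical.

```python
def caterpillar_splits(order):
    """Non-trivial splits of an unrooted caterpillar tree with leaves in `order`.
    Each split is stored as the side that does NOT contain the smallest leaf label."""
    leaves = frozenset(order)
    ref = min(leaves)
    splits = set()
    for i in range(2, len(order) - 1):
        side = frozenset(order[:i])
        splits.add(leaves - side if ref in side else side)
    return splits

def rf_distance(splits1, splits2):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits1 ^ splits2)

t1 = caterpillar_splits([1, 2, 3, 4, 5, 6])
t2 = caterpillar_splits([2, 3, 4, 5, 1, 6])  # leaf 1 reattached near the far end
print(rf_distance(t1, t2))  # maximum for 6 leaves: 2 * (6 - 3)
```

A single leaf move producing the largest possible value is the lack of robustness the abstract describes.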

13.
The focus of the research is on the analysis of genome sequences. Based on the inter-nucleotide distance sequence, we propose the conditional multinomial distribution profile for the complete genomic sequence. These profiles can be used to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin and to build the phylogenetic tree of 24 complete genome sequences of coronaviruses. Our results demonstrate that the new method is powerful and efficient.
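The inter-nucleotide distance sequence records, for each occurrence of a base, how far away the next occurrence of the same base is. The sketch below computes these gaps, builds a per-base frequency profile of gap lengths (capped at `max_d`), and compares two sequences by Euclidean distance between profiles. This is a simplified stand-in under stated assumptions: the paper's conditional multinomial profile and its distance may be defined differently, and `max_d`, the capping rule, and the function names are hypothetical.

```python
import math
from collections import Counter

def inter_nucleotide_distances(seq):
    """(base, gap) pairs: gap to the previous occurrence of the same base."""
    last_seen = {}
    distances = []
    for i, base in enumerate(seq):
        if base in last_seen:
            distances.append((base, i - last_seen[base]))
        last_seen[base] = i
    return distances

def distance_profile(seq, max_d=10, alphabet="ACGT"):
    """Per-base relative frequencies of gaps 1..max_d (longer gaps pooled into max_d)."""
    counts = {b: Counter() for b in alphabet}
    for base, d in inter_nucleotide_distances(seq):
        if base in counts:
            counts[base][min(d, max_d)] += 1
    profile = []
    for b in alphabet:
        total = sum(counts[b].values())
        profile.extend(counts[b][k] / total if total else 0.0 for k in range(1, max_d + 1))
    return profile

def profile_distance(s1, s2, max_d=10):
    """Alignment-free distance between two sequences' gap profiles."""
    return math.dist(distance_profile(s1, max_d), distance_profile(s2, max_d))

print(profile_distance("ACGTACGTAACCGGTT", "AAAACCCCGGGGTTTT"))
```

Because only gap frequencies are compared, sequences of different lengths can be compared without alignment, which is what makes this family of measures cheap.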

14.
A new measure (CL) of spatial/structural landscape complexity is developed in this paper, based on the Levenshtein algorithm used in computer science and bioinformatics for string comparison. The Levenshtein distance (or edit distance) between two strings of symbols is the minimum number of replacements, deletions and insertions necessary to convert one string into the other. In this paper, we show how this measure can be applied to raster landscape maps of any size or shape. Calculations and applications are shown on model and real landscapes. The main advantages of this measure for structural (spatial) landscape analysis are the following: it is easily applicable; it can be compared to its maximum value (depending on the grid resolution); it can be used to compare structural/spatial complexities between landscapes; it is applicable to raster landscape maps of any shape; and it can be used to calculate changes in landscape complexity over time. At the level of ecological practice, it may aid in landscape monitoring, management and planning by identifying areas of higher structural landscape complexity, which may deserve greater attention in the process of landscape conservation.
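The edit distance itself is the classic dynamic program: cell (i, j) holds the cheapest way to turn the first i symbols of one string into the first j of the other, taking the minimum over a deletion, an insertion, or a substitution (free when the symbols match). The sketch works on any sequences, including raster rows of land-cover class codes. The row-by-row summary at the end is a hypothetical illustration of applying it to a raster, not the paper's CL definition.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]

def adjacent_row_edit_total(raster):
    """Hypothetical illustration: total edit distance between consecutive raster rows."""
    return sum(levenshtein(r1, r2) for r1, r2 in zip(raster, raster[1:]))

landscape = [[1, 1, 2, 2], [1, 2, 2, 3], [1, 2, 3, 3]]
print(levenshtein("kitten", "sitting"), adjacent_row_edit_total(landscape))
```

Keeping only the previous row of the table gives O(len(b)) memory, which matters when rows of a large raster are compared many times.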

15.
Wei E, Wei LJ, Xu X. Human Heredity, 2003, 55(2-3): 143-146
Consider the case in which individual phenotype and genotype observations are collected from a large or moderate number of pedigrees. Some of the pedigrees have multi-generation nuclear families. For each nuclear family, the phenotypic trait value of each sibling is the time to onset of a specific event (e.g., disease). Often this event time is right censored, that is, an individual is event-free at the study examination time point. In this article, we propose a purely nonparametric test of whether the distribution of a Haseman-Elston distance measure between two siblings' event times is independent of their mean genetic sharing identical by descent at a genetic marker, based on such incomplete observations from all the nuclear families. The new test can be implemented easily and is illustrated with a data set from the Genetic Analysis Workshop 12. The validity of the new test is examined via a simulation study.

16.
This paper introduces a novel sampling method for obtaining core collections, called genetic distance sampling. The method incorporates information about distances between individual accessions into a random sampling procedure. A basic feature of the method is that larger samples are obtained automatically if accessions are further apart, and smaller samples if accessions are closer together. Genetic distance sampling can be used in conjunction with predefined stratifications of the accessions. Sample sizes are determined automatically; they depend on the distances between accessions within strata. The method is applied to the collection of cultivated lettuce of the Centre for Genetic Resources, the Netherlands. In this paper, genetic distances between accessions are obtained using AFLP marker data. However, genetic distance sampling can be applied using any measure of genetic distance between accessions. Some properties of genetic distance sampling are discussed.

17.
We propose an approach for approximating electrostatic charge distributions with a small number of point charges that optimally represent the original charge distribution. By construction, the proposed optimal point charge approximation (OPCA) retains many of the useful properties of the point multipole expansion, including the same far-field asymptotic behavior of the approximate potential. A general framework for numerically computing the OPCA for any given number of approximating charges is described. We then derive a 2-charge practical point charge approximation (PPCA), which approximates the 2-charge OPCA via closed-form analytical expressions, and test the PPCA on a set of charge distributions relevant to biomolecular modeling. We measure the accuracy of the new approximations as the RMS error in the electrostatic potential relative to that produced by the original charge distribution, at a distance equal to the extent of the charge distribution (the mid-field). The error for the 2-charge PPCA is found to be on average 23% smaller than that of the optimally placed point dipole approximation, and comparable to that of the point quadrupole approximation. The standard deviation in RMS error for the 2-charge PPCA is 53% lower than that of the optimal point dipole approximation, and comparable to that of the point quadrupole approximation. We also calculate the 3-charge OPCA for representing the gas-phase quantum mechanical charge distribution of a water molecule. The electrostatic potential calculated by the 3-charge OPCA for water in the mid-field (2.8 Å from the oxygen atom) is on average 33.3% more accurate than the potential due to the point multipole expansion up to the octupole order. Compared to a 3-point-charge approximation in which the charges are placed on the atom centers, the 3-charge OPCA is seven times more accurate, by RMS error. The maximum error at the oxygen-Na distance (2.23 Å) is half that of the point multipole expansion up to the octupole order.
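The accuracy criterion (RMS error of the approximate potential against the original one at mid-field points) can be sketched directly. This minimal version uses Coulomb potentials in units where 1/(4πε₀) = 1, samples evaluation points on a sphere with a Fibonacci lattice, and compares any two sets of point charges. The sphere radius, the point-sampling scheme, the example charges and the function names are illustrative assumptions, not the paper's protocol; the optimization that actually places the OPCA charges is not shown.

```python
import math

def potential(charges, point):
    """Coulomb potential sum q / r, in units where 1 / (4 pi eps0) = 1."""
    return sum(q / math.dist(pos, point) for q, pos in charges)

def sphere_points(radius, n=200):
    """Roughly uniform points on a sphere (Fibonacci lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        phi = i * golden
        pts.append((radius * ring * math.cos(phi), radius * ring * math.sin(phi), radius * z))
    return pts

def rms_error(reference, approx, points):
    """RMS difference between the potentials of two charge sets at the given points."""
    squared = [(potential(reference, p) - potential(approx, p)) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(points))

# Illustrative only: a dipole approximated by a slightly shorter dipole.
reference = [(1.0, (0.0, 0.0, 0.5)), (-1.0, (0.0, 0.0, -0.5))]
approx = [(1.0, (0.0, 0.0, 0.4)), (-1.0, (0.0, 0.0, -0.4))]
print(rms_error(reference, approx, sphere_points(2.0)))
```

Evaluating on a sphere whose radius is comparable to the size of the charge distribution probes the mid-field, where multipole truncation errors are largest.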

18.
The widely used FD index of functional diversity is based on the construction of a dendrogram. This index has been the subject of a strong debate concerning the choice of the distance and the clustering method to be used, since the method chosen may greatly affect the FD values obtained. Much of this debate has centred on which method of dendrogram construction gives a faithful representation of species distribution in multidimensional functional trait space. Using artificially generated datasets varying in species richness and in correlations between traits, we test whether any single combination of clustering method(s) and distance consistently produces a dendrogram that most closely corresponds to the matrix of functional distances between pairs of species studied. We also test the ability of consensus trees, which incorporate features common to a range of different dendrograms, to summarize distance matrices. Our results show that no combination of clustering method(s) and distance consistently outperforms the others, owing to the complexity of the interactions between trait correlations, species richness, distance measures and clustering methods. Furthermore, the construction of a consensus tree from a range of dendrograms is often the best solution. Consequently, we recommend testing all combinations of distances and clustering methods (including consensus trees), then selecting the most reliable tree (the one with the lowest dissimilarity) to estimate the FD value. We further suggest that any index that requires the construction of functional dendrograms could benefit from this new approach.
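The recommended procedure (build dendrograms with several methods, then keep the one that best matches the original distance matrix) can be sketched with two agglomerative linkages. Each dendrogram is summarized by its cophenetic distances (the merge height at which two species first join), and the fit is scored by the mean squared deviation from the original distances. Average linkage on the original pairwise distances is UPGMA. Using mean squared deviation as the dissimilarity, restricting to two linkages, and omitting consensus trees are simplifying assumptions; function names are hypothetical.

```python
from itertools import combinations

def cophenetic(dist, linkage):
    """Agglomerative clustering ('single' or 'average'); returns cophenetic distances."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def between(c1, c2):
        pair_d = [dist[i][j] for i in c1 for j in c2]
        return min(pair_d) if linkage == "single" else sum(pair_d) / len(pair_d)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: between(clusters[ab[0]], clusters[ab[1]]))
        height = between(clusters[a], clusters[b])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height  # first joined at this height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return coph

def mean_squared_deviation(dist, coph):
    pairs = list(combinations(range(len(dist)), 2))
    return sum((dist[i][j] - coph[i][j]) ** 2 for i, j in pairs) / len(pairs)

def best_linkage(dist, methods=("average", "single")):
    """Score each method and return the one with the lowest dissimilarity."""
    scores = {m: mean_squared_deviation(dist, cophenetic(dist, m)) for m in methods}
    return min(scores, key=scores.get), scores

species = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # three species on a trait axis at 0, 1, 3
print(best_linkage(species))
```

If the functional distances are already ultrametric, every linkage reproduces them exactly and the choice does not matter; differences appear only for non-ultrametric trait spaces.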

19.
Laake P, Laake K, Aaberge R. Biometrics, 1985, 41(2): 515-523
The relationship between hospitalization, as a measure of morbidity, and mortality is examined. The difference between age at hospitalization in a general medical department and age at death in Oslo, Norway, is studied. The problem is reduced to examining the difference between two cumulative distribution functions F and G. For this purpose, a quantile distance function based on the inverses of the distribution functions is applied. We give the natural estimate of the quantile distance function and some asymptotic properties of the corresponding empirical process. For the particular situation in which one of the distribution functions is known, a confidence band for the quantile distance function is derived. Applying these results, we show that there are reasons to believe that age at hospitalization and age at death are equally distributed, apart from a constant shift.
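The natural estimate replaces F and G by empirical distribution functions and compares their inverses: Δ(p) = G⁻¹(p) - F⁻¹(p), with the empirical quantile taken as the smallest observation x with F_n(x) ≥ p. A pure location shift shows up as a constant Δ(p). The sign convention (G minus F), the small tolerance against floating-point rounding, and the function names are assumptions of this sketch; confidence bands are not shown.

```python
import math

def empirical_quantile(sample, p):
    """Inverse empirical CDF: smallest x with F_n(x) >= p, for 0 < p <= 1."""
    xs = sorted(sample)
    k = math.ceil(p * len(xs) - 1e-12)  # tolerance guards against rounding in p * n
    return xs[max(k, 1) - 1]

def quantile_distance(sample_f, sample_g, probs):
    """Estimated quantile distance G^-1(p) - F^-1(p) at each probability in probs."""
    return [empirical_quantile(sample_g, p) - empirical_quantile(sample_f, p) for p in probs]

ages_f = [60, 65, 70, 75, 80, 85, 90, 95]
ages_g = [x + 3 for x in ages_f]  # the same distribution shifted by 3 years
print(quantile_distance(ages_f, ages_g, [0.25, 0.5, 0.75, 1.0]))
```

A roughly flat estimated Δ(p) across p is the pattern that supports the "equal apart from a constant shift" conclusion.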

20.
The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges the states into a progression using methodologies propelled by single-cell biology. However, current methods, which all return a best trajectory, do not adequately assess the statistical significance of noisy patterns, leading to uncertainty in inferred trajectories. We introduce a tree dimension test for trajectory presence in multivariate data, consisting of a dimension measure of the Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in time linear in tree size, the tree dimension measure summarizes the extent of branching more effectively than the number of leaves, which is globally insensitive, or the tree diameter, which is indifferent to secondary branches. The test statistic quantifies trajectory presence, and its null distribution is estimated under the null hypothesis of no trajectory in the data. On simulated and real single-cell datasets, the test outperformed the intuitive number-of-leaves and tree-diameter statistics. Next, we developed a measure of the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that the tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways, including calcium and Wnt signaling, are the most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are the least. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.
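The baseline statistics the test is compared against are easy to compute once the Euclidean minimum spanning tree is built. This sketch uses Prim's algorithm on the complete graph (O(n²), fine for small n), then counts leaves (degree-1 nodes) and measures the diameter as the number of edges on the longest path, found with two breadth-first searches. Measuring diameter in edges rather than Euclidean length is an assumption, and the paper's tree dimension measure itself is not reproduced here; function names are hypothetical.

```python
import math

def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph; returns a list of edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def num_leaves(n, edges):
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree if d == 1)

def diameter(n, edges):
    """Edges on the longest path: BFS to the farthest node, then BFS again from it."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def farthest(src):
        dist = [-1] * n
        dist[src] = 0
        queue = [src]
        for node in queue:  # the list grows while we iterate, giving BFS order
            for nb in adj[node]:
                if dist[nb] < 0:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        far = max(range(n), key=lambda i: dist[i])
        return far, dist[far]

    end, _ = farthest(0)
    return farthest(end)[1]

cells = [(0, 0), (1, 0), (2, 0), (3, 0), (1.5, 1)]
tree = euclidean_mst(cells)
print(num_leaves(len(cells), tree), diameter(len(cells), tree))
```

A long unbranched trajectory gives 2 leaves and a large diameter, while a star-like cloud gives many leaves and a small diameter; the abstract's point is that neither summary captures branching well on its own.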

