Similar literature
 Found 20 similar articles; search took 15 ms
1.
Studying the discontinuity patterns of Paleozoic vascular plants provides a global vision of these key events from the multivariate methods viewpoint. Non-metric multidimensional scaling, detrended correspondence analysis and cluster analysis have been employed together with a set of diversity and abundance measures and an evaluation of the geologic constraints from the plant fossil record data. The results reveal four clear, significant discontinuities in terms of taxonomic composition and record representativeness during the early-middle Devonian, Devonian–Carboniferous, Mississippian–Pennsylvanian and early-late Permian. Due to the controversial character of the plant fossil record data and the effect of mass extinction events, the results can be explained in terms of taxonomic turnover and ecological reorganisation, which emphasise the crucial role of the geologic constraints in paleobiological inference.

2.
3.
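Non-metric multidimensional scaling and its application in community classification   Total citations: 15 (self-citations: 1, citations by others: 14)
Ordination and cluster analysis are the most commonly used quantitative methods in modern vegetation analysis. In principle, ordination generally rests on a more rigorous mathematical foundation than clustering. In vegetation research, ordination reflects the continuity of vegetation better than clustering does. Moreover, in presenting classification results, an ordination diagram not only shows the divisions among all entities, as a cluster dendrogram does, but also conveys the relationship between any two entities through their distance on the diagram. However, conventional ordination methods are mainly suited to data with linear structure; 2- or 3-dimensional ordination diagrams often fail to reflect fully the ecological relationships among the entities, so a large amount of ecological information is lost. Non-metric multidimensional scaling is a recently developed, complex iterative ordination method suited to data with nonlinear structure. Its basic idea is that, through ordination, n entities in as low a dimension as possible (t…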

4.
Cluster analysis has proven to be a valuable statistical method for analyzing whole genome expression data. Although clustering methods have great utility, they do represent a lower level statistical analysis that is not directly tied to a specific model. To extend such methods and to allow for more sophisticated lines of inference, we use cluster analysis in conjunction with a specific model of gene expression dynamics. This model provides phenomenological dynamic parameters on both linear and non-linear responses of the system. This analysis determines the parameters of two different transition matrices (linear and nonlinear) that describe the influence of one gene expression level on another. Using yeast cell cycle microarray data as a test set, we calculated the transition matrices and used these dynamic parameters as a metric for cluster analysis. Hierarchical cluster analysis of this transition matrix reveals how a set of genes influences the expression of other genes activated during different cell cycle phases. Most strikingly, genes in different stages of the cell cycle preferentially activate or inactivate genes in other stages of the cell cycle, and this relationship can be readily visualized in a two-way clustering image. The observation is made prior to any knowledge of the chronological characteristics of the cell cycle process. This method shows the utility of using model parameters as a metric in cluster analysis.

5.
Nine healthy females were studied about the time of the spring equinox while living in student accommodations and aware of the passage of solar time. After 7 control days, during which a conventional lifestyle was lived under a 24h “constant routine,” the subjects lived 17 × 27h “days” (9h sleep in the dark and 18h wake using domestic lighting, if required). Throughout the experiment, recordings of wrist activity and rectal (core) temperature were taken. The raw temperature data were assessed for phase and amplitude by cosinor analysis and another method, “crossover times,” which does not assume that the data set is sinusoidal. Two different purification methods were used in attempts to remove the masking effects of sleep and activity from the core temperature record and so to measure more closely the endogenous component of this rhythm; these two methods were “purification by categories” and “purification by intercepts.” The former method assumes that the endogenous component is a sinusoid, and that the masking effects can be estimated by putting activity into a number of bands or categories. The latter method assumes that a temperature that would correspond to complete inactivity can be estimated from measured temperatures by linear regression of these on activity and extrapolation to a temperature at zero activity. Three indices were calculated to assess the extent to which exogenous effects had been removed from the temperature data by these purification methods. These indices were the daily variation of phase about its median value; the ratio of this variation to the daily deviation of phase about midactivity; and the relationship between amplitude and the square of the deviation of phase from midactivity. In all cases, the index would decrease in size as the contribution of the exogenous component to a data set fell. 
The purification by categories approach was successful in proportion to the number of activity categories that was used, and as few as four categories produced a data set with significantly less masking than raw data. The purification by intercepts method was less successful unless the raw data had been “corrected” to reflect the direct effects of sleep that were independent of activity (a method to achieve this being produced). Use of this purification method with the corrected data then gave results that showed the least exogenous influence. Both this method and the purification by categories method with 16 categories of activity gave evidence that the exogenous component no longer made a significant contribution to the purified data set. The results were not significantly influenced by assessing amplitude and phase of the circadian rhythm from crossover times rather than cosinor analysis. The relative merits of the different methods, as well as of other published methods, are compared briefly; it is concluded that several purification methods, of differing degrees of sophistication and ease of application to raw data, are of value in field studies and other circumstances in which constant routines are not possible or are ethically undesirable. It is also concluded that such methods are often somewhat limited insofar as they are based on pragmatic or biological, rather than mathematical, considerations, and so it is desirable to attempt to develop models based equally on mathematics and biology. (Chronobiology International, 17(4), 539-566, 2000)
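The "purification by intercepts" idea in this abstract (regress temperature on activity, then extrapolate to zero activity) can be sketched as an ordinary least-squares intercept. This is a minimal illustration only: the function name, the sample readings, and the choice to fit one straight line per time window are assumptions, not details taken from the paper.

```python
from statistics import mean


def intercept_at_zero_activity(activity, temperature):
    """Fit temperature = a + b * activity by least squares and return a,
    the temperature extrapolated to zero activity."""
    if len(activity) != len(temperature) or len(activity) < 2:
        raise ValueError("need two or more paired readings")
    a_bar = mean(activity)
    t_bar = mean(temperature)
    # Sum of squares of activity and cross-products with temperature
    sxx = sum((a - a_bar) ** 2 for a in activity)
    if sxx == 0:
        raise ValueError("activity values must vary")
    sxy = sum((a - a_bar) * (t - t_bar) for a, t in zip(activity, temperature))
    slope = sxy / sxx
    return t_bar - slope * a_bar


# Hypothetical paired readings from one time window (arbitrary activity units, deg C)
activity = [0.0, 10.0, 20.0, 30.0]
temperature = [36.5, 36.7, 36.9, 37.1]
purified = intercept_at_zero_activity(activity, temperature)
```

Repeating the fit for each time window would give a series of extrapolated temperatures; the correction for direct sleep effects mentioned in the abstract is not modelled here.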

6.
7.
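Research progress on the maximum density law   Total citations: 2 (self-citations: 0, citations by others: 2)
 This paper reviews research progress on the maximum density law over recent decades, covering both theoretical derivation and research methods, and draws three conclusions. 1) Theories of the maximum density law mainly comprise the 3/2 law, based on geometric relationships, and the WBE model, based on space-filling fractal branching transport networks. Further study shows that both rest on static statistical analysis, so in recent years researchers have begun to build models from dynamic competition among individual plants. Even so, models of the maximum density law have not escaped established patterns, such as using mean plant size to stand for the whole plant population. The theory therefore needs further study. 2) The theory remains disputed with respect to its assumptions, its mathematical derivation, and the choice of raw data used to estimate parameters. Every model is built on specific conditions and assumptions, so the relationship it yields is not a universal law. These models can therefore be used in combination when analyzing data. 3) As for research methods, differing understandings of the maximum density law and differing standards have produced a wide variety of methods. The authors therefore recommend establishing an objective, unified method in future research.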
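For orientation, the two theories named in this abstract are usually written as power laws relating mean plant mass to stand density. The symbols below ($\bar{w}$ for mean plant mass, $N$ for density, $K$ for a constant) are assumptions chosen for illustration and do not appear in the abstract; the $-4/3$ exponent is the value commonly attributed to the WBE model.

```latex
\bar{w} = K\,N^{-3/2} \quad \text{(3/2 law, geometric argument)}
\qquad
\bar{w} \propto N^{-4/3} \quad \text{(WBE model)}
```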

8.
We describe two new methods to partition phylogenetic data sets of discrete characters based on pairwise compatibility. The partitioning methods make no assumptions regarding the phylogeny, model of evolution, or characteristics of the data. The methods first build a compatibility graph, in which each node represents a character in the data set. Edges in the compatibility graph may represent strict compatibility of characters or they may be weighted based on a fractional compatibility scoring procedure that measures how close the characters are to being compatible. Given the desired number of partitions, the partitioning methods then seek to cluster the characters with the highest average pairwise compatibility, so that characters in each cluster are more compatible with each other than they are with characters in the other cluster(s). Partitioning according to these criteria is computationally intractable (NP-hard); however, spectral methods can quickly provide high-quality solutions. We demonstrate that the spectral partitioning effectively identifies characters with different evolutionary histories in simulated data sets, and it is better at highlighting phylogenetic conflict within empirical data sets than previously used partitioning methods.
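The first step described in this abstract, building a compatibility graph with one node per character, can be sketched for the simplest case. The sketch assumes binary characters with no missing data and uses the standard pairwise test (two binary characters conflict only when all four state pairs 00, 01, 10 and 11 occur across taxa). The function names and the toy matrix are hypothetical, and neither the fractional scoring nor the spectral partitioning from the paper is reproduced.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Two binary characters are pairwise compatible unless all four
    state combinations occur across the taxa."""
    return len(set(zip(char_a, char_b))) < 4


def compatibility_graph(characters):
    """Return adjacency sets: node i is linked to node j when
    characters i and j are strictly compatible."""
    graph = {i: set() for i in range(len(characters))}
    for i, j in combinations(range(len(characters)), 2):
        if compatible(characters[i], characters[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical matrix: each row is one character scored across five taxa
characters = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
graph = compatibility_graph(characters)
```

In this toy matrix only the first two characters are joined by an edge; the third conflicts with both, which is the kind of signal a partitioning step would then try to separate.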

9.
The age of the angiosperms: a molecular timescale without a clock   Total citations: 8 (self-citations: 0, citations by others: 8)
The age of the angiosperms has long been of interest to botanists and evolutionary biologists. Many early efforts to date the age of the angiosperms and evolutionary divergences within the angiosperm clade using a molecular clock have yielded age estimates that are grossly inconsistent with the fossil record. We investigated the age of angiosperms using Bayesian relaxed clock (BRC) and penalized likelihood (PL) approaches. Both of these methods allow the incorporation of multiple fossil constraints into the optimization procedure. The BRC method allows a range of values for among-lineage rate of substitution, from a nearly clocklike behavior to a condition in which each branch is allowed an optimal substitution rate, and also accounts for variation in molecular evolution across multiple genes. A topology derived from an analysis of genes from all three plant genomes for 71 taxa was used as a backbone. The effects on age estimates of different genes, single-gene versus concatenated datasets, and the inclusion and assumptions of fossils as age constraints were examined. The influence of prior distributions on estimates of divergence times was also explored. These results indicate that widely divergent age estimates can result from the different methods (198-139 million years ago), different sources of data (275-122 million years ago), and the inclusion of temporal constraints to topologies. Most dates, however, are between 180-140 million years ago, suggesting a Middle Jurassic-Early Cretaceous origin of flowering plants, predating the oldest unequivocal fossil angiosperms by about 45-5 million years. Nonetheless, these dates are consistent with other recent studies that have used methods that relax the assumption of a strict molecular clock and also agree with the hypothesis that the angiosperms may be somewhat older than the fossil record indicates.

10.
Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis.
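The "standard binomial model" that this abstract mentions for designs where clustering can be ignored can be illustrated with a small calculation of the two misclassification risks for a given sample size and decision rule. This is a sketch under stated assumptions: simple random sampling, a lot classified as acceptable when at least d successes are observed, and hypothetical thresholds of 80% and 50% coverage. It does not implement any of the cluster LQAS designs compared in the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_errors(n, d, p_upper, p_lower):
    """Return (alpha, beta) for the rule 'accept the lot if successes >= d'.

    alpha: chance of rejecting a lot whose true coverage is p_upper.
    beta:  chance of accepting a lot whose true coverage is p_lower.
    """
    alpha = binom_cdf(d - 1, n, p_upper)
    beta = 1.0 - binom_cdf(d - 1, n, p_lower)
    return alpha, beta


# Hypothetical design: 19 respondents, decision rule 13, thresholds 80% and 50%
alpha, beta = lqas_errors(19, 13, 0.8, 0.5)
```

With this hypothetical design both risks come out below 10%. A cluster design would need to inflate the sample size or change the distribution, and that is where the methods compared in the paper differ.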

11.
Data on fossil taxa can, and should, be incorporated into cladistic analyses. Potential problems with such analyses include large amounts of missing data, and uncertainty about homology of parts that are present. Ambiguity of character data may also occur with extant taxa, but rarely to the extent that it occurs in fossil data. Such ambiguity reduces the strength of the test of character congruence among taxa, in effect relaxing the criterion of parsimony. In order to minimize such effects, composite fossil taxa should be avoided when possible, and polymorphisms reduced by breaking terminals into monomorphic subunits. When results including fossils differ radically from those that exclude fossils, such differences should be approached with caution, keeping in mind the reduced strength of the parsimony analysis when large numbers of cells in a matrix are scored as ambiguous. At this point, there is no simple way to compare the “strength” of parsimony between two data sets that have different numbers of characters and/or taxa in relation to missing data. However, methods under development may provide ways to incorporate the effect of missing values into relative measures of group support such as Bremer support, character removal, and the bootstrap.

12.
This paper proposes a statistical test of the single-species hypothesis using non-metric characters as a complement to statistical tests using more traditional metric characters. The sample examined is that of Asian and African Homo erectus. The paleoanthropological community is divided on the taxonomic distinction of these fossils, with workers arguing both for and against the species-level distinction between Asian and African populations. Previous arguments have focused on patterns of apparent morphological differentiation between the African and Asian cranial samples. To assess this question, three tests were performed that compared the range of variation in the fossil sample to a single-species group with a similar geographic distribution; this comparative sample was composed of 221 modern humans from Africa and Asia. For the first test, 23 metric characters were analyzed on the fossil and comparative samples. Using resampling procedures, the variation for these characters was examined, recreating 1000 samples from the human analogs and comparing the CV distributions of these samples to the CVs of the fossil group. The second test used the metric data to calculate a Euclidean distance between the African and Asian fossil samples. This distance was compared to a distribution of Euclidean distances calculated between 1000 randomly selected samples of African and Asian modern humans. For the third test, a grading scale was created for ten non-metric characters that encompassed the total morphological variation found in the fossil and modern human samples. The Manhattan distance between the Asian and African fossil samples was calculated and compared to a distribution of distances calculated between 1000 randomly selected samples of African and Asian moderns. The first two tests, using the metric data, failed to falsify the null hypothesis. 
However, in the third test, using non-metric data, the total Manhattan distance for the fossil sample approached the 100th percentile of the resampled distances calculated from the moderns. The implications of the contrasting results are discussed.
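The third test in this abstract (a Manhattan distance between two fossil groups, compared with distances between repeatedly drawn modern samples) can be sketched as below. Several details are assumptions made only for illustration: the distance is taken between per-character mean grades, each random modern sample matches the size of the corresponding fossil group, and all names and data are hypothetical.

```python
import random


def mean_profile(sample):
    """Mean grade per character across the individuals in a sample."""
    n = len(sample)
    return [sum(ind[c] for ind in sample) / n for c in range(len(sample[0]))]


def manhattan(sample_a, sample_b):
    """Manhattan distance between the mean grade profiles of two samples."""
    return sum(abs(x - y) for x, y in zip(mean_profile(sample_a), mean_profile(sample_b)))


def resampled_percentile(fossil_a, fossil_b, modern_a, modern_b, draws=1000, seed=0):
    """Percentage of resampled modern distances that do not exceed
    the observed fossil distance."""
    rng = random.Random(seed)
    observed = manhattan(fossil_a, fossil_b)
    not_exceeding = 0
    for _ in range(draws):
        # Draw modern subsamples of the same sizes as the fossil groups
        sub_a = rng.sample(modern_a, len(fossil_a))
        sub_b = rng.sample(modern_b, len(fossil_b))
        if manhattan(sub_a, sub_b) <= observed:
            not_exceeding += 1
    return 100.0 * not_exceeding / draws


# Hypothetical graded scores for two characters per individual
modern_africa = [[1, 1] for _ in range(10)]
modern_asia = [[1, 1] for _ in range(10)]
fossil_africa = [[0, 0], [0, 0]]
fossil_asia = [[3, 3], [3, 3]]
percentile = resampled_percentile(fossil_africa, fossil_asia, modern_africa, modern_asia)
```

A percentile near 100 means the fossil groups differ more than almost any pair of comparable modern samples, which is the pattern the abstract reports for the non-metric data.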

13.
D. D. Boos, C. Brownie. Biometrics, 1992, 48(1): 61-72
New rank-based methods for analyzing data from multisite clinical trials are presented in the context of "mixed" linear models. In contrast to current rank methods, the new procedures test for a drug main effect in the presence of a random drug by site interaction (or drug by investigator interaction when there is only one investigator per site). Analogous procedures are also provided for the "fixed-effects" situation, and comparisons are made with current methods. The rationale for an analysis that assumes random investigator effects is described.

14.
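This paper explores methods for applying microcomputers to conodont research. Drawing on a large body of primary material and data, it uses fuzzy cluster analysis and a CAI computer program system, and obtains some results in stratigraphic division, fossil assemblages, sedimentary environment analysis, and the evaluation of oil and gas generation.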

15.
MOTIVATION: The success of each method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive clustering of genes is robust against violations of model assumptions. RESULTS: A measure of dissimilarity that combines advantages of the Euclidean distance and the correlation coefficient is introduced. The measure can be made robust using a rank order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to a public data set, showing that rank-based methods perform better than log-based methods. AVAILABILITY: Software is available from http://www.davidbickel.com.
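To illustrate why a rank order correlation makes a dissimilarity resistant to outliers, the sketch below computes one minus the Spearman rank correlation between two expression profiles. This is a generic rank-based dissimilarity chosen for illustration; it is not the combined Euclidean and correlation measure introduced in the paper, whose definition the abstract does not give. All names and data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("profiles must not be constant")
    return sxy / (sxx * syy) ** 0.5


def rank_dissimilarity(x, y):
    """One minus the Spearman rank correlation: 0 for identical orderings,
    2 for exactly reversed orderings."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression profiles; gene_a contains one extreme value
gene_a = [1.0, 2.0, 3.0, 4.0, 100.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 9.0]
d_same_order = rank_dissimilarity(gene_a, gene_b)
d_reversed = rank_dissimilarity(gene_a, gene_b[::-1])
```

Because only the ordering of values enters the calculation, the extreme value in `gene_a` has no more influence than any other largest value would.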

16.
Recent advances in computing technology have enabled microsecond-long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.

17.
The analysis of terminal restriction fragment length polymorphisms (T-RFLP) of 16S rRNA genes has proven to be a facile means to compare microbial communities and presumptively identify abundant members. The method provides data that can be used to compare different communities based on similarity or distance measures. Once communities have been clustered into groups, clone libraries can be prepared from sample(s) that are representative of each group in order to determine the phylogeny of the numerically abundant populations in a community. In this paper methods are introduced for the statistical analysis of T-RFLP data that include objective methods for (i) determining a baseline so that 'true' peaks in electropherograms can be identified; (ii) a means to compare electropherograms and bin fragments of similar size; (iii) clustering algorithms that can be used to identify communities that are similar to one another; and (iv) a means to select samples that are representative of a cluster that can be used to construct 16S rRNA gene clone libraries. The methods for data analysis were tested using simulated data with assumptions and parameters that corresponded to actual data. The simulation results demonstrated the usefulness of these methods in their ability to recover the true microbial community structure generated under the assumptions made. Software for implementing these methods is available at http://www.ibest.uidaho.edu/tools/trflp_stats/index.php.

18.
Analyzing gene expression data in terms of gene sets: methodological issues   Total citations: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing. RESULTS: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.

19.
Simple assumptions have led to equations by which the latent period in multiplication and the bacterial numbers expected at any time during the phase of rapid growth may be predicted. Experimental data obtained under rather diverse conditions have given satisfactory agreement with calculated values. Since the mathematical expressions contain no arbitrary constants, more than accidental significance must be attached to this agreement. The hypotheses set forth appear completely to describe the early development of Bacterium coli and Bacterium dysenteriae in broth, without postulating differences other than size among individual cells, or cells obtained under different conditions.

20.

Background  

There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points that are often generated by high throughput experiments.
