Similar literature
Found 20 similar articles (search time: 46 ms)
1.
SUMMARY. The results of a survey of the macro-invertebrates of the polluted River Ely, South Wales, are used as a basis for comparing several classification methods which have been used previously in river survey work to determine species groupings. The methods compared are product-moment correlation (clustered by the nearest neighbour technique), Kendall's tau coefficient (clustered by the nearest neighbour and average linkage techniques), and Squared Euclidean-Distance coefficient (clustered by nearest neighbour and Ward's techniques). The species groupings determined by these methods were influenced both by the association coefficient and the technique used to cluster it. Some species were grouped together by all or most of the methods. The ecological validity of these robust groups is examined. A clear recommendation regarding the most appropriate method is frustrated by incomplete knowledge of the ecological requirements of most of the aquatic macro-invertebrates used in the data-set. However, Kendall's tau coefficient clustered by the average linkage technique appeared to produce ecologically meaningful species groups. Product-moment correlation was also reasonably successful and since it is based on absolute abundance data whereas Kendall's tau coefficient is based on relative abundance data, the use of the two together is recommended for determining robust groups.
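The contrast between the two association coefficients in this abstract can be sketched in a few lines. This is a minimal illustration, not the survey's own computation: the abundance values are invented, tau-a (no tie correction) is assumed, and the clustering step is left out.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / len(pairs)

# Hypothetical abundances of two species at five sites (illustrative only).
species_a = [1, 2, 3, 4, 100]
species_b = [2, 4, 6, 8, 9]

print(kendall_tau_a(species_a, species_b))  # rank order agrees at every site
print(pearson_r(species_a, species_b))      # pulled down by the one large count
```

Because tau uses only the rank order, the single very large count does not affect it, whereas the product-moment coefficient responds to absolute abundance. That difference is one reason the two coefficients can produce different species groupings.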

2.
MOTIVATION: Microarray technology enables the study of gene expression on a large scale. The application of methods for data analysis then allows for grouping genes that show a similar expression profile and that are thus likely to be co-regulated. A relationship among genes at the biological level often presents itself by locally similar and potentially time-shifted patterns in their expression profiles. RESULTS: Here, we propose a new method (CLARITY; Clustering with Local shApe-based similaRITY) for the analysis of microarray time course experiments that uses a local shape-based similarity measure based on Spearman rank correlation. This measure does not require a normalization of the expression data and is comparatively robust to noise. It is also able to detect similar and even time-shifted sub-profiles. To this end, we implemented an approach motivated by the BLAST algorithm for sequence alignment. We used CLARITY to cluster the time series of gene expression data during the mitotic cell cycle of the yeast Saccharomyces cerevisiae. The obtained clusters were related to the MIPS functional classification to assess their biological significance. We found that several clusters were significantly enriched with genes that share similar or related functions.

3.
Han L, Zhu J. Bio Systems, 2008, 91(1): 158-165
DNA arrays measure the expression levels for thousands of genes simultaneously under different conditions. These measurements reflect many aspects of the underlying biological processes. A method based on the matrix of thresholding partial correlation coefficients (MTPCC) is proposed for network inference from expression profiles. It includes three main parts: (1) hierarchical cluster analysis, (2) cluster boundaries establishment, and (3) regulatory network inference. The method was applied to the expression data of 2467 genes in Saccharomyces cerevisiae measured under 79 different conditions [Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D., 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. 95, 14863-14868]. Using hierarchical clustering and cluster boundaries establishment, the 2467 genes were grouped into 12 clusters. The expression profile of each cluster was represented by the expression levels averaged over the cluster's constituent genes for each condition. The expression data of these clusters were then subjected to partial correlation analysis, and the significance of each element in the resulting partial correlation coefficient matrix (PCCM) was examined by a permutation test. The corresponding undirected dependency graph (UDG) was obtained as a model of the regulatory network of S. cerevisiae. The veracity of the network was supported by the consistency of our results with the collected results from experimental studies.
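The partial correlation at the core of this approach can be sketched for the smallest case, three cluster-average profiles. This is an assumption-laden illustration: it uses the textbook first-order formula rather than the full matrix computation, the profile values are invented, and the permutation-based thresholding described in the abstract is omitted.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical average profiles of three clusters under four conditions.
# cluster_x and cluster_y are both driven by cluster_z plus independent terms.
cluster_z = [-1, -1, 1, 1]
cluster_x = [-2, 0, 0, 2]
cluster_y = [-2, 0, 2, 0]

print(pearson_r(cluster_x, cluster_y))                # 0.5: looks associated
print(partial_corr(cluster_x, cluster_y, cluster_z))  # about 0: explained by z
```

The raw correlation suggests a link between the two clusters, but the partial correlation vanishes once the shared driver is controlled for. That is why a dependency graph built from partial correlations has fewer indirect edges.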

4.
Summary. Analysis of variance and principal components methods have been suggested for estimating repeatability. In this study, six estimation procedures are compared: ANOVA, principal components based on the sample covariance matrix and also on the sample correlation matrix, a related multivariate method (structural analysis) based on the sample covariance matrix and also on the sample correlation matrix, and maximum likelihood estimation. A simulation study indicates that when the standard linear model assumptions are met, the estimators are quite similar except when the repeatability is small. Overall, maximum likelihood appears the preferred method. If the assumption of equal variance is relaxed, the methods based on the sample correlation matrix perform better although others are surprisingly robust. The structural analysis method (with sample correlation matrix) appears to be best. Paper number 776 from the Department of Meat and Animal Science, University of Wisconsin-Madison.
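Of the six procedures compared, the ANOVA estimator is the simplest to write down. The sketch below assumes a balanced one-way design (every individual measured the same number of times) and uses the standard intraclass-correlation form; the measurements are invented and the other five estimators are not shown.

```python
def anova_repeatability(groups):
    """Repeatability from a balanced one-way ANOVA.

    groups: list of per-individual measurement lists, all of length k.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(groups)       # number of individuals
    k = len(groups[0])    # repeated measurements per individual
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n
    # Between-individual and within-individual mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: three individuals, two measurements each.
records = [[10, 12], [20, 22], [30, 32]]
print(anova_repeatability(records))  # close to 1: individuals differ far more than repeats
```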

5.
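This paper applies fuzzy cluster analysis, based on correlation coefficients, to the skeletal morphology of nine rodent species (the formula used is not reproduced in this record). The results show that cluster analysis is a numerical taxonomy method well suited to rodent skeletal morphology and can be used to explore the relationships among genera and species.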

6.
7.
MOTIVATION: Recent advances in DNA microarray technologies have made it possible to measure the expression levels of thousands of genes simultaneously under different conditions. The data obtained by microarray analyses are called expression profile data. One type of important information underlying the expression profile data is the 'genetic network,' that is, the regulatory network among genes. Graphical Gaussian Modeling (GGM) is a widely utilized method to infer or test relationships among multiple variables. RESULTS: In this study, we developed a method combining cluster analysis with GGM for the inference of the genetic network from the expression profile data. The expression profile data of 2467 Saccharomyces cerevisiae genes measured under 79 different conditions (Eisen et al., Proc. Natl Acad. Sci. USA, 95, 14863-14868, 1998) were used for this study. First, the 2467 genes were classified into 34 clusters by a cluster analysis, as a preprocessing step for GGM. Then, the expression levels of the genes in each cluster were averaged for each condition. The averaged expression profile data of the 34 clusters were subjected to GGM, and a partial correlation coefficient matrix was obtained as a model of the genetic network of S. cerevisiae. The accuracy of the inferred network was examined by the agreement of our results with the cumulative results of experimental studies.

8.
Identification of phenotypic modules, semiautonomous sets of highly correlated traits, can be accomplished through exploratory (e.g., cluster analysis) or confirmatory approaches (e.g., RV coefficient analysis). Although statistically more robust, confirmatory approaches are generally unable to compare across different model structures. For example, RV coefficient analysis finds support for both two-module and six-module models for the therian mammalian skull. Here, we present a maximum likelihood approach that takes into account model parameterization. We compare model log-likelihoods of trait correlation matrices using the finite-sample corrected Akaike Information Criterion, allowing for comparison of hypotheses across different model structures. Simulations varying model complexity and within-module and between-module contrast demonstrate that this method correctly identifies model structure and parameters across a wide range of conditions. We further analyzed a dataset of 3-D data, consisting of 61 landmarks from 181 macaque (Macaca fuscata) skulls, distributed among five age categories, testing 31 models, including no modularity among the landmarks and various partitions of two, three, six, and eight modules. Our results clearly support a complex six-module model, with separate within-module and intermodule correlations. Furthermore, this model was selected for all five age categories, demonstrating that this complex pattern of integration in the macaque skull appears early and is highly conserved throughout postnatal ontogeny. Subsampling analyses demonstrate that this method is robust to relatively low sample sizes, as are commonly encountered in rare or extinct taxa. This new approach allows for the direct comparison of models with different parameterizations, providing an important tool for the analysis of modularity across diverse systems.
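The model-comparison step rests on the finite-sample corrected Akaike Information Criterion, which can be sketched as follows. The formula is the standard AICc; the model names, log-likelihoods, and parameter counts are invented placeholders and are not results from the study. Only the sample size of 181 is taken from the abstract.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Finite-sample corrected AIC: 2k - 2 ln L + 2k(k + 1) / (n - k - 1)."""
    k, n = n_params, n_obs
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidates: (log-likelihood, number of free parameters).
candidates = {
    "m1": (-520.0, 1),
    "m2": (-500.0, 3),
    "m3": (-470.0, 8),
}
n_specimens = 181  # sample size quoted in the abstract

scores = {name: aicc(ll, k, n_specimens) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AICc is preferred
print(scores, best)
```

Because the penalty grows with the number of parameters, a more complex model is preferred only when its likelihood gain outweighs the added parameters. This is what allows models with different structures to be compared on one scale.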

9.
10.
The typing of C. albicans by MLEE (multilocus enzyme electrophoresis) is dependent on the interpretation of enzyme electrophoretic patterns, and the study of the epidemiological relationships of these yeasts can be conducted by cluster analysis. Therefore, the aims of the present study were to first determine the discriminatory power of genetic interpretation (deduction of the allelic composition of diploid organisms) and numerical interpretation (mere determination of the presence and absence of bands) of MLEE patterns, and then to determine the concordance (Pearson product-moment correlation coefficient) and similarity (Jaccard similarity coefficient) of the groups of strains generated by three cluster analysis models, and the discriminatory power of such models as well [model A: genetic interpretation, genetic distance matrix of Nei (d(ij)) and UPGMA dendrogram; model B: genetic interpretation, Dice similarity matrix (S(D1)) and UPGMA dendrogram; model C: numerical interpretation, Dice similarity matrix (S(D2)) and UPGMA dendrogram]. MLEE was found to be a powerful and reliable tool for the typing of C. albicans due to its high discriminatory power (>0.9). Discriminatory power indicated that numerical interpretation is a method capable of discriminating a greater number of strains (47 versus 43 subtypes), but also pointed to model B as a method capable of providing a greater number of groups, suggesting its use for the typing of C. albicans by MLEE and cluster analysis. Very good agreement was only observed between the elements of the matrices S(D1) and S(D2), but a large majority of the groups generated in the three UPGMA dendrograms showed similarity S(J) between 4.8% and 75%, suggesting disparities in the conclusions obtained by the cluster assays.
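The Dice and Jaccard coefficients named in this abstract are both defined on presence/absence data and can be sketched directly. The band sets below are invented, the convention for two empty patterns is an assumption, and the UPGMA clustering step is not shown.

```python
def dice(a, b):
    """Dice similarity of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty patterns are identical
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # same assumed convention as above
    return len(a & b) / len(a | b)

# Hypothetical band positions present in two strains (illustrative only).
strain_1 = {1, 2, 3, 4}
strain_2 = {3, 4, 5, 6}

d = dice(strain_1, strain_2)
j = jaccard(strain_1, strain_2)
print(d, j)  # Dice weights shared bands more heavily than Jaccard
```

The two coefficients are monotonically related by J = D / (2 - D), so they rank pairs of strains identically even though their numerical values differ.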

11.
Clinical studies are often concerned with assessing whether different raters/methods produce similar values for measuring a quantitative variable. Use of the concordance correlation coefficient as a measure of reproducibility has gained popularity in practice since its introduction by Lin (1989, Biometrics 45, 255-268). Lin's method is applicable for studies evaluating two raters/two methods without replications. Chinchilli et al. (1996, Biometrics 52, 341-353) extended Lin's approach to repeated measures designs by using a weighted concordance correlation coefficient. However, the existing methods cannot easily accommodate covariate adjustment, especially when one needs to model agreement. In this article, we propose a generalized estimating equations (GEE) approach to model the concordance correlation coefficient via three sets of estimating equations. The proposed approach is flexible in that (1) it can accommodate more than two correlated readings and test for the equality of dependent concordance correlation estimates; (2) it can incorporate covariates predictive of the marginal distribution; (3) it can be used to identify covariates predictive of concordance correlation; and (4) it requires minimal distribution assumptions. A simulation study is conducted to evaluate the asymptotic properties of the proposed approach. The method is illustrated with data from two biomedical studies.

12.
Carrasco JL, Jover L. Biometrics, 2003, 59(4): 849-858
The intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC) are two of the most popular measures of agreement for variables measured on a continuous scale. Here, we demonstrate that ICC and CCC are the same measure of agreement estimated in two ways: by the variance components procedure and by the moment method. We propose estimating the CCC using variance components of a mixed effects model, instead of the common method of moments. With the variance components approach, the CCC can easily be extended to more than two observers, and adjusted using confounding covariates, by incorporating them in the mixed model. A simulation study is carried out to compare the variance components approach with the moment method. The importance of adjusting by confounding covariates is illustrated with a case example.
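The moment-method estimate of the CCC mentioned here can be sketched for two observers. This is a minimal version of Lin's estimator with 1/n moments; the readings are invented, and the variance-components estimator proposed in the abstract is not implemented.

```python
def concordance_cc(x, y):
    """Moment estimate of Lin's concordance correlation coefficient.

    CCC = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical readings: observer 2 is always exactly one unit higher.
observer_1 = [1, 2, 3, 4, 5]
observer_2 = [2, 3, 4, 5, 6]

print(concordance_cc(observer_1, observer_2))  # below 1 despite perfect correlation
print(concordance_cc(observer_1, observer_1))  # identical readings give 1.0
```

The constant offset leaves the Pearson correlation at 1 but lowers the CCC, because the CCC measures agreement with the identity line and not just linear association.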

13.
The coefficient of determination ($R^2$) is a common measure of goodness of fit for linear models. Various proposals have been made for extension of this measure to generalized linear and mixed models. When the model has random effects or correlated residual effects, the observed responses are correlated. This paper proposes a new coefficient of determination for this setting that accounts for any such correlation. A key advantage of the proposed method is that it only requires the fit of the model under consideration, with no need to also fit a null model. Also, the approach entails a bias correction in the estimator assessing the variance explained by fixed effects. Three examples are used to illustrate the new measure. A simulation shows that the proposed estimator of the new coefficient of determination has only minimal bias.

14.
Pragmatic trials evaluating health care interventions often adopt cluster randomization due to scientific or logistical considerations. Systematic reviews have shown that coprimary endpoints are not uncommon in pragmatic trials but are seldom recognized in sample size or power calculations. While methods for power analysis based on $K$ ($K \ge 2$) binary coprimary endpoints are available for cluster randomized trials (CRTs), to our knowledge, methods for continuous coprimary endpoints are not yet available. Assuming a multivariate linear mixed model (MLMM) that accounts for multiple types of intraclass correlation coefficients among the observations in each cluster, we derive the closed-form joint distribution of $K$ treatment effect estimators to facilitate sample size and power determination with different types of null hypotheses under equal cluster sizes. We characterize the relationship between the power of each test and different types of correlation parameters. We further relax the equal cluster size assumption and approximate the joint distribution of the $K$ treatment effect estimators through the mean and coefficient of variation of cluster sizes. Our simulation studies with a finite number of clusters indicate that the predicted power by our method agrees well with the empirical power, when the parameters in the MLMM are estimated via the expectation-maximization algorithm. An application to a real CRT is presented to illustrate the proposed method.

15.
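Taking adult male skulls (668 specimens) from nine population groups living in different regions as the main study material, this paper applies cluster analysis and principal component analysis to 14 metric traits in order to assess the value of multivariate statistical methods in anthropological research. The results show that the Euclidean distance coefficient gives a preliminary indication of the relationships and differences among the groups; that relationships inferred from cluster-analysis dendrograms are influenced by the author's subjective judgment, so credible conclusions should rest on agreement among the results of several clustering methods; and that the results of principal component analysis depend to some extent on the variables selected, so choosing a different set of variables affects the outcome. Compared with cluster analysis, principal component analysis reflected the relationships among the groups relatively well. These findings suggest that conclusions about relationships among populations drawn from multivariate statistical methods should be treated with caution.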

16.
MOTIVATION: Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure the gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known in other applications that valid statistical analyses have to appropriately take account of the possible within-subject correlation in longitudinal data. RESULTS: We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the significance analysis of microarrays method or using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.

17.
18.
Graph theoretical approaches have successfully revealed abnormality in brain connectivity, in particular, for contrasting patients with healthy controls. Besides the group comparison analysis, a correlational study is also challenging. In studies with patients, for example, finding brain connections that indeed deepen specific symptoms is interesting. The correlational study is also beneficial since it does not require controls, which are often difficult to find, especially for old-age patients with cognitive impairment where controls could also have cognitive deficits due to normal ageing. However, one of the major difficulties in such correlational studies is overly conservative multiple comparison correction. In this paper, we propose a novel method for identifying brain connections that are correlated with a specific cognitive behavior by employing cluster-based statistics, which is less conservative than other methods, such as Bonferroni correction, false discovery rate procedure, and extreme statistics. Our method is based on the insight that multiple brain connections, rather than a single connection, are responsible for abnormal behaviors. Given brain connectivity data, we first compute a partial correlation coefficient between every edge and the behavioral measure. Then we group together neighboring connections with strong correlation into clusters and calculate their maximum sizes. This procedure is repeated for randomly permuted assignments of behavioral measures. Significance levels of the identified sub-networks are estimated from the null distribution of the cluster sizes. This method is independent of network construction methods: either structural or functional networks can be used in association with any behavioral measures. We further demonstrated the efficacy of our method using patients with subcortical vascular cognitive impairment. We identified sub-networks that are correlated with the disease severity by exploiting diffusion tensor imaging techniques. The identified sub-networks were consistent with the previous clinical findings having valid significance level, while other methods did not assert any significant findings.
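The cluster-based permutation procedure described in this abstract can be sketched roughly as follows. Several simplifications are assumed: plain Pearson correlation stands in for the partial correlation, two edges count as neighbors when they share a node, cluster size is the number of edges, and all data, node labels, and the threshold are invented.

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def max_cluster_size(edge_values, behavior, threshold):
    """Size (in edges) of the largest connected group of supra-threshold edges."""
    strong = [e for e, vals in edge_values.items()
              if abs(pearson_r(vals, behavior)) >= threshold]
    parent = {}

    def find(node):  # union-find root lookup with path halving
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v in strong:
        parent[find(u)] = find(v)
    sizes = {}
    for u, _ in strong:
        root = find(u)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)

def cluster_permutation_p(edge_values, behavior, threshold, n_perm=200, seed=0):
    """Observed maximum cluster size and its permutation p-value."""
    rng = random.Random(seed)
    observed = max_cluster_size(edge_values, behavior, threshold)
    shuffled = list(behavior)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the subject-to-behavior assignment
        if max_cluster_size(edge_values, shuffled, threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical connectivity strengths per edge for eight subjects.
severity = [1, 2, 3, 4, 5, 6, 7, 8]
edges = {
    ("A", "B"): [1, 2, 3, 4, 5, 6, 7, 9],
    ("B", "C"): [2, 1, 4, 3, 6, 5, 8, 7],
    ("D", "E"): [3, 1, 4, 1, 5, 9, 2, 6],
}
observed_size, p_value = cluster_permutation_p(edges, severity, threshold=0.8)
print(observed_size, p_value)
```

Testing the maximum cluster size, not each edge separately, means only one statistic is compared against the null distribution. This is the sense in which the approach is less conservative than edge-wise corrections.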

19.
This paper explores the relevance of the variables that define well-being and human progress and makes a quantitative inquiry into the validity of three of the well-known and well-documented composite indicators of well-being: the Human Development Index (HDI), the Legatum Prosperity Index (LPI) and the Happy Planet Index (HPI). After choosing the key variables that describe most of the objective and subjective dimensions of well-being, we perform cluster analysis to come up with an optimal grouping of countries based on their multidimensional performance on well-being. A comparison of the classifications obtained with the three indexes invalidates the HPI, confirms results obtained for the HDI, and validates for the first time the LPI as a reliable measure of well-being. The optimal cluster structure yields robust results, which correct the rank discrepancies between the HDI and LPI for a large number of countries. It also proves that a robust ranking of countries based on multidimensional well-being can be achieved with a relatively small number of variables, which mitigates the risk of including variables that are not reliable and/or not available for a significant number of countries. The fact that cluster analysis generates results based on similarities between observations and not on computed values based on the aggregation of variables helps overcome problems that may occur due to the distribution of variables and increases its value as a validation method. Therefore, validation results achieved through cluster analysis are more robust and help to achieve a good check of the validity and relevance of the composite indexes, provide an objective perspective that can guide policy-makers and the public in making a fair assessment of actual levels of well-being, and avoid unfounded claims that may overstate it and delay or postpone measures to increase it.

20.
Two different chaotic time series analysis methods – the correlation dimension and nonlinear forecasting – are introduced and then used to process the interspike intervals (ISI) of the action potential trains propagated along a single nerve fiber of the anesthetized rat. From the results, the conclusion is drawn that compared with the correlation dimension, nonlinear forecasting is more efficient and robust for chaotic ISI time series analysis in a noisy environment. Moreover, the evolution of the correlation coefficient curves calculated from nonlinear forecasting can qualitatively give a better reflection of the unpredictability of the system's future behavior and is in good agreement with the values of the largest Lyapunov exponent that quantitatively measures the degree of chaos. Received: 19 November 1996 / Accepted in revised form: 15 September 1997
