Similar Documents
20 similar documents found (search time: 15 ms).
1.
Determining the number of clusters using the weighted gap statistic   (Total citations: 3; self-citations: 0; citations by others: 3)
Yan M, Ye K. Biometrics, 2007, 63(4): 1031-1037
Estimating the number of clusters in a data set is a crucial step in cluster analysis. In this article, motivated by the gap method (Tibshirani, Walther, and Hastie, 2001, Journal of the Royal Statistical Society, Series B 63, 411-423), we propose the weighted gap and the difference-of-difference-weighted (DD-weighted) gap methods for estimating the number of clusters in data, using the weighted within-cluster sum of errors: a measure of within-cluster homogeneity. In addition, we propose a "multilayer" clustering approach, which is shown to be more accurate than the original gap method, particularly in detecting nested cluster structure in the data. The methods are applicable when the input data contain continuous measurements and can be used with any clustering method. The proposed methods are compared with one another and with the original gap method in simulation studies and on real data.
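As a concrete illustration, below is a minimal sketch of the gap-statistic computation with an optional weighted within-cluster dispersion. The specific weighting (dividing each cluster's pairwise dispersion by 2*n_r*(n_r-1) rather than 2*n_r), the uniform reference distribution, and the simple largest-gap selection rule are illustrative assumptions; the paper's DD-weighted variant, standard-error rule, and multilayer procedure are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def within_dispersion(X, labels, weighted=True):
    """Pooled within-cluster dispersion; `weighted` divides each cluster's
    pairwise sum of squared distances by 2*n_r*(n_r-1) instead of 2*n_r."""
    total = 0.0
    for r in np.unique(labels):
        C = X[labels == r]
        n_r = len(C)
        d_r = np.sum((C[:, None, :] - C[None, :, :]) ** 2)  # ordered-pair sum
        denom = 2 * n_r * (n_r - 1) if weighted and n_r > 1 else 2 * n_r
        total += d_r / denom
    return total

def gap_estimate(X, k_max=8, n_refs=10, seed=0):
    """Return the k in 1..k_max with the largest gap against uniform references."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        log_w = np.log(within_dispersion(X, labels))
        ref_log_w = []
        for _ in range(n_refs):
            R = rng.uniform(lo, hi, size=X.shape)   # reference data, uniform box
            r_lab = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(R)
            ref_log_w.append(np.log(within_dispersion(R, r_lab)))
        gaps.append(np.mean(ref_log_w) - log_w)
    return int(np.argmax(gaps)) + 1
```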

2.
A method for calculating the surface area of the cerebral cortex is proposed that combines averaged reconstruction from a continuous series of histological sections with a stereological method for determining the length of an arbitrary curve lying on the surface. The proposed method is compared with the curvometric and stereological (Hennig's formula) methods used earlier for this purpose, taking as an example the calculation of the neocortical surface area of several mammals. Surface areas calculated by the proposed method agree with curvometric data to within 5%, whereas the stereological estimates of absolute surface area differ from the curvometric data by more than 22%. The proposed method is convenient and considerably accelerates obtaining results.

3.
The finite element (FE) method coupled with computed tomography (CT) is a powerful tool in orthopaedic biomechanics; however, substantial data are required for patient-specific modelling. Here we present a new method for generating an FE model from a minimum amount of patient data. Our method uses high-order cubic Hermite basis functions for mesh generation and least-squares fits the mesh to the data set. We tested the method on seven patient data sets obtained from CT-assisted osteodensitometry of the proximal femur. Using only 12 CT slices, we generated smooth and accurate meshes of the proximal femur with a geometric root mean square (RMS) error of less than 1 mm and peak errors of less than 8 mm. To model the complex geometry of the pelvis we developed a hybrid method that supplements sparse patient data with data from the Visible Human data set. We tested this method on three patient data sets, generating FE meshes of the pelvis from only 10 CT slices with an overall RMS error of less than 3 mm. Although these meshes have peak errors of about 12 mm, the peaks occur relatively far from the region of interest (the acetabulum) and have minimal effect on model performance. Considering that linear meshes usually require about 70-100 pelvic CT slices (in axial mode) to generate FE models, our method brings a significant data reduction to the automatic mesh generation step. The method, which is fully automated except for a semi-automatic bone/tissue boundary extraction step, will bring the benefits of FE methods to the clinical environment with much-reduced radiation risk and data requirements.
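The least-squares fitting and the error metrics can be illustrated in one dimension. The sketch below fits a least-squares cubic spline (a simple stand-in for the paper's 3-D cubic Hermite mesh) to noisy contour samples and reports the RMS and peak geometric errors; the contour model and knot placement are invented for illustration.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0, 2 * np.pi, 200))          # sample angles on a contour
radius = 30 + 3 * np.sin(3 * theta) + rng.normal(0, 0.5, theta.size)  # noisy "bone" contour

# Few-degree-of-freedom cubic spline, analogous to a coarse high-order mesh.
t = np.r_[(0.0,) * 4, np.linspace(0.5, 2 * np.pi - 0.5, 8), (2 * np.pi,) * 4]
spline = make_lsq_spline(theta, radius, t, k=3)

residual = radius - spline(theta)
print("RMS error :", np.sqrt(np.mean(residual ** 2)))
print("peak error:", np.max(np.abs(residual)))
```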

4.
We develop a nonparametric imputation technique to test for treatment effects in a nonparametric two-factor mixed model with incomplete data. Within each block, an arbitrary covariance structure of the repeated measurements is assumed without explicit parametrization of the joint multivariate distribution. The number of repeated measurements is uniformly bounded, whereas the number of blocks tends to infinity. The essential idea of the nonparametric imputation is to replace the unknown indicator functions of pairwise comparisons by the corresponding empirical distribution functions. The proposed method is valid under the missing completely at random (MCAR) mechanism. We apply the nonparametric imputation to Brunner and Dette's method for the nonparametric two-factor mixed model; this extension results in a weighted partial rank transform statistic. The asymptotic relative efficiency of the nonparametric imputation method with complete versus incomplete data is derived to quantify the efficiency loss due to missing data. Monte Carlo simulation studies demonstrate the validity and power of the proposed method in comparison with other existing methods. A migraine severity score data set is analyzed to illustrate the application of the method to missing data.
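The core imputation idea, replacing an unobservable pairwise indicator with an empirical distribution function, can be sketched directly. The function below and the mid-rank tie convention are illustrative assumptions, not the paper's exact statistic.

```python
import numpy as np

def pairwise_score(x, y, y_sample):
    """c(x, y) = 1{x < y} + 0.5*1{x = y}; if y is missing (NaN), impute its
    expectation from the observed y's, i.e. 1 - F_hat(x) with mid-rank ties."""
    if not np.isnan(y):
        return float(x < y) + 0.5 * float(x == y)
    obs = y_sample[~np.isnan(y_sample)]
    return float(np.mean(obs > x) + 0.5 * np.mean(obs == x))

x_sample = np.array([1.2, 3.4, 2.2])
y_sample = np.array([2.0, np.nan, 4.1, 3.3])   # one missing measurement
stat = np.mean([pairwise_score(x, y, y_sample)
                for x in x_sample for y in y_sample])
print(stat)   # estimate of P(X < Y) + 0.5*P(X = Y) under MCAR
```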

5.
A simple method for the spectral analysis of multispecies microfossil data through time or stratigraphic level is presented. The method is based on the Mantel correlogram, allowing any ecological similarity measure to be used. The method can therefore be applied to binary (presence-absence) data as well as raw or normalized species counts. In contrast with spectral analysis of univariate ordination scores, this approach does not explicitly discard information. The method, referred to as the Mantel periodogram, is exemplified with a data set from the literature, demonstrating several astronomically forced periodicities in microfaunal data from the Plio-Pleistocene.
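A minimal sketch of a Mantel-style periodogram follows: an ecological distance matrix is correlated with model distance matrices built from candidate periods, here via distances between points placed on a circle of phase. The Bray-Curtis choice and the phase-distance construction are assumptions; the paper's exact statistic may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mantel_r(D1, D2):
    """Pearson correlation between the upper triangles of two distance matrices."""
    iu = np.triu_indices_from(D1, k=1)
    return np.corrcoef(D1[iu], D2[iu])[0, 1]

def mantel_periodogram(counts, levels, periods):
    """counts: samples x species matrix, ordered by stratigraphic level."""
    D_eco = squareform(pdist(counts, metric="braycurtis"))  # any (dis)similarity
    power = []
    for p in periods:
        phase = 2 * np.pi * np.asarray(levels) / p
        pts = np.c_[np.cos(phase), np.sin(phase)]   # samples one period apart coincide
        power.append(mantel_r(D_eco, squareform(pdist(pts))))
    return np.array(power)
```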

6.
Microarray techniques using cDNA arrays and comparative genomic hybridization (CGH) have been developed for several discovery applications and in recent years are frequently applied to the prediction and diagnosis of cancer. Many studies have shown that integrating genomic data from different sources may increase the reliability of gene expression analysis results in understanding cancer progression. Developing a good prognostic model that deals simultaneously with different types of data set is therefore important; the challenge with such data is high background noise. We describe an analytical two-stage framework with a multi-parallel data analysis method named wavelet-based generalized singular value decomposition and shaving (WGSVD-shaving), proposed for de-noising and dimension reduction during early-stage prognosis modelling. We also applied a supervised gene clustering technique with penalized logistic regression and a Cox model on the integrated data. We show the accuracy of the method using a simulated data set and a case study on hepatocellular carcinoma (HCC) cDNA and CGH data. The method improves on GSVD-shaving and has application in the discovery of candidate genes associated with cancer.
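The paper's WGSVD-shaving algorithm is specialized, so the sketch below substitutes the simplest relative: denoising an expression matrix by truncating its SVD, the generic first stage of such two-stage frameworks. The simulated matrix and the retained rank are arbitrary illustrations, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40))  # low-rank structure
X = signal + rng.normal(0, 2.0, size=signal.shape)             # noisy genes x samples

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3                                                          # retained rank
X_denoised = U[:, :k] * s[:k] @ Vt[:k]
rel_err = np.linalg.norm(X_denoised - signal) / np.linalg.norm(signal)
print("relative error of denoised matrix:", rel_err)
```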

7.
8.
The Self-Organizing Map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, an intuitive and effective SOM projection method is proposed for mapping high-dimensional data onto a two-dimensional grid structure with a growing self-organizing mechanism. In the learning phase, a growing SOM is trained, with the growing cell structure used as the baseline framework. In the ordination phase, the new projection method maps each input vector onto the structure of the SOM without having to plot the weight values, resulting in easy visualization of the data. The projection method is demonstrated on four different data sets, including a set of 118 patents and a set of 399 chemical abstracts related to polymer cements, with promising results and a significantly reduced network size.
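A fixed-size SOM projection can be sketched with the third-party MiniSom library; the paper's growing cell structure and its refined ordination step are not reproduced, and the data here are synthetic stand-ins for document vectors.

```python
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 20))                      # stand-in for document vectors
data /= np.linalg.norm(data, axis=1, keepdims=True)    # unit-normalize features

som = MiniSom(10, 10, input_len=20, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(data, num_iteration=5000)

# Project: each input lands on the grid cell of its best-matching unit.
grid_positions = np.array([som.winner(x) for x in data])
print(grid_positions[:5])
```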

9.
1 Introduction. The electrocardiogram (ECG) provides a great deal of useful information for the diagnosis of heart diseases. Because abnormal ECGs occur in unpredictable situations, they can be caught only through long-term continuous monitoring. A Holter monitoring system, which can record 24 hours of ECG data, is one effective means of providing this function. Although large-scale IC memories have been developed and are available for storing long-term ECG data, it is very difficult and troublesome to process, store, or transmit such a large amount of data. In the digitized ECG data, there are…

10.
Species-occurrence data sets tend to contain a large proportion of zero (absence) values, i.e., they are zero-inflated. Statistical inference from such data sets is likely to be inefficient or to lead to incorrect conclusions unless the data are treated carefully. In this study, we propose a new modeling method to overcome the problems caused by zero-inflated data sets that combines a regression model with a machine-learning technique: a generalized linear model (GLM), which is widely used in ecology, and bootstrap aggregation (bagging). We established distribution models of Vincetoxicum pycnostelma (a vascular plant) and Ninox scutulata (an owl), both of which are endangered and have zero-inflated distribution patterns, using the new method and a traditional GLM, and compared model performance. We also modeled four theoretical data sets containing different ratios of presence/absence values using the new and traditional methods. For the distribution models, the new method performed well compared with traditional GLMs: after bagging, area under the curve (AUC) values were almost the same as with the traditional method, but sensitivity values were higher. Additionally, the new method showed high sensitivity compared with the traditional GLM when modeling a theoretical data set containing a large proportion of zero values. These results indicate that the new method has high predictive ability for presence data when analyzing zero-inflated data sets. Generally, predicting presence is more difficult than predicting absence, so the new modeling method has potential for advancing species distribution modeling.
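A minimal sketch of the GLM-plus-bagging combination using scikit-learn: a logistic regression (binomial GLM) inside a bagging ensemble, scored by AUC and sensitivity on simulated zero-inflated presence/absence data. All parameters and the simulated data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
logit = -2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1]           # rare presences -> zero-inflated
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = BaggingClassifier(LogisticRegression(max_iter=1000),
                          n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
print("AUC        :", roc_auc_score(y_te, prob))
print("sensitivity:", recall_score(y_te, model.predict(X_te)))  # recall of presences
```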

11.
In risk assessment and environmental monitoring studies, concentration measurements frequently fall below the detection limits (DLs) of measuring instruments, resulting in left-censored data. The principal approaches for handling censored data include substitution-based methods, maximum likelihood estimation, robust regression on order statistics, and Kaplan-Meier estimation. In practice, censored values are often substituted with an arbitrary value before traditional statistical methods are applied. Although some studies have evaluated the performance of substitution in estimating population characteristics, they have focused mainly on normally and lognormally distributed data with a single DL. We employ Monte Carlo simulations to assess the impact of substitution when estimating population parameters from censored data containing multiple DLs, under different distributional assumptions including lognormal, Weibull, and gamma. We show that the reliability of the estimates after substitution is highly sensitive to distributional characteristics such as the mean, standard deviation, and skewness, and to data characteristics such as the censoring percentage. The results highlight that although the performance of the substitution-based method improves as the censoring percentage decreases, it still depends on the population's distributional characteristics. The practical implication of our findings is that caution must be taken in using substitution when analyzing censored environmental data.
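The substitution experiment is easy to sketch: simulate lognormal concentrations under multiple detection limits, substitute DL/2 for censored values, and track the bias of the sample mean. The DL values and the DL/2 rule are common conventions assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0
true_mean = np.exp(mu + sigma**2 / 2)       # lognormal population mean
dls = np.array([0.5, 1.0, 2.0])             # multiple detection limits

bias, cens = [], []
for _ in range(2000):
    x = rng.lognormal(mu, sigma, size=100)
    dl = rng.choice(dls, size=x.size)        # each measurement has some DL
    cens.append(np.mean(x < dl))
    x_sub = np.where(x < dl, dl / 2, x)      # substitute DL/2 for censored values
    bias.append(x_sub.mean() - true_mean)

print("average censoring fraction :", np.mean(cens))
print("mean bias of the sample mean:", np.mean(bias))
```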

12.
This communication describes the determination of the activity of immobilized biocatalysts from progress curves in the case of a poorly soluble substrate. Computer simulation is used to compare five well-known methods for determining the initial velocity of the reaction with a modified method that smooths the experimental data with cubic splines. The simulations show that, in the case of a poorly soluble substrate, it is expedient to use data linearization in the coordinate system p/t versus p, and the method of splines.
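A sketch of the two recommended approaches on a simulated progress curve: at early times p/t is approximately v0 - (k/2)p, so regressing p/t on p and reading the intercept estimates the initial velocity v0, while a smoothing cubic spline gives v0 from its initial derivative. The first-order kinetic model and noise level are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

P, kr = 1.0, 0.3                                   # product limit, rate constant
rng = np.random.default_rng(0)
t = np.linspace(0.1, 4.0, 30)
p = P * (1 - np.exp(-kr * t)) + rng.normal(0, 0.005, t.size)  # noisy progress curve

# Linearization: at early times p/t ~ v0 - (kr/2) p, so the intercept estimates v0.
slope, intercept = np.polyfit(p, p / t, 1)
print("v0 from p/t vs p:", intercept)

# Smoothing cubic spline: v0 from the initial derivative of the smoothed curve.
spl = UnivariateSpline(t, p, k=3, s=t.size * 0.005**2)
print("v0 from spline  :", spl.derivative()(t[0]))
print("true v0 = kr*P  :", kr * P)
```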

13.
A method is presented for the in-vivo estimation of the nonlinear pressure-volume relationship of the human aorta. The method is based on nonlinear elastic reservoir theory and uses clinical data that can be obtained with a high degree of accuracy, namely stroke volume, end-diastolic ventricular volume, and aortic pressure trace data. The computational procedure is described and then carried out for six cardiac patients. A method for estimating instantaneous left ventricular volume during the ejection period, based on the same nonlinear elastic reservoir theory, is also presented. This method is applied to the six cardiac patients, and the results are compared with those obtained for the same subjects by an estimation method based on linear elastic reservoir theory described in a previous paper by the author (1969).

14.
This paper deals with the analysis of experimental data obtained using an ergometer apparatus. A straightforward analysis method based on the power equation and the concept of generalized torques is presented. The method makes it possible to study the influence of the net muscle joint torques and of gravity and inertia forces on the crank torque. The assumptions and limitations of the proposed method are discussed, and the method is compared with analysis methods proposed by other researchers. To assess the validity of the method, experimental data are analysed. The results show that the method can highlight the effect of training and of an athlete's pedaling technique.
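The power-equation bookkeeping can be sketched with synthetic signals: crank power equals the net muscle joint powers minus the rate of change of the limbs' mechanical energy, so the crank torque follows by dividing by the crank angular velocity. All signals and the sign convention below are invented for illustration, not the paper's data.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 2, dt)
omega_crank = np.full_like(t, 2 * np.pi)            # 60 rpm, in rad/s
joint_power = 150 + 50 * np.sin(2 * np.pi * t)      # net muscle joint power, W
ke_pe = 10 * np.sin(4 * np.pi * t)                  # limb kinetic + potential energy, J

# Crank torque implied by the power equation: T = (P_joints - dE/dt) / omega.
dE_dt = np.gradient(ke_pe, dt)
crank_torque = (joint_power - dE_dt) / omega_crank
print(crank_torque[:5])
```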

15.
The outgroup method is widely used to root phylogenetic trees. An accurate root indication, however, depends strongly on the availability of a proper outgroup. An alternative rooting method is midpoint rooting (MPR), in which the root is set at the midpoint between the two most divergent operational taxonomic units. Although the midpoint rooting algorithm has been used extensively, its efficiency in retrieving the correct root had remained untested. In the present study, we empirically tested the success rate of MPR in recovering the outgroup root of a given phylogenetic tree by eliminating the outgroups in 50 selected data sets from 33 papers, rooting the trees with the midpoint method, and comparing the root positions retrieved by each method. Data sets were separated into three categories of root consistency: data sets with a single outgroup taxon (54% success rate for MPR), data sets with multiple outgroup taxa that were inconsistent in root position (82% success rate), and data sets with multiple outgroup taxa in which root position was consistent (94% success rate). Interestingly, the more consistent the outgroup root, the more successful MPR appears to be. This is a strong indication that MPR is valuable, particularly where a proper outgroup is unavailable. © 2007 The Linnean Society of London, Biological Journal of the Linnean Society, 2007, 92, 669-674.
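Midpoint rooting itself is available off the shelf; for example, Biopython roots a tree at the midpoint of the longest leaf-to-leaf path. The Newick string below is invented for illustration.

```python
from io import StringIO
from Bio import Phylo

newick = "((A:0.1,B:0.2):0.3,(C:0.4,D:0.9):0.1);"
tree = Phylo.read(StringIO(newick), "newick")

tree.root_at_midpoint()   # root at the midpoint of the longest leaf-to-leaf path
Phylo.draw_ascii(tree)
```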

17.
In many studies the twinning rate, which depends strongly on maternal age (and parity), has been standardised according to the maternal age distribution. The direct method requires very informative twinning data for the target population; the indirect method is used when the data for the target population are not sufficiently informative or when the target population is small. We earlier introduced an alternative indirect technique for standardising the twinning rate that demands even less of the twinning data. Besides maternal age, parity is an influential factor and should, if possible, be taken into account. In this study we present the traditional standardisation methods based on both maternal age and parity, propose a new direct standardisation method, and develop our standardisation methods so that they take both maternal age and parity into account. We apply these methods to data from Finland (1953-1964), St. Petersburg, Russia (1882-1892), Canada (1952-1967), and Denmark (1896-1967). For the Finnish data the methods all give very similar results, but the effect of parity is strongest with the direct methods. This may be because, among extramarital maternities, parity has a strongly increasing effect on the twinning rate, which may be attributed to a higher reproductive capacity among unmarried mothers. Standardisations of the Canadian and Danish data also give reliable results. With the St. Petersburg data, however, the different standardisations show notable discrepancies, which are compared with Allen's findings.
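Direct standardisation is the simplest of the methods discussed: age-specific twinning rates are weighted by a standard population's maternal-age distribution (parity adds a second stratification dimension in the same way). The numbers below are invented for illustration.

```python
import numpy as np

# Age-specific twinning rates per 1000 maternities in the study population,
# for maternal age groups 15-19, 20-24, 25-29, 30-34, 35-39, 40-44.
rates = np.array([6.0, 9.0, 12.0, 15.0, 16.0, 14.0])
maternities_std = np.array([200, 800, 900, 600, 300, 100])  # standard population

standardised = np.sum(rates * maternities_std) / maternities_std.sum()
print(f"directly standardised twinning rate: {standardised:.2f} per 1000")
```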

18.
Spike-train data from many neurons can be obtained by multi-recording techniques; however, such data make it difficult to estimate the connective structure of a large network. Neuron classification should help in that regard, assuming that neurons having similar connections with other neurons show similar temporal firing patterns. We propose a novel method for classifying neurons based on the temporal firing patterns of spike-train data, called the dynamical analysis with changing time resolution (DCT) method. The DCT method evaluates temporal firing patterns with a simple algorithm involving few arbitrary factors and automatically classifies neurons by the similarity of their temporal firing patterns. In the DCT method, temporal firing patterns are objectively evaluated by analyzing their dependence on temporal resolution. We confirmed the effectiveness of the DCT method using actual spike-train data.
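A sketch of the abstract's central idea (evaluating firing patterns across several time resolutions and clustering neurons by their similarity) follows; the bin widths, normalization, and clustering choices are assumptions, not the paper's exact DCT algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Twelve synthetic spike trains: sorted spike times on a 100 s recording.
spikes = [np.sort(rng.uniform(0, 100, rng.integers(80, 120))) for _ in range(12)]

def multiresolution_profile(train, t_max=100, widths=(0.5, 1, 2, 5, 10)):
    """Concatenate normalized spike-count histograms at several bin widths."""
    parts = []
    for w in widths:
        counts, _ = np.histogram(train, bins=np.arange(0, t_max + w, w))
        parts.append(counts / counts.sum())
    return np.concatenate(parts)

profiles = np.array([multiresolution_profile(s) for s in spikes])
labels = fcluster(linkage(profiles, method="average", metric="correlation"),
                  t=3, criterion="maxclust")
print(labels)   # cluster assignment per neuron
```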

19.
A multivariate nonlinear analysis method for the population genetic structure of allele polymorphism data   (Total citations: 4; self-citations: 0; citations by others: 4)
For a long time, multivariate statistical analyses of multidimensional gene polymorphism data, such as the cluster analysis used in computing genetic distances and the principal component analysis, factor analysis, and canonical correlation analysis used in analyzing population genetic structure, have applied classical multivariate linear methods designed for unconstrained data, without attending to the problems caused by the "closure effect" of gene polymorphism data. Starting from the distributional and structural characteristics of gene polymorphism data, this paper points out that gene polymorphism distributions have the character of "closed data" and analyzes the difficulties that the closure effect creates for classical multivariate linear methods in the analysis of population genetic structure. Based on the theory and methods of compositional data analysis, a basic multivariate nonlinear approach to analyzing the population genetic structure of gene polymorphism data is proposed. Taking principal component analysis as an example, the results of classical linear PCA and "log-ratio" nonlinear PCA are compared on real data, demonstrating that log-ratio nonlinear PCA is a good method for studying the population genetic structure of gene polymorphisms, with the advantages of specificity and sensitivity, and that its results accord with the laws of population genetics.
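The log-ratio approach the paper advocates can be sketched with a centred log-ratio (clr) transform followed by ordinary PCA; the allele-frequency table is invented, and clr (rather than an additive log-ratio) is an illustrative choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Populations x alleles: rows sum to 1 (closed / compositional data).
freqs = np.array([
    [0.60, 0.25, 0.10, 0.05],
    [0.55, 0.30, 0.10, 0.05],
    [0.20, 0.20, 0.35, 0.25],
    [0.25, 0.15, 0.40, 0.20],
])

def clr(X, eps=1e-9):
    """Centred log-ratio transform: log(x) minus the row-wise mean of log(x)."""
    L = np.log(X + eps)
    return L - L.mean(axis=1, keepdims=True)

scores = PCA(n_components=2).fit_transform(clr(freqs))
print(scores)   # populations 1-2 and 3-4 separate on the first component
```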

20.
The traditional approach to representing an articular surface is to fit piecewise polynomial functions of limited continuity to ordered data points. In this study, we introduce a new method, based on the influence surface theory of plates, for representing articular surfaces. The most significant advantage of this method is that it can effectively represent an articular surface from non-ordered data points. The effectiveness of the method is demonstrated by reconstructing a human femoral surface and a mathematical cone.
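Fitting a smooth surface to non-ordered points is naturally done with a thin-plate-spline radial basis function, which comes from the same plate theory; whether this matches the paper's influence-surface formulation exactly is an assumption. The cone test mirrors the paper's mathematical-cone example.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 2))           # scattered, non-ordered samples
z = np.hypot(pts[:, 0], pts[:, 1])                # a cone, as in the paper's test
surface = RBFInterpolator(pts, z, kernel="thin_plate_spline", smoothing=1e-6)

grid = np.mgrid[-1:1:40j, -1:1:40j].reshape(2, -1).T   # evaluate on a regular grid
z_fit = surface(grid)
err = np.abs(z_fit - np.hypot(grid[:, 0], grid[:, 1]))
print("max abs error on grid:", err.max())
```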
