Similar Articles
1.
A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, selecting the smallest possible gene sets with the lowest error rates is the key to achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain both the smallest and the biggest subset of genes prior to classification. The option of selecting the biggest subset is provided to assist researchers who intend to use the informative genes for further research. Enhanced random forest gene selection performed better at selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag (OOB) error rates. Furthermore, classification with random forest on the selected gene subsets led to lower prediction error rates than the existing method and other similar available methods.
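The final selection step can be illustrated with a minimal Python sketch. It assumes a backward-elimination path has already been produced (for example, by repeatedly fitting a random forest and dropping the least important genes, which is not shown) and simply reads off the smallest and the biggest subsets that reach the lowest OOB error. `select_subsets`, the `tol` parameter and the error values below are hypothetical, not the authors' code or data.

```python
from typing import List, Tuple


def select_subsets(path: List[Tuple[int, float]], tol: float = 0.0) -> Tuple[int, int]:
    """Return (smallest, biggest) subset sizes whose OOB error is within
    `tol` of the lowest OOB error seen along a gene-elimination path.

    `path` holds (number_of_genes, oob_error) pairs, one per elimination step.
    """
    if not path:
        raise ValueError("elimination path is empty")
    best = min(err for _, err in path)
    # Every step whose OOB error is (near-)minimal is an acceptable subset.
    candidates = [n for n, err in path if err <= best + tol]
    return min(candidates), max(candidates)


# Hypothetical elimination path: (genes kept, OOB error) after each step.
path = [(2000, 0.12), (1000, 0.08), (500, 0.05), (250, 0.05),
        (125, 0.05), (62, 0.07), (31, 0.10)]
print(select_subsets(path))  # (125, 500)
```

A non-zero `tol` trades a slightly higher error for a smaller subset; whether the original method uses such a tolerance is not stated in the abstract.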

2.
Besides the problem of finding effective methods for data analysis, there are additional problems in handling data of high uncertainty. Uncertainty problems often arise in the analysis of ecological data, e.g., in cluster analysis. Conventional clustering methods based on Boolean logic ignore the continuous nature of ecological variables and the uncertainty of ecological data, which can result in misclassification or misinterpretation of the data structure. Clusters with fuzzy boundaries better reflect the continuous character of ecological features. The problem, however, is that common clustering methods (like the fuzzy c-means method) are designed only for crisp data; that is, they provide a fuzzy partition only for crisp data (e.g., exact measurement data). This paper presents the extension and implementation of the method for fuzzy clustering of fuzzy data proposed by Yang and Liu [Yang, M.-S. and Liu, H.-H., 1999. Fuzzy clustering procedures for conical fuzzy vector data. Fuzzy Sets and Systems, 106, 189-200.]. Imprecise data can be defined as multidimensional fuzzy sets without sharply defined boundaries (in the form of so-called conical fuzzy vectors) and can then be clustered together with crisp data. This is particularly useful when no information is available about the variances that describe the accuracy of the data, so that probabilistic approaches are impossible. The method proposed by Yang and Liu has been extended and implemented in the fuzzy clustering system EcoFucs, developed at the University of Kiel. As an example, the paper presents a fuzzy cluster analysis of chemicals according to their ecotoxicological properties. The uncertainty and imprecision of ecotoxicological data are very high because of the use of various data sources and investigation tests and the difficulty of comparing these data.
The implemented method can be very helpful in searching for an adequate partition of ecological data into clusters with similar properties.
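For orientation, the crisp-data baseline that the paper extends, standard fuzzy c-means, can be sketched in plain Python. This is not the Yang and Liu procedure for conical fuzzy vectors and not EcoFucs code; the fuzzifier `m = 2`, the initial centers and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(data: List[Point], centers: List[Point], m: float = 2.0) -> List[List[float]]:
    """u[i][k]: membership of point k in cluster i (each column sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        d = [_dist(x, c) for c in centers]
        if 0.0 in d:
            # Point sits exactly on a center: give it full membership there.
            u[d.index(0.0)][k] = 1.0
            continue
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** exponent for j in range(len(centers)))
    return u


def fcm_centers(data: List[Point], u: List[List[float]], m: float = 2.0) -> List[List[float]]:
    """Centers as membership**m-weighted means of the data."""
    dim = len(data[0])
    centers = []
    for row in u:
        w = [uk ** m for uk in row]
        total = sum(w)
        centers.append([sum(wk * x[d] for wk, x in zip(w, data)) / total for d in range(dim)])
    return centers


def fuzzy_c_means(data, centers, m=2.0, iters=100, eps=1e-6) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        u = fcm_memberships(data, centers, m)
        new_centers = fcm_centers(data, u, m)
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return centers, fcm_memberships(data, centers, m)
```

Every point here is an exact vector; handling imprecise (fuzzy) observations, as the paper does, requires replacing `_dist` with a distance defined between fuzzy vectors.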

3.
This paper presents an electrocardiogram (ECG) data mining scheme based on ECG frame classification using dynamic time warping (DTW) matching, a technique that has been used successfully in speech recognition. We use DTW to classify ECG frames because ECG and speech signals have similar non-stationary characteristics. The DTW mapping function is obtained by searching the frame from its end to its start. A threshold is set on the DTW matching residual, either to classify an ECG frame or to add a new class. Classification and establishment of the template set are carried out simultaneously: a frame is assigned to the category with the minimal residual, provided that this residual satisfies the threshold requirement. A classification residual of 1.33% is achieved by DTW for a 10-min ECG recording.
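A minimal Python sketch of threshold-based DTW classification with an incrementally built template set follows. It assumes frames are plain lists of samples and uses the raw cumulative DTW cost as the "residual"; the paper reports a percentage residual whose normalization is not given here. The cost table is filled forward; obtaining the mapping function by tracing from the end of the frame back to its start corresponds to backtracking through this table, which is omitted. Function names and the threshold value are hypothetical.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative DTW cost with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify_frames(frames, threshold):
    """Assign each frame to its closest template, or open a new class."""
    templates: List[List[float]] = []
    labels: List[int] = []
    for frame in frames:
        if templates:
            residuals = [dtw_distance(frame, t) for t in templates]
            best = min(range(len(templates)), key=residuals.__getitem__)
            if residuals[best] <= threshold:
                labels.append(best)
                continue
        # No template is close enough: this frame becomes a new class template.
        templates.append(list(frame))
        labels.append(len(templates) - 1)
    return labels, templates
```

Because templates are created on the fly, the result depends on frame order and on the threshold; a lower threshold yields more classes.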

4.
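Advances in Methods for Gene Chip Data Analysis   Cited by: 2 (self-citations: 0; other citations: 2)
The advent of gene chip (microarray) technology has changed the outlook of biomedical research, but the massive amount of data it produces has become a bottleneck limiting its development. To extract the valuable information hidden in these data, many sophisticated computational tools and methods for microarray data analysis have been attempted in recent years. This paper reviews classification methods for microarray expression data from the past five years: it compares classification methods based on cluster analysis, covers current research techniques that apply systems-biology approaches such as data mining and information fusion, and evaluates the results of the data analyses.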

5.
In vitro pattern classification has been highlighted as an important future application of DNA computing. Previous work has demonstrated the feasibility of linear classifiers using DNA-based molecular computing. However, complex tasks require non-linear classification capability. Here we design a molecular beacon that can interact with multiple targets and experimentally show that its fluorescent signals form a complex radial-basis function, enabling it to be used as a building block for non-linear molecular classification in vitro. The proposed method was successfully applied to artificial and real-world classification problems: XOR and microRNA expression patterns.
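Why a radial-basis response enables non-linear classification can be seen in an electronic analogue: two Gaussian bumps placed on the XOR-true inputs separate a pattern that no single linear threshold can. This is a numerical illustration only, not a model of the beacon chemistry; the centres, the width `sigma = 0.5` and the threshold `0.5` are chosen by hand.

```python
import math
from typing import Sequence


def rbf(x: Sequence[float], center: Sequence[float], sigma: float) -> float:
    """Gaussian radial-basis function centred on `center`."""
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq / (2.0 * sigma ** 2))


def xor_score(x: Sequence[float], sigma: float = 0.5) -> float:
    # One bump on each input pattern that XOR maps to 1.
    return rbf(x, (0.0, 1.0), sigma) + rbf(x, (1.0, 0.0), sigma)


def xor_classify(x: Sequence[float], threshold: float = 0.5) -> int:
    return 1 if xor_score(x) > threshold else 0
```

With these settings the score is about 1.02 on (0, 1) and (1, 0) and about 0.27 on (0, 0) and (1, 1), so a single threshold between them reproduces XOR.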

6.
For years, tree-structured analytic methods have appealed to researchers for many reasons, among them data mining, exploratory data analysis, and the formation and testing of nonparametric and parametric models. Classification and Regression Tree (CART) analysis has offered one of the more efficient and accurate of these methods since it was first presented in a 1984 monograph by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone (Breiman et al., 1984). Until recently, however, powerful applications of the method have offered only a command-line interface, limiting its accessibility to users without the time and support to learn both the method and the command syntax. Now Salford Systems, in collaboration with the authors of CART, has added a graphical user interface and several new features to the original FORTRAN source code and produced a Windows version of CART™ (v. 3.1) for the rest of us.
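At the core of CART tree growing is an exhaustive search for the binary split that most reduces node impurity. A minimal sketch for one numeric feature using the Gini index follows; it illustrates the splitting criterion only and is not Salford Systems' implementation (there is no pruning, no surrogate splits and no search across multiple features). Function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best split `x <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the threshold midway between neighbouring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, score)
    return best
```

Applied recursively to each child node, over every feature, this search grows the full tree that CART subsequently prunes.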

7.
Measurements provide the basis for process monitoring and control as well as for model development and validation. Systematic approaches to increase the accuracy and credibility of the empirical data set are therefore of great value. In (bio)chemical conversions, linear conservation relations, such as the balance equations for charge, enthalpy, and/or chemical elements, can be employed to relate conversion rates. In a practical situation, some of these rates will be measured (in effect, calculated directly from primary measurements of, e.g., concentrations and flow rates), while others may or may not be calculable from the measured ones. When certain measured rates can also be calculated from other measured rates, the set of equations is redundant, and the accuracy and credibility of the measured rates can be improved by balancing and by gross error diagnosis, respectively. The balanced conversion rates are more accurate and form a consistent data set that is more suitable for further application (e.g., to calculate non-measured rates) than the raw measurements. Such an approach has drawn attention in previous studies. The current study deals mainly with the problem of mathematically classifying the conversion rates into balanceable and calculable rates, given the subset of measured rates. The significance of this problem is illustrated with some examples. It is shown that a simple matrix equation can be derived that contains the vector of measured conversion rates and the redundancy matrix R. Matrix R plays a predominant role in the classification problem. In supplementary articles, the significance of the redundancy matrix R for an improved gross error diagnosis approach will be shown. In addition, efficient equations have been derived to calculate the balanceable and/or calculable rates. The method is based entirely on matrix algebra (fundamentally different from the graph-theoretical approach), and it is easily implemented in a computer program.
(c) 1994 John Wiley & Sons, Inc.
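The role of the redundancy matrix can be made concrete with a standard linear-algebra sketch. It assumes that the conservation relations are collected in a matrix E acting on the vector of all conversion rates r, partitioned into measured rates r_m (columns E_m) and non-measured rates r_c (columns E_c); that E_c has full column rank, so every non-measured rate is calculable; and that F is the covariance matrix of the measured rates. Apart from R, these symbols are assumptions, and the paper's own derivation may differ in detail.

```latex
\begin{aligned}
&E\,r = 0, \qquad E = \begin{bmatrix} E_m & E_c \end{bmatrix}, \qquad
  r = \begin{bmatrix} r_m \\ r_c \end{bmatrix}
  \;\Longrightarrow\; E_m r_m + E_c r_c = 0, \\
&r_c = -E_c^{\#} E_m r_m, \qquad E_c^{\#} = \left(E_c^{\mathsf T} E_c\right)^{-1} E_c^{\mathsf T}, \\
&R\,r_m = 0, \qquad R = \left(I - E_c E_c^{\#}\right) E_m \quad \text{(linearly independent rows only)}, \\
&\hat r_m = r_m - F R^{\mathsf T} \left(R F R^{\mathsf T}\right)^{-1} R\,r_m .
\end{aligned}
```

Substituting the second line into the first gives the third: the measured rates must satisfy R r_m = 0, so each row of R is a redundant relation usable for balancing and error checking. The last line is the usual weighted least-squares balancing step; by construction R times the balanced vector is zero, and, with uncorrelated measurement errors (diagonal F), a measured rate whose column of R is entirely zero is left unchanged.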

8.
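Under the pressure of international trends, and influenced by the Trans-Pacific Partnership (TPP) agreement, the implementation of a drug data protection system in China has become inevitable. Because drug data protection is closely tied to the drug registration and approval system, drug data protection will inevitably have varying effects on the development of the pharmaceutical industry after the reform of China's chemical drug registration classification. Based on the new standards of China's registration classification reform, this paper studies the effects of implementing the data protection system from a theoretical perspective and then discusses the impact of data protection under the new registration classification on Category 1 to 4 drugs and on the pharmaceutical industry.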

9.
The recent increase in data accuracy from high-resolution accelerometers offers substantial potential for improved understanding and prediction of animal movements. However, current approaches for analysing these multivariable datasets typically require existing knowledge of the animals' behaviors to inform the behavioral classification process. These methods are thus not well suited to the many cases where little is known about the different behaviors performed. Here, we introduce the use of an unsupervised learning algorithm. To illustrate the method's capability, we analyse data collected using a combination of GPS and accelerometers on two seabird species: razorbills (Alca torda) and common guillemots (Uria aalge). We applied the unsupervised learning algorithm Expectation-Maximization (EM) to characterize latent behavioral states both above and below water at both individual and group level. The application of this flexible approach yielded significant new insights into the foraging strategies of the two study species, both above and below the surface of the water. In addition to general behavioral modes such as flying, floating, and descending and ascending phases within the water column, this approach allowed an exploration of previously unstudied and important behaviors such as searching and prey chasing/capture events. We propose that this unsupervised learning approach provides an ideal tool for the systematic analysis of such complex multivariable movement data, which are increasingly being obtained with accelerometer tags across species. In particular, we recommend its application in cases where current knowledge of the behaviors performed is limited and existing supervised learning approaches may have limited utility.
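The E and M steps can be illustrated for the simplest case, a one-dimensional Gaussian mixture (for example, a single acceleration-derived feature). This is a generic EM sketch, not the authors' multivariate model; the number of components and the starting means are assumptions that must be supplied, and `em_gmm_1d` is a hypothetical name.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs: Sequence[float], init_means: Sequence[float], iters: int = 200, tol: float = 1e-8):
    """Fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, responsibilities); the responsibilities
    come from the final E-step.
    """
    k, n = len(init_means), len(xs)
    means = list(init_means)
    overall = sum(xs) / n
    variances = [sum((x - overall) ** 2 for x in xs) / n] * k  # start wide
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior probability of each latent state for each sample.
        resp, ll = [], 0.0
        for x in xs:
            p = [w * normal_pdf(x, mu, v) for w, mu, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300  # guard against underflow
            ll += math.log(s)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return weights, means, variances, resp
```

Taking the largest entry in each row of the responsibilities gives a latent-state label per sample; interpreting those states as behaviors (flying, diving and so on) is a separate, expert step.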

10.
Aim: To propose a modification of the TWINSPAN algorithm that enables the production of divisive classifications that better respect the structure of the data. Methods: The proposed modification combines the classical TWINSPAN algorithm with an analysis of cluster heterogeneity prior to each division. Four different heterogeneity measures are considered: Whittaker's beta, total inertia, average Sørensen dissimilarity and average Jaccard dissimilarity. Their performance was evaluated using empirical vegetation datasets with different numbers of plots and different levels of heterogeneity. Results: While the classical TWINSPAN algorithm divides each cluster resulting from the previous division step, the modified algorithm divides only the most heterogeneous cluster in each step. The four tested heterogeneity measures may produce identical or very similar results. However, average Jaccard and Sørensen dissimilarities may reach extreme values in small clusters and may produce classifications with highly unbalanced cluster sizes. Conclusions: The proposed modification does not alter the logic of the TWINSPAN classification, but it may change the hierarchy of divisions in the final classification. Thus, unsubstantiated divisions of homogeneous clusters are prevented, and classifications with any number of terminal clusters can be created, which increases the flexibility of TWINSPAN.
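The modified step, dividing only the most heterogeneous cluster, can be sketched with one of the four measures, average Jaccard dissimilarity, on presence/absence data in which each plot is a set of species names. The TWINSPAN division itself is not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    union = a | b
    if not union:
        return 0.0  # two empty plots are treated as identical
    return 1.0 - len(a & b) / len(union)


def avg_jaccard(cluster: Sequence[Set[str]]) -> float:
    """Mean pairwise Jaccard dissimilarity among the plots of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single plot has no within-cluster heterogeneity
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)


def most_heterogeneous(clusters: List[Sequence[Set[str]]]) -> int:
    """Index of the cluster the modified algorithm would divide next."""
    return max(range(len(clusters)), key=lambda i: avg_jaccard(clusters[i]))
```

The small-cluster problem noted in the abstract is visible here: two plots sharing no species already give the maximum value of 1.0, however small the cluster.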

11.
In this metaphorical ‘composition’, I comment on nine ‘dissonant chords’ related to the drowning out of cladistic performance: (1) DNA-based phylogenetic hypotheses supported only by bootstrap values and without molecular synapomorphies; (2) the use of molecular data to the exclusion of morphological data, with the classification of clades diagnosed by morphological plesiomorphies plus bootstrap values; (3) neglect of the results of the congruence test and how they are interpreted; (4) the combination of character optimization using both model-based and parsimony methods, and its consequences; (5) the need to effectively integrate ontogeny and phylogeny; (6) the estimation of the ages of clades based on molecular-clock analyses; (7) the belief that new methods, theories, and hypotheses are more reliable than old ones, with the idea that model-based analyses achieve better results than parsimony analyses; (8) the false assumption of the irrelevance of classification; and (9) clashes amongst cladists themselves, who endorse distinct methods, philosophies, and theories. Finally, I present 10 ‘refrains’ in order to intensify the cladistic performance.

12.
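Tree species diversity is an important topic in ecological research, and information on the species and spatial distribution of trees can effectively support sustainable forest management. Under complex stand conditions, however, obtaining highly accurate classification results is difficult. UAV remote sensing can acquire ultra-fine local data, making improved tree species classification accuracy possible. Using multi-source UAV remote sensing data (visible-light, hyperspectral and LiDAR), this study explores their potential for tree species classification in subtropical stands. The findings are: (1) the random forest classifier achieved the highest overall accuracy and the highest per-species F1 scores and is suitable for multi-species classification mapping in subtropical forests; it distinguished 13 classes (8 tree, 4 herbaceous) with an overall accuracy of 95.63% and a Kappa coefficient of 0.948; (2) using multi-source data significantly improved classification accuracy: the all-feature model was the most accurate, hyperspectral and LiDAR data significantly affected its accuracy, and visible-light texture data contributed little; (3) ranked from most to least important, the classification features were structural information, vegetation indices, texture information, and minimum noise fraction (MNF) components.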
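Overall accuracy, the Kappa coefficient and per-class F1 scores, the metrics reported above, can all be derived from a confusion matrix. A minimal sketch follows; it assumes rows hold reference classes and columns hold predicted classes, and the matrix in the test is made up, not the study's data.

```python
from typing import List, Sequence, Tuple


def accuracy_metrics(cm: Sequence[Sequence[int]]) -> Tuple[float, float, List[float]]:
    """cm[i][j] = number of samples of reference class i predicted as class j.

    Returns (overall accuracy, Cohen's kappa, per-class F1 scores).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    kappa = (po - pe) / (1 - pe) if pe != 1 else 1.0
    f1 = []
    for i in range(k):
        tp = cm[i][i]
        precision = tp / col_tot[i] if col_tot[i] else 0.0
        recall = tp / row_tot[i] if row_tot[i] else 0.0
        f1.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return po, kappa, f1
```

Kappa discounts the agreement expected by chance from the class totals, which is why it is lower than overall accuracy (0.948 versus 95.63% above).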

13.
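Application of the Environment and Disaster Monitoring Small Satellites to Wetland Landscape Mapping in the Liaohe River Delta   Cited by: 2 (self-citations: 1; other citations: 2)
Timely and accurate information on the spatial distribution of wetlands is important for their dynamic monitoring, protection and sustainable use. Satellites A and B of the environment and disaster monitoring small satellite constellation (HJ-1A/1B) are land-resource monitoring satellites launched independently by China and provide a new source of remote sensing imagery for extracting wetland types. By comparing the classification accuracy and the area of each landscape type in wetland landscape maps derived from HJ-1A/1B CCD camera imagery (HJ CCD) and from Landsat TM5 imagery, this study verified and explored the applicability and potential of HJ CCD data for monitoring dynamic changes in wetland landscapes. The results show that HJ CCD data can fully replace Landsat TM5 data for land-cover identification and classification, and that for real-time dynamic monitoring, HJ CCD data, with a revisit period of only 2 d, outperform Landsat TM5 data (16 d).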
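Comparing per-class areas between two classified maps reduces to counting pixels. A minimal sketch follows; the default 30 m pixel size is an assumption (it matches the nominal multispectral resolution of both HJ-1A/1B CCD and Landsat TM), and the function names and class labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_areas(class_map: Sequence[Sequence[Hashable]], pixel_size_m: float = 30.0) -> Dict[Hashable, float]:
    """Area of each class in hectares from a 2-D grid of class labels."""
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0
    counts = Counter(label for row in class_map for label in row)
    return {label: n * pixel_area_ha for label, n in counts.items()}


def relative_area_difference(test: Dict[Hashable, float], ref: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """(test - ref) / ref for every class with non-zero reference area."""
    return {k: (test.get(k, 0.0) - v) / v for k, v in ref.items() if v}
```

Here `test` would hold the HJ CCD areas and `ref` the Landsat TM5 areas; both maps must use the same class legend for the comparison to be meaningful.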

14.
The genus Fascicularia Mez is revised as part of a study of the Bromeliaceae for the Flora de Chile. Morphological and anatomical investigation of herbarium material and living material from cultivation, together with DNA studies (RAPDs) of cultivated material, has led us to conclude that Fascicularia bicolor (Ruiz & Pav.) Mez, the only species in the genus, comprises two subspecies, which are distinguished by their leaf anatomy and morphology.

15.
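Objective: To explore the use of cluster analysis for classifying nursing units so that they can be evaluated objectively and fairly, providing a reference for distributing nursing performance bonuses and allocating human resources under a model in which physician and nursing accounts are kept separately. Methods: Using a self-designed questionnaire and a scale provided by a company, nursing staff's subjective assessments of nursing-unit categories were collected by sampling; cluster analysis, combined with objective workload data, was then used to determine the category of each nursing unit. Results: The hospital's 78 nursing units were divided into five categories (1 to 5): 9 units in category 1, 21 in category 2, 22 in category 3, 12 in category 4, and 14 in category 5. Conclusion: Using the adjusted cluster values of the nursing units as classification coefficients reflects the workload, difficulty and risk of different nursing units comprehensively, objectively and relatively fairly, and can serve as a basis for distributing bonuses in nursing performance appraisal and for departmental management.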
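The abstract does not name the clustering algorithm. As one possible illustration, the sketch below groups units by a single composite score with one-dimensional k-means (k = 5 would match the study) and numbers the groups by ascending centre. The composite score, the deterministic initialization and the ordering convention are all assumptions, and `kmeans_1d` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster scalar scores; return class numbers 1..k (ascending centre) and sorted centres."""
    xs = sorted(values)
    # Deterministic start: spread the initial centres over the sorted scores.
    centres = [xs[int(i * (len(xs) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each score joins its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Update step; an empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {j: r + 1 for r, j in enumerate(order)}
    return [rank[lab] for lab in labels], sorted(centres)
```

The sorted centres could then serve as the per-category coefficients the abstract describes, although how the study adjusted its cluster values is not stated.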

16.
Increasingly, animal behavior studies are enhanced through the use of accelerometry. Translating raw accelerometer data into animal behaviors requires the development of classifiers. Here, we present the “rabc” (r for animal behavior classification) package to assist researchers with the interactive development of such animal behavior classifiers in a supervised classification approach. The package uses datasets consisting of accelerometer data with their corresponding animal behaviors (e.g., for triaxial accelerometer data along the x, y and z axes arranged as “x, y, z, x, y, z,…, behavior”). Using an example dataset collected on white storks (Ciconia ciconia), we illustrate the workflow of this package, including accelerometer data visualization, feature calculation, feature selection, feature visualization, extreme gradient boosting (XGBoost) model training, validation, and, finally, a demonstration of the behavior classification results.
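The row layout described above can be unpacked into per-axis features in a few lines. rabc itself is an R package; this Python sketch is not its API and computes only a few common feature types (per-axis mean and standard deviation, plus an ODBA-style dynamic-acceleration feature in which the axis mean stands in for the static gravity component, an assumption). `row_features` is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence, Tuple


def row_features(row: Sequence) -> Tuple[Dict[str, float], str]:
    """Split one 'x, y, z, x, y, z, ..., behavior' row into per-axis features."""
    *samples, behavior = row
    if not samples or len(samples) % 3:
        raise ValueError("expected x, y, z triplets followed by a behavior label")
    axes = {
        "x": [float(v) for v in samples[0::3]],
        "y": [float(v) for v in samples[1::3]],
        "z": [float(v) for v in samples[2::3]],
    }
    feats: Dict[str, float] = {}
    for name, values in axes.items():
        feats[f"mean_{name}"] = statistics.fmean(values)
        feats[f"sd_{name}"] = statistics.pstdev(values)
    # ODBA-style feature: summed absolute deviation from each axis mean,
    # averaged over the samples in the row.
    feats["odba"] = statistics.fmean([
        sum(abs(axes[a][t] - feats[f"mean_{a}"]) for a in "xyz")
        for t in range(len(axes["x"]))
    ])
    return feats, str(behavior)
```

Each row then becomes one labelled feature vector, which is the form a supervised classifier such as a gradient-boosting model expects.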

17.
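Land Cover Classification of the Four Lakes Region of Hubei Province Based on MODIS and ENVISAT Data   Cited by: 2 (self-citations: 0; other citations: 2)
Urban areas, water bodies and vegetated areas in the Four Lakes region of Hubei were separated according to differences in the backscatter coefficient in ENVISAT ASAR data. Based on the regional cropping system and phenology, MODIS NDVI values from late April to early May were used to distinguish crop from non-crop vegetation within the vegetated areas. The results of this rule-based classification were validated against a land classification derived from high-resolution ETM data. The results show that: with the aid of DEM elevation data, non-crop vegetation in the study area can be divided into woodland and beach land; using differences in monthly mean NDVI, crops can be divided into mid-season rice, cotton and late rice, and paddy fields can be distinguished from dry land; the land cover classification obtained from low-spatial-resolution MODIS data is broadly similar to that obtained from high-resolution ETM data, with a total error rate of 13.15% taking the ETM classification as the reference; and applying this decision procedure to large-scale land cover classification and mapping enables rapid tracking of regional land cover change.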
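The hierarchical decision procedure (SAR backscatter first, then NDVI, then elevation) can be sketched as a per-pixel rule set. Every threshold below is HYPOTHETICAL, since the abstract gives no values. The direction of the NDVI rule (cropland showing lower NDVI than other vegetation in late April to early May) is also an assumption, and the second-level split of crops into rice and cotton types is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # HYPOTHETICAL values for illustration; calibrate against reference data.
    water_max_db: float = -18.0       # very low ASAR backscatter -> open water
    urban_min_db: float = -5.0        # very high backscatter -> built-up area
    crop_max_ndvi: float = 0.4        # ASSUMED: cropland has low late-April/early-May NDVI
    forest_min_elev_m: float = 30.0   # non-crop vegetation above this -> woodland


def classify_pixel(sigma0_db: float, ndvi_late_apr: float, elevation_m: float,
                   t: Optional[Thresholds] = None) -> str:
    """Hierarchical rules: SAR backscatter first, then NDVI, then DEM."""
    t = t or Thresholds()
    if sigma0_db < t.water_max_db:
        return "water"
    if sigma0_db > t.urban_min_db:
        return "urban"
    # Remaining pixels are treated as vegetated.
    if ndvi_late_apr <= t.crop_max_ndvi:
        return "crop"
    return "woodland" if elevation_m >= t.forest_min_elev_m else "beach land"
```

Keeping the thresholds in one frozen object makes it easy to recalibrate them against a reference map, such as the ETM classification used for validation in the study.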

18.
A phylogenetic analysis of a combined data set for 560 angiosperms and seven outgroups, based on three genes, 18S rDNA (1855 bp), rbcL (1428 bp), and atpB (1450 bp), representing a total of 4733 bp, is presented. Parsimony analysis was expedited by use of a new computer program, the RATCHET. Parsimony jackknifing was performed to assess the support of clades. The combination of three data sets for numerous species has resulted in the most highly resolved and strongly supported topology yet obtained for angiosperms. In contrast to previous analyses based on single genes, much of the spine of the tree and most of the larger clades receive jackknife support ≥50%. Some of the noneudicots form a grade followed by a strongly supported eudicot clade. The early-branching angiosperms are Amborellaceae, Nymphaeaceae, and a clade of Austrobaileyaceae, Illiciaceae, and Schisandraceae. The remaining noneudicots, except Ceratophyllaceae, form a weakly supported core eumagnoliid clade comprising six well-supported subclades: Chloranthaceae, monocots, Winteraceae/Canellaceae, Piperales, Laurales, and Magnoliales. Ceratophyllaceae are sister to the eudicots. Within the well-supported eudicot clade, the early-diverging eudicots (e.g. Proteales, Ranunculales, Trochodendraceae, Sabiaceae) form a grade, followed by the core eudicots, whose monophyly is also strongly supported. The core eudicots comprise six well-supported subclades: (1) Berberidopsidaceae/Aextoxicaceae; (2) Myrothamnaceae/Gunneraceae; (3) Saxifragales, which are sister to Vitaceae (including Leea) plus a strongly supported eurosid clade; (4) Santalales; (5) Caryophyllales, to which Dilleniaceae are sister; and (6) an asterid clade. The relationships among these six subclades of core eudicots do not receive strong support. This large data set has also helped place a number of enigmatic angiosperm families, including Podostemaceae, Aphloiaceae, and Ixerbaceae.
This analysis further illustrates the tractability of large data sets and supports a recent, phylogenetically based, ordinal-level reclassification of the angiosperms based largely, but not exclusively, on molecular (DNA sequence) data.
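The resampling behind jackknife support can be sketched independently of the tree search. In each replicate every character (alignment column) is deleted with a fixed probability, a tree is inferred from the remaining columns, and support for a clade is the fraction of replicates that recover it. The deletion probability of e^-1 (about 37%) is the value commonly used for parsimony jackknifing; whether this study used it is not stated here. `infer_clades` stands in for a real parsimony search and is hypothetical.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def jackknife_columns(n_chars: int, rng: random.Random, p_delete: float = math.exp(-1)) -> List[int]:
    """Indices of characters kept in one replicate (each deleted with probability p_delete)."""
    return [i for i in range(n_chars) if rng.random() >= p_delete]


def jackknife_support(alignment: Dict[str, str], clade: Iterable[str],
                      infer_clades: Callable[[Dict[str, str]], Set[FrozenSet[str]]],
                      replicates: int = 100, seed: int = 0) -> float:
    """Fraction of jackknife replicates whose inferred tree contains `clade`.

    `infer_clades` is a user-supplied (hypothetical) tree search returning the
    clades, as frozensets of taxon names, of the tree built from an alignment.
    """
    rng = random.Random(seed)
    n_chars = len(next(iter(alignment.values())))
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        kept = jackknife_columns(n_chars, rng)
        resampled = {taxon: "".join(seq[i] for i in kept) for taxon, seq in alignment.items()}
        if target in infer_clades(resampled):
            hits += 1
    return hits / replicates
```

Real programs often summarize multiple equally parsimonious trees per replicate (for example, with a strict consensus); this sketch assumes one set of clades per replicate.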

19.
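陈劲松, 韩宇, 陈工, 张瑾 (Chen Jinsong, Han Yu, Chen Gong, Zhang Jin). Acta Ecologica Sinica (《生态学报》), 2014, 34(24): 7233-7242
Accurate and efficient acquisition of land-use information is very important for ecological and environmental assessment. Guangdong Province lies in the tropical and subtropical monsoon climate zone of South China, where the many kinds of cash crops and fragmented land cover introduce considerable uncertainty into precise land-use classification, and frequent cloud and rain throughout the year make usable optical imagery difficult to obtain. To improve land cover classification accuracy, this study used the Leizhou Peninsula as the test area and combined Landsat TM/ETM imagery, multi-temporal HJ optical imagery and X-band TerraSAR data. Features were extracted from the original images by analysing differences among land-cover types in spectral, polarimetric and multi-temporal characteristics. On this basis, the land-cover features from the multi-source remote sensing data were fused, and an object-oriented land cover classification method was applied to obtain high-accuracy land-use information for the study area. The results show that this approach effectively improves the accuracy of land cover and land-use information, providing more accurate data to support research on ecological and environmental change.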
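Object-oriented classification works on image objects (segments) rather than single pixels. A minimal sketch of one step follows: averaging stacked feature layers (for example, multi-temporal HJ NDVI or TerraSAR backscatter) within each segment to give one feature vector per object. Segmentation itself and the subsequent classifier are not shown, and the function and layer names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def object_features(segments: Sequence[Sequence[Hashable]],
                    layers: Dict[str, Sequence[Sequence[float]]]) -> Dict[Hashable, Dict[str, float]]:
    """Mean of every feature layer within each segment (image object).

    `segments` is a 2-D grid of segment ids; every layer is a co-registered
    2-D grid of the same shape.
    """
    sums: Dict[Hashable, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[Hashable, int] = defaultdict(int)
    for r, row in enumerate(segments):
        for c, seg in enumerate(row):
            counts[seg] += 1
            for name, grid in layers.items():
                sums[seg][name] += grid[r][c]
    return {seg: {name: sums[seg][name] / n for name in layers} for seg, n in counts.items()}
```

Averaging within objects suppresses pixel-level noise such as SAR speckle, one reason object-based methods suit fragmented landscapes like the one described.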

20.
Methods of evaluating loci in studies of mixture composition or individual assignment are largely based on the performance characteristics of individual loci; synergisms between loci are not exploited. Loci are often evaluated on their ability to resolve individual populations, even though multipopulation aggregations are more commonly of interest. In addition, measures of locus performance may relate only indirectly to investigative objectives. A new computer program, bels, offers an alternative that addresses these limitations and may be preferable to existing methods in some applications. The algorithm is illustrated using Yukon River chum salmon (Oncorhynchus keta) data.

