Similar Articles
20 similar articles found
1.
A typical small-sample biomarker classification paper discriminates between types of pathology based on, say, 30,000 genes and a small labeled sample of fewer than 100 points. Some classification rule is used to design the classifier from these data, but we are given no good reason to expect this algorithm to perform well, nor any conditions under which it should. An error estimation rule is then applied to the same data to estimate the classifier's error on the population, but once again we are given no good reason to trust this error estimator, nor any conditions under which it should produce a good estimate, and thus we do not know how well the classifier should be expected to perform. In fact, in virtually all such papers the error estimate is expected to be highly inaccurate. In short, we are given no justification for any claims. Given the ubiquity of vacuous small-sample classification papers in the literature, one could easily conclude that scientific knowledge is impossible in small-sample settings. It is not that thousands of papers overtly claim that scientific knowledge is impossible in regard to their content; rather, they use methods that preclude scientific knowledge. In this paper, we argue to the contrary that scientific knowledge in small-sample classification is possible provided there is sufficient prior knowledge. A natural way to proceed, discussed herein, is via a paradigm for pattern recognition in which we incorporate prior knowledge into the whole classification procedure (classifier design and error estimation), optimize each step of the procedure given the available information, and obtain theoretical measures of performance for both classifiers and error estimators, the latter being the critical epistemological issue. In sum, we can achieve scientific validation for a proposed small-sample classifier and its error estimate.
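To make the point about inaccurate small-sample error estimates concrete, here is a minimal Python sketch; it is not the authors' Bayesian paradigm. It designs a nearest-mean classifier on a small sample drawn from two hypothetical one-dimensional Gaussian classes, N(0, 1) and N(1, 1) with equal priors, and compares the leave-one-out error estimate (computed from the same data) with the classifier's true error, which is available here only because the population model is assumed known. The distributions, sample sizes and function names are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Hypothetical population: class 0 ~ N(0, 1), class 1 ~ N(1, 1), equal priors.
CLASS_MEANS = {0: 0.0, 1: 1.0}


def nearest_mean_fit(xs, ys):
    """Estimate one sample mean per class label from 1-D training data."""
    return {label: fmean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def nearest_mean_predict(means, x):
    """Assign x to the class whose sample mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def loo_error(xs, ys):
    """Leave-one-out estimate: redesign the classifier n times, testing on each held-out point."""
    errors = 0
    for i in range(len(xs)):
        means = nearest_mean_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += nearest_mean_predict(means, xs[i]) != ys[i]
    return errors / len(xs)


def true_error(means):
    """Exact population error of the designed classifier (known only because the model is assumed)."""
    std = NormalDist()
    t = (means[0] + means[1]) / 2  # decision threshold halfway between the sample means
    if means[1] > means[0]:  # class 1 is predicted when x > t
        return 0.5 * (1 - std.cdf(t - CLASS_MEANS[0])) + 0.5 * std.cdf(t - CLASS_MEANS[1])
    # otherwise class 1 is predicted when x < t
    return 0.5 * std.cdf(t - CLASS_MEANS[0]) + 0.5 * (1 - std.cdf(t - CLASS_MEANS[1]))


def simulate(n_per_class=10, reps=500, seed=0):
    """Mean and spread of (LOO estimate - true error) over repeated small samples."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(reps):
        xs = [rng.gauss(CLASS_MEANS[0], 1) for _ in range(n_per_class)]
        xs += [rng.gauss(CLASS_MEANS[1], 1) for _ in range(n_per_class)]
        ys = [0] * n_per_class + [1] * n_per_class
        deviations.append(loo_error(xs, ys) - true_error(nearest_mean_fit(xs, ys)))
    return fmean(deviations), pstdev(deviations)


if __name__ == "__main__":
    bias, spread = simulate()
    print(f"mean deviation {bias:+.3f}, standard deviation {spread:.3f}")
```

The standard deviation of the deviations is the quantity to watch: a nearly unbiased estimator can still be far from the true error on any single small sample. The exact figures depend on the seed and on the assumed model, so none are quoted here.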

2.
In this paper we generalize the measures of imbalance given by Ahrens and Pincus (1981) to two cases, the m-fold hierarchical model and the m-way classification model, in order to quantify the degree of imbalance in an unbalanced design. These generalized measures of imbalance satisfy the same properties as those for the one-way classification model.

3.
The performance of a cell recognition system on unknown data is often estimated in terms of its error rates on a test set. This paper investigates methods for producing estimates of error rates in cervical cell classification. Classification performance curves calculated with these methods are given for several classification schemes applied to 1500 cervical cells.
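A minimal Python sketch of test-set error estimation, assuming predicted and true labels are stored as plain sequences. It reports the overall error rate with a normal-approximation (Wald) binomial interval and the error rate within each true class; the Wald interval is one simple choice (with few errors a Wilson or exact interval is preferable), and the function names are illustrative, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def holdout_error(predicted, actual, confidence=0.95):
    """Return (error_rate, lower, upper) using a normal-approximation (Wald) binomial interval."""
    n = len(actual)
    if n == 0 or len(predicted) != n:
        raise ValueError("need equally long, non-empty label sequences")
    errors = sum(p != a for p, a in zip(predicted, actual))
    rate = errors / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for a 95 % interval
    half_width = z * sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


def per_class_error(predicted, actual):
    """Error rate within each true class (e.g. normal versus abnormal cells)."""
    totals, wrong = {}, {}
    for p, a in zip(predicted, actual):
        totals[a] = totals.get(a, 0) + 1
        wrong[a] = wrong.get(a, 0) + (p != a)
    return {label: wrong[label] / totals[label] for label in totals}
```

Per-class rates matter in screening because the two kinds of error (missing an abnormal cell versus flagging a normal one) usually carry very different costs.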

4.
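Logical principles are a necessary tool for testing whether a classification system is scientifically sound and consistent. In this paper, logical principles are used to examine the vegetation classification system of Vegetation of China (《中国植被》), which is based on dominant species. The results show that the vegetation classification systems in common use contain many logical errors that need to be corrected. Accordingly, emphasizing the consistency between the plant life-form classification system and the vegetation classification system, we set out, on the basis of logical principles, the steps and methods for building a vegetation classification system, and we propose example schemes for a standardized plant life-form classification system and vegetation classification system. Because vegetation with multiple constructive species (多建群种植被) genuinely exists and takes many forms, it is given a corresponding place in the classification system: the multi-constructive-species vegetation class (多建群种植被纲). In addition, since Chinese vegetation scientists have never adopted unified nomenclatural rules for vegetation and several naming conventions coexist, a functional nomenclature (函数命名法) is proposed.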

5.
The object of this paper is to present an original classification of ontogenetic reproduction. The main general criterion used is the degree and type of phylogenetic differentiation. On the basis of this criterion, criteria are given for classifying the fundamental types of ontogenetic reproduction and the types of ontogenetic generation cycles. The fundamental types of ontogenetic reproduction and the types of ontogenetic generation cycles are hierarchically related: the former are components of the latter. Many intermediate types exist between the well-defined types of ontogenetic reproduction.

6.
In this paper, the various mathematical methods applied to taxonomy are introduced. Approaches to classification derived from statistics, graph theory, information theory and fuzzy mathematics are discussed. For convenience of discussion, an example classification of 6 OTUs (operational taxonomic units) scored for 8 characters is given. The original data matrix of this example is taken from 6 species of the family Campanulaceae.
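As a sketch of the graph-theoretic approach, the Python below builds a minimum spanning tree over OTUs with Kruskal's algorithm, using the Hamming distance (number of differing characters) as the dissimilarity, and then cuts the heaviest tree edges to obtain single-linkage groups. The 6 × 8 binary data matrix is invented for illustration; it is not the Campanulaceae matrix from the paper, and binary coding of the characters is an assumption.

```python
from itertools import combinations

# Hypothetical data matrix: 6 OTUs (A-F) scored for 8 binary characters.
NAMES = list("ABCDEF")
ROWS = ["11110000", "11100000", "11111000", "00001111", "00011111", "00000111"]


def hamming(a, b):
    """Number of characters in which two OTUs differ."""
    return sum(x != y for x, y in zip(a, b))


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(names, rows):
    """Kruskal's algorithm: add the shortest edges that do not close a cycle."""
    parent = list(range(len(names)))
    edges = sorted((hamming(rows[i], rows[j]), i, j)
                   for i, j in combinations(range(len(names)), 2))
    tree = []
    for weight, i, j in edges:
        root_i, root_j = _find(parent, i), _find(parent, j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((names[i], names[j], weight))
    return tree


def single_linkage_clusters(names, tree, k):
    """Remove the k - 1 heaviest tree edges; the remaining components are the clusters."""
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of OTUs")
    kept = sorted(tree, key=lambda edge: edge[2])[: len(tree) - (k - 1)]
    parent = {name: name for name in names}
    for a, b, _ in kept:
        parent[_find(parent, a)] = _find(parent, b)
    groups = {}
    for name in names:
        groups.setdefault(_find(parent, name), []).append(name)
    return sorted(groups.values())
```

For this invented matrix the tree joins A, B, C and D, E, F with edges of length 1 and links the two groups through a single edge of length 6, so cutting that edge gives the groups ABC and DEF.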

7.
In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, in which the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction and a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
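A minimal Python sketch of how a Bayesian predictive criterion can drive cumulative classification. The modelling choices are assumptions made for illustration, not the paper's exact formulation: strains are vectors of binary characters, each character is an independent Bernoulli variable with a Beta(α, α) prior, and a strain founds a new taxon unless some existing taxon predicts it strictly better than an empty taxon (the prior predictive) does.

```python
from math import log


def log_predictive(strain, members, alpha=1.0):
    """Log posterior-predictive probability of a binary strain under one taxon.

    Each character is an independent Bernoulli with a Beta(alpha, alpha) prior, so with
    n members of which `ones` show state 1, P(state 1) = (ones + alpha) / (n + 2 * alpha).
    """
    n = len(members)
    total = 0.0
    for j, state in enumerate(strain):
        ones = sum(member[j] for member in members)
        matching = ones if state == 1 else n - ones
        total += log((matching + alpha) / (n + 2 * alpha))
    return total


def cumulative_classify(strains, alpha=1.0):
    """Process strains one at a time, updating (and possibly augmenting) the taxonomy."""
    taxa = []         # each taxon is a list of member strains
    assignments = []  # taxon index for each strain, in arrival order
    for strain in strains:
        # Score of founding a new taxon = predictive probability under an empty taxon.
        best_index, best_score = None, log_predictive(strain, [], alpha)
        for index, members in enumerate(taxa):
            score = log_predictive(strain, members, alpha)
            if score > best_score:  # an existing taxon must predict strictly better
                best_index, best_score = index, score
        if best_index is None:
            taxa.append([strain])
            best_index = len(taxa) - 1
        else:
            taxa[best_index].append(strain)
        assignments.append(best_index)
    return assignments, taxa
```

Because each assignment updates the member counts, the predictive probabilities sharpen as a taxon grows, which is what makes the procedure cumulative rather than a one-off clustering.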

8.
A genetic algorithm for the cladistic classification problem   Total citations: 2, self-citations: 0, citations by others: 2
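The cladistic classification problem can be reduced to a clustering problem. Conventional cladistic classification methods can generally guarantee only a locally optimal solution. This paper first presents a clustering method, the synchronous insertion method, then recasts it as an optimization problem over a discrete space and applies a genetic algorithm with the aim of obtaining the globally optimal solution. Experimental results show that the method is correct and feasible.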
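The paper's synchronous insertion encoding (同步插入法) is not described in enough detail to reproduce, so the Python sketch below substitutes a generic encoding: each individual is a vector of cluster labels, one per object, and fitness is the within-cluster sum of squared distances. It shows only the general pattern of applying a genetic algorithm (selection with elitism, one-point crossover, mutation) to a discrete clustering problem; the parameters and names are hypothetical.

```python
import random


def within_cluster_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid (lower is better)."""
    cost = 0.0
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if not members:
            continue
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        cost += sum(sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members)
    return cost


def genetic_clustering(points, k, pop_size=30, generations=100, mutation_rate=0.1, seed=0):
    """Evolve label vectors (one gene per point) towards a low within-cluster cost."""
    if len(points) < 2 or pop_size < 4:
        raise ValueError("need at least 2 points and a population of at least 4")
    rng = random.Random(seed)
    n = len(points)

    def cost(individual):
        return within_cluster_cost(points, individual, k)

    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # mutation: reassign each gene to a random cluster with small probability
            child = [rng.randrange(k) if rng.random() < mutation_rate else gene for gene in child]
            children.append(child)
        population = elite + children
    return min(population, key=cost)
```

Elitism guarantees that the best cost never gets worse from one generation to the next. Like any genetic algorithm, this one aims at, but does not guarantee, the global optimum.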

9.
The paper presents recent trends in solventless sample preparation techniques for environmental analysis. First, a general classification of solventless methods is given. Next, three of them that are treated as preferable techniques, namely SPME (solid-phase microextraction), SDME (single-drop microextraction) and HS (headspace analysis), are presented in detail with respect to their usability and effectiveness for environmental samples. Examples of all the discussed techniques are given in the tables.

11.
K Meyer, Biometrics, 1985, 41(1): 153-165
An algorithm is described for estimating variance and covariance components by restricted maximum likelihood (REML) for a multivariate mixed two-way classification with equal design matrices. The procedure involves a transformation to a canonical scale, effectively reducing a q-variate analysis to q corresponding univariate analyses. A small numerical example is given, as well as a large-scale practical application.
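The transformation to canonical scale mentioned in the abstract can be sketched as a simultaneous diagonalization of two covariance matrices. The Python below assumes, for illustration, a positive-definite residual matrix E and a symmetric between-level matrix G for q traits. It finds T with T E T' = I and T G T' diagonal, so the transformed traits are uncorrelated at both levels and could be analysed one at a time. Only the transformation is shown, not the REML iterations. Pure Python keeps the sketch dependency-free; numpy.linalg.cholesky and numpy.linalg.eigh would be the usual tools.

```python
from math import copysign, sqrt


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def cholesky(a):
    """Lower-triangular L with a = L L' (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def lower_inverse(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / low[i][i]
        for j in range(i):
            inv[i][j] = -sum(low[i][k] * inv[k][j] for k in range(j, i)) / low[i][i]
    return inv


def jacobi_eigen(a, max_sweeps=50):
    """Eigenvalues and eigenvector columns of a symmetric matrix (cyclic Jacobi rotations)."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = copysign(1.0, theta) / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: m <- m J and v <- v J
                    for mat in (m, v):
                        kp, kq = mat[k][p], mat[k][q]
                        mat[k][p], mat[k][q] = c * kp - s * kq, s * kp + c * kq
                for k in range(n):  # rows p, q: m <- J' m, which zeroes m[p][q]
                    pk, qk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * pk - s * qk, s * pk + c * qk
    return [m[i][i] for i in range(n)], v


def canonical_transform(e, g):
    """Return (T, eigenvalues) with T e T' = I and T g T' = diag(eigenvalues)."""
    l_inv = lower_inverse(cholesky(e))
    sym = matmul(matmul(l_inv, g), transpose(l_inv))  # L^-1 G L^-T is symmetric
    eigenvalues, vectors = jacobi_eigen(sym)
    return matmul(transpose(vectors), l_inv), eigenvalues
```

Applying T to each observation vector gives q new traits whose residual variances are 1 and whose between-level variances are the eigenvalues, which is why each can be handled by a univariate analysis.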

12.
This paper describes the design of a fault-tolerant classification system for medical applications. The design process follows the systems engineering methodology. In the agreement phase, we make the case for fault tolerance in diagnosis systems for biomedical applications. The argument builds on the idea that machine diagnosis systems mimic the functionality of human decision-making but in many cases do not achieve the fault tolerance of the human brain. After making the case for fault tolerance, we introduce the requirements and the specification for the fault-tolerant system and then discuss its implementation. The system is tested with fault cases and use cases to build up trust in the implemented system. This structured approach aided the realisation of the fault-tolerant classification system. During the specification phase, we produced a formal model that enabled us to discuss what fault tolerance, reliability and safety mean for this particular classification system. Such a formal basis for discussion is extremely useful during the initial stages of the design, because it helps to avoid major mistakes later in the project caused by a lack of overview. During the implementation, we practiced component reuse by incorporating into the current design a reliable classification block developed in a previous project. By using a well-structured approach and practicing component reuse, we followed best practice for both research and industry projects, which enabled us to realise the fault-tolerant classification system on time and within budget. This system can serve in a wide range of future health care systems.
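The abstract does not say which fault-tolerance mechanism the system uses. As a generic illustration only, the Python below sketches N-modular redundancy (triple modular redundancy when three replicas are used): the same input is passed to several replicated classifiers, a replica that raises an exception is treated as a detected fault, and a label is accepted only if more than half of all replicas agree. The names and the strict-majority policy are assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


class NoMajorityError(RuntimeError):
    """Raised when no output is agreed on by a strict majority of replicas."""


def majority_vote(replicas: Sequence[Callable[[object], Hashable]], x: object) -> Hashable:
    """Accept a label only if more than half of all replicas produce it."""
    outputs = []
    for replica in replicas:
        try:
            outputs.append(replica(x))
        except Exception:  # a crashing replica counts as a detected fault and is simply outvoted
            continue
    if not outputs:
        raise NoMajorityError("all replicas failed")
    label, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(replicas):
        raise NoMajorityError(f"best label {label!r} has only {count} of {len(replicas)} votes")
    return label
```

Counting the majority against all replicas, not just the surviving ones, is the conservative choice: with three replicas, one fault is masked, but two faults produce an explicit error instead of a silently wrong diagnosis.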

13.
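Although the reticulate-ridged rock tripe (网脊石耳) and the reticulate-ridged flat-disc rock tripe (网脊平盘石耳) are easy to tell apart by their different types of apothecia, their apothecia are extremely rare and the upper surfaces of both thalli bear conspicuous reticulate folds, so in practical taxonomic work the two are often difficult to distinguish. In a comparative study using a dissecting microscope and a scanning electron microscope, the authors obtained new micromorphological evidence for the classification of these two lichens.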

14.
15.

Background  

Overfitting the data is a salient issue for classifier design in small-sample settings. This is why it is typically preferable to select a classifier from a constrained family, one whose members cannot partition the feature space too finely. But overfitting is not merely a consequence of the classifier family; it is highly dependent on the classification rule used to design a classifier from the sample data. Thus, it is possible to consider families that are rather complex but for which there are classification rules that perform well on small samples. Such classification rules can be advantageous because they facilitate satisfactory classification when the class-conditional distributions are not easily separated and the sample is not large. Here we consider neural networks from two perspectives: classical design based solely on the sample data, and noise-injection-based design.
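Noise-injection-based design can be illustrated by its core data step: each training point is replicated several times with small random perturbations, and the network is then trained on the enlarged set, which smooths the learned decision boundary. The Python sketch below assumes spherical Gaussian noise with a hypothetical standard deviation `sigma` and replica count `copies`; the paper's exact noise model and network training are not shown.

```python
import random


def noise_inject(samples, labels, copies=5, sigma=0.1, seed=None):
    """Return the original samples plus `copies` Gaussian-jittered replicas of each one.

    Every replica keeps the label of the sample it was generated from.
    """
    rng = random.Random(seed)
    augmented_x = [list(sample) for sample in samples]
    augmented_y = list(labels)
    for sample, label in zip(samples, labels):
        for _ in range(copies):
            augmented_x.append([value + rng.gauss(0.0, sigma) for value in sample])
            augmented_y.append(label)
    return augmented_x, augmented_y
```

The choice of `sigma` acts as a regularization knob: too small and the replicas add nothing, too large and they blur the class-conditional distributions the classifier is supposed to separate.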

16.
Classification is a data mining task whose goal is to learn, from a training dataset, a model that can predict the class of a new data instance, while clustering aims to discover natural groupings of instances within a given dataset. Learning cluster-based classification systems involves partitioning a training set into data subsets (clusters) and building a local classification model for each cluster. The class of a new instance is predicted by first assigning the instance to its nearest cluster and then using that cluster's local classification model. In this paper, we present an ant colony optimization (ACO) approach to building cluster-based classification systems. Our ACO approach optimizes the number of clusters, the positioning of the clusters, and the choice of classification algorithm to use as the local classifier for each cluster. We also present an ensemble approach in which the system decides the class of a given instance by considering the predictions of all local classifiers, using a weighted voting mechanism based on the instance's fuzzy degree of membership in each cluster. Our experimental evaluation employs five widely used classification algorithms (naïve Bayes, nearest neighbour, Ripper, C4.5, and support vector machines), and results are reported on a suite of 54 popular UCI benchmark datasets.
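A minimal Python sketch of the ensemble step: every local classifier votes, and each vote is weighted by the instance's fuzzy membership in that classifier's cluster. The membership formula used here is the standard fuzzy c-means one with fuzzifier m = 2; that choice, the centroid representation and the function names are assumptions, since the abstract does not give the paper's exact definition. The ACO search over clusters and local algorithms is not shown.

```python
from math import dist


def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style membership of x in each cluster; the values sum to 1."""
    d = [dist(x, c) for c in centroids]
    for k, dk in enumerate(d):
        if dk == 0.0:  # x sits exactly on a centroid: full membership in that cluster
            return [1.0 if j == k else 0.0 for j in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]


def weighted_vote(x, centroids, local_models, m=2.0):
    """Combine the local classifiers' predictions, weighting each by cluster membership."""
    scores = {}
    for weight, model in zip(fuzzy_memberships(x, centroids, m), local_models):
        label = model(x)
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)
```

With centroids at (0, 0) and (4, 0), the point (1, 0) gets memberships 0.9 and 0.1, so the first cluster's local model dominates, but several distant clusters that agree can still outvote a single near one.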

17.
H P Kimmich, Biotelemetry, 1975, 2(3-4): 207-255
A summary of considerations for the design and application of multichannel biotelemetry systems is given. Both the advantages and the problems of wired, wireless, combined and storage telemetry are discussed in relation to their applications. Modulation and multiplexing techniques are described extensively; however, the review covers not only the transmission of biological data but also the important aspects of connecting the transmission equipment to the biological subject and of displaying the biological information. The treatment of multichannel biotelemetry is rounded off by a few additional subjects, such as telecontrol, information source and classification.
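Time-division multiplexing is one of the multiplexing techniques used in multichannel telemetry. The Python sketch below shows the idea at the sample level: in each time slot one sample from every channel is placed into a frame that starts with a synchronization marker, and the receiver uses the marker to split the stream back into channels. The integer marker value and the example channels are hypothetical; real links use reserved bit patterns rather than a sentinel sample value.

```python
SYNC = -1  # hypothetical frame marker; assumes no real sample ever equals -1


def tdm_multiplex(channels):
    """Interleave equally long channels into frames: [SYNC, ch0, ch1, ..., SYNC, ...]."""
    if not channels or len({len(channel) for channel in channels}) != 1:
        raise ValueError("need at least one channel, all of equal length")
    stream = []
    for frame in zip(*channels):  # one sample from every channel per time slot
        stream.append(SYNC)
        stream.extend(frame)
    return stream


def tdm_demultiplex(stream, n_channels):
    """Split a multiplexed stream back into per-channel sample lists."""
    frame_len = n_channels + 1
    if len(stream) % frame_len:
        raise ValueError("stream length is not a whole number of frames")
    channels = [[] for _ in range(n_channels)]
    for start in range(0, len(stream), frame_len):
        if stream[start] != SYNC:
            raise ValueError(f"lost frame synchronisation at index {start}")
        for index, value in enumerate(stream[start + 1 : start + frame_len]):
            channels[index].append(value)
    return channels
```

For example, two channels [1, 2, 3] and [36, 37, 38] become the stream [-1, 1, 36, -1, 2, 37, -1, 3, 38], and demultiplexing restores the original lists.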

18.
A class of new consensus methods for n-trees (hierarchical clusterings) is proposed. These methods apply systematically to an arbitrary collection of given classifications of a fixed set of taxa and produce a single consensus classification. They are motivated by the desire that the consensus classification retain as much information as possible from the given classifications, even when they agree only approximately. A focus of the paper is the concept of faithfulness of consensus methods; this concept explicates the informal notion of adequate retention of information referred to above and is proposed as a desirable requirement for consensus methods in general. The new methods are all faithful; they have the additional property that they take hierarchical level into account. Other general properties of consensus methods are investigated, especially with reference to their relation with faithfulness. The most important of these properties is neutrality; loosely speaking, a consensus method is neutral if all nontrivial clusters are treated equally in the conditions on the given classifications required to guarantee the appearance of a cluster in the consensus. A central result of the paper is an analogue of the classical impossibility theorem of K. Arrow: with trivial exceptions, it is impossible to have a consensus method that is simultaneously faithful and neutral. Thus two intuitively very appealing general properties of consensus methods are seen to be incompatible.
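For comparison with the new methods, here is a minimal Python sketch of the classical majority-rule consensus of n-trees, with each n-tree represented as a set of clusters (frozensets of taxa). It is not one of the paper's faithful methods: it keeps a cluster purely by counting how many input trees contain it, applies that same rule to every cluster, and ignores hierarchical level. Representing taxa as single characters is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations


def n_tree(taxa, *clusters):
    """Build an n-tree as a set of clusters, adding the trivial ones (all taxa and singletons)."""
    tree = {frozenset(taxa)} | {frozenset([t]) for t in taxa}
    tree |= {frozenset(c) for c in clusters}
    return tree


def is_n_tree(tree):
    """Clusters of an n-tree are pairwise nested or disjoint."""
    return all(a <= b or b <= a or not (a & b) for a, b in combinations(tree, 2))


def majority_rule_consensus(trees):
    """Keep every cluster that occurs in more than half of the input n-trees."""
    counts = Counter(cluster for tree in trees for cluster in tree)
    consensus = {cluster for cluster, k in counts.items() if 2 * k > len(trees)}
    # Two majority clusters must co-occur in at least one input tree, hence they are compatible.
    if not is_n_tree(consensus):
        raise AssertionError("majority clusters should always form an n-tree")
    return consensus
```

For three trees on taxa a, b, c, d whose nontrivial clusters are {ab, abc}, {ab, cd} and {ab, abc}, the consensus keeps ab (3 of 3) and abc (2 of 3) and drops cd (1 of 3).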

19.
20.
A review of the modern classification systems of the Hamamelidaceae   Total citations: 11, self-citations: 0, citations by others: 11
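To further study the systematics and evolution of the Hamamelidaceae, the authors give a detailed account of the taxonomic history of the family and of its various classification systems. Following the principles and methods of modern plant systematics, they analyse, compare and evaluate in detail the five main modern classification systems of the Hamamelidaceae: those of Harms (1930), Zhang Hongda (张宏达) (1973, 1979), Bogle et al. (1980), Endress (1989) and Li Jianhua (李建华) (Li, 1997). On this basis they set out their own view: Li Jianhua's system is reasonable to some extent, but his treatment of certain genera and his division into tribes are still not entirely appropriate.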
