Similar Documents
20 similar documents found (search time: 31 ms)
1.
M Aickin 《Biometrics》1990,46(2):293-302
The alpha agreement parameter is defined as the proportion of a population of items that are classified identically "for cause" by two classifiers, the remaining items being classified at random. The parameters of the corresponding constant predictive probability model are shown to be estimable by the method of maximum likelihood, and a simulation study indicates applicability of the asymptotic results to finite samples. The new estimator tends to be larger than Cohen's kappa, except in the case of uniform margins. An application is made to the validity of cancer risk items included in a cancer registry.
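The comparison with Cohen's kappa can be made concrete. A minimal sketch of the kappa computation itself (the 2x2 agreement table below is invented for illustration, not taken from the paper):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, cols: rater B)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    po = np.trace(table) / n                              # observed agreement
    pe = (table.sum(axis=1) @ table.sum(axis=0)) / n**2   # chance agreement
    return (po - pe) / (1 - pe)

table = [[45, 5],
         [10, 40]]
print(round(cohens_kappa(table), 3))  # → 0.7
```

The paper's alpha parameter, by contrast, is estimated by maximum likelihood under the constant-predictive-probability model, not by this closed-form moment calculation.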

2.
Based on the 210 non-homologous proteins (domains) classified manually by Michie et al. (J. Mol. Biol. 262, 168-185, 1996), a new structure classification criterion for globular proteins relying on helix/strand content has been proposed, using a quadratic discriminant method. Each protein is classified into one of three classes: alpha class, beta class and alphabeta class (including the alpha/beta and alpha+beta classes). According to the new criterion, 207 of the 210 proteins in the training set are correctly classified, an accuracy of 207/210=98.57%. Multiple cross-validation tests are performed. The jackknife test shows that 207 of the 210 proteins are correctly classified, an accuracy of 98.57%. To test the method further, 91.39% of 3577 proteins (domains) extracted from SCOP are correctly reclassified by the new criterion. On average, the accuracy of the new criterion is about 8 percentage points higher than that of the criterion proposed by Nakashima et al. (J. Biochem. 99, 153-162, 1986). Our results show that classification based solely on structure is basically consistent with classification combining both structural and evolutionary information. A fully automated classification scheme should nevertheless consider both structure and evolutionary relationships, and the methodology presented provides an appropriate mathematical framework for reaching this goal.
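A quadratic discriminant step of the kind described can be sketched in NumPy. The (helix fraction, strand fraction) data below are synthetic stand-ins, not the 210-protein training set:

```python
import numpy as np

def qda_fit(X, y):
    """Estimate per-class mean, covariance and prior for quadratic discriminant analysis."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return params

def qda_predict(params, x):
    """Assign x to the class with the largest quadratic discriminant score."""
    scores = {}
    for c, (mu, cov, prior) in params.items():
        d = x - mu
        scores[c] = (-0.5 * np.log(np.linalg.det(cov))
                     - 0.5 * d @ np.linalg.inv(cov) @ d
                     + np.log(prior))
    return max(scores, key=scores.get)

# Synthetic (helix fraction, strand fraction) pairs for the three classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.60, 0.05], 0.05, (30, 2)),   # alpha
               rng.normal([0.05, 0.50], 0.05, (30, 2)),   # beta
               rng.normal([0.30, 0.25], 0.05, (30, 2))])  # alphabeta
y = np.array(["alpha"] * 30 + ["beta"] * 30 + ["alphabeta"] * 30)

model = qda_fit(X, y)
print(qda_predict(model, np.array([0.62, 0.04])))  # → alpha
```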

3.
Tomography emerges as a powerful methodology for determining the complex architectures of biological specimens that are better regarded, from the structural point of view, as singular entities. However, once the structures of a sufficiently large number of singular specimens are solved, structural patterns quite possibly start to emerge. This latter situation is addressed here, where the clustering of a set of 3D reconstructions using a novel quantitative approach is presented. In general terms, we propose a new variant of a self-organizing neural network for the unsupervised classification of 3D reconstructions. The novelty of the algorithm lies in its rigorous mathematical formulation: starting from a large set of noisy input data, it finds a set of "representative" items, organized onto an ordered output map, such that the probability density of the representative items resembles the probability density of the input data as closely as possible. In this study, we evaluate the feasibility of applying the proposed neural approach to the problem of identifying similar 3D motifs within tomograms of insect flight muscle. Our experimental results show that this technique is suitable for this type of problem, providing the electron microscopy community with a new tool for exploring large sets of tomogram data to find complex patterns.
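The density-matching formulation is the paper's rigorous contribution; as a rough illustration of the underlying mechanism only, here is a toy 1-D self-organizing map trained on synthetic 2-D points standing in for recurring 3D motifs (all parameters are illustrative, not the authors' algorithm):

```python
import numpy as np

def train_som(data, n_units=8, epochs=40, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 1-D self-organizing map. Each input pulls its best-matching
    unit and that unit's map neighbors toward it; the learning rate and
    neighborhood width shrink over the epochs so the map settles onto the
    data density."""
    rng = np.random.default_rng(seed)
    units = rng.uniform(data.min(), data.max(), (n_units, data.shape[1]))
    idx = np.arange(n_units)
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)
        sigma = max(sigma0 * (1.0 - t / epochs), 0.5)
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(units - x, axis=1))   # best-matching unit
            h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2)) # neighborhood weights
            units += lr * h[:, None] * (x - units)
    return units

# Two synthetic "motif" clusters standing in for repeated 3D patterns.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
                  rng.normal(5.0, 0.3, (100, 2))])
units = train_som(data)
```

After training, each data point should lie close to some map unit, i.e. the units approximate the input density.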

4.
In the context of brain-computer interface (BCI) systems, the common spatial patterns (CSP) method has been used to extract discriminative spatial filters for the classification of electroencephalogram (EEG) signals. However, the classification performance of CSP typically deteriorates when only a few training samples are collected from a new BCI user. In this paper, we propose an approach that maintains or improves the recognition accuracy of the system with only a small training data set. The proposed approach is formulated by regularizing the classical CSP technique with the strategy of transfer learning. Specifically, we incorporate into the CSP analysis inter-subject information involving the same task, by minimizing the difference between the inter-subject features. Experimental results on two data sets from BCI competitions show that the proposed approach greatly improves the classification performance over that of the conventional CSP method; the transformed variant proved successful in almost every case based on a small number of available training samples.
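The classical CSP step that the paper regularizes can be sketched as whitening plus eigendecomposition; the transfer-learning regularization term itself is omitted, and the 2-channel trials below are synthetic:

```python
import numpy as np

def csp_filters(trials_a, trials_b):
    """Classical CSP: whiten the composite covariance, then eigendecompose
    the whitened class-A covariance. Rows of the returned W are spatial
    filters; the eigenvalues lam give each filter's class-A share of variance.
    trials_*: arrays of shape (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    evals, evecs = np.linalg.eigh(Ca + Cb)
    P = evecs @ np.diag(evals ** -0.5) @ evecs.T      # whitening transform
    lam, B = np.linalg.eigh(P @ Ca @ P.T)             # lam sorted ascending
    return (P.T @ B).T, lam

# Synthetic 2-channel trials: class A is strong on channel 0, class B on channel 1.
rng = np.random.default_rng(4)
trials_a = rng.normal(0.0, [[3.0], [1.0]], (20, 2, 200))
trials_b = rng.normal(0.0, [[1.0], [3.0]], (20, 2, 200))
W, lam = csp_filters(trials_a, trials_b)
```

The extreme eigenvalues correspond to the most discriminative filters: near 1 for class A, near 0 for class B.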

5.
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

6.
Because gene expression data have very high feature dimensionality but few samples, the Fisher classifier does not perform well on such data. This paper proposes Fisher-List, an improved version of the Fisher algorithm. Its distinctive feature is that a decision threshold is determined for each class; each threshold incorporates both information from the overall sample and information from certain individual samples that are critical to classification. Experiments show that on gene expression data the new algorithm outperforms Fisher, LogitBoost, AdaBoost, k-nearest neighbors, decision trees and support vector machines.
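For reference, the classical two-class Fisher discriminant that Fisher-List improves upon can be sketched as follows (synthetic Gaussian data; the per-class thresholds of Fisher-List itself are not reproduced here):

```python
import numpy as np

def fisher_lda(X0, X1):
    """Classical two-class Fisher discriminant: project onto
    w = Sw^{-1}(mu1 - mu0) and threshold at the midpoint of the
    projected class means."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))   # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = 0.5 * (mu0 @ w + mu1 @ w)
    return w, threshold

rng = np.random.default_rng(5)
X0 = rng.normal(0.0, 1.0, (40, 5))
X1 = rng.normal(1.5, 1.0, (40, 5))
w, th = fisher_lda(X0, X1)
accuracy = 0.5 * ((X1 @ w > th).mean() + (X0 @ w <= th).mean())
```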

7.

8.
Logical principles are a necessary tool for testing the scientific soundness and standardization of any classification system. This paper uses logical principles to examine the dominant-species-based vegetation classification system of Vegetation of China (《中国植被》), finding that the currently used system contains many logical errors that need correction. On the basis of emphasizing consistency between the plant life-form classification system and the vegetation classification system, we give steps and methods, grounded in logical principles, for building a vegetation classification system, and propose standardized example schemes for both the plant life-form and vegetation classification systems. Given the objective existence, in diverse forms, of vegetation with multiple constructive (edificator) species, a corresponding position is provided in the classification system: the class of multi-constructive-species vegetation. In addition, since the Chinese vegetation-classification community has never adopted a unified naming rule and several naming conventions currently coexist, a functional nomenclature is proposed.

9.
10.
Ranked set sampling is a method which may be used to increase the efficiency of the estimator of the mean of a population. Ranked set sampling with size-biased probability of selection (i.e., items are selected with probability proportional to their size) is combined with the line intercept method to increase the efficiency of estimating cover, density and the total amount of some variable of interest (e.g. biomass). A two-stage sampling plan is suggested with line intercept sampling in the first stage. Simple random sampling and ranked set sampling are compared in the second stage to show that the unbiased estimators of density, cover and total amount of some variable of interest based on ranked set sampling have smaller variances than the usual unbiased estimator based on simple random sampling. Efficiency is increased by reducing the number of items which are measured on a transect or by increasing the number of independent transects utilized in a study area. An application procedure is given for estimation of coverage, density and number of stems of mountain mahogany (Cercocarpus montanus) in a study area east of Laramie, Wyoming.
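The variance advantage of ranked set sampling over simple random sampling can be illustrated with a small simulation. Perfect ranking and an invented skewed population are assumed; the size-biased, line-intercept machinery of the paper is omitted:

```python
import numpy as np

def srs_mean(rng, pop, m):
    """Mean of a simple random sample of m measured units."""
    return rng.choice(pop, m, replace=False).mean()

def rss_mean(rng, pop, m):
    """Ranked set sample mean: for the i-th measured unit, draw m items,
    rank them (perfect ranking assumed), and measure only the i-th smallest."""
    return np.mean([np.sort(rng.choice(pop, m, replace=False))[i]
                    for i in range(m)])

rng = np.random.default_rng(6)
pop = rng.exponential(1.0, 10_000)          # skewed "cover/biomass"-like values
srs = [srs_mean(rng, pop, 5) for _ in range(2000)]
rss = [rss_mean(rng, pop, 5) for _ in range(2000)]
print(np.var(srs), np.var(rss))             # the RSS variance is the smaller one
```

Both estimators measure five units per sample; RSS spends extra ranking effort (not measurement effort) to spread those five units across the distribution.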

11.
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
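The model-selection step, computing the probability that each candidate model generated the sequence, rests on the forward (filtering) recursion. A sketch with two toy two-state models over a binary alphabet (not the Richardson-taxonomy models of the paper):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under an HMM with initial
    distribution pi, transition matrix A and emission matrix B, via the
    scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

# Two toy "structural class" models over a binary observation alphabet.
pi = np.array([0.5, 0.5])
A1 = np.array([[0.9, 0.1], [0.1, 0.9]])   # persistent states
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])   # memoryless states
B  = np.array([[0.9, 0.1], [0.1, 0.9]])
obs = [0, 0, 0, 0, 1, 1, 1, 1]            # long runs favor the persistent model
print(forward_loglik(obs, pi, A1, B) > forward_loglik(obs, pi, A2, B))  # → True
```

Classification picks the model with the larger likelihood; the paper's smoothing step then adds the backward pass to get per-residue posteriors.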

12.
We propose a method for a posteriori evaluation of classification stability which compares the classification of sites in the original data set (a matrix of species by sites) with classifications of subsets of its sites created by without‐replacement bootstrap resampling. Site assignments to clusters of the original classification and to clusters of the classification of each subset are compared using Goodman‐Kruskal's lambda index. Many resampled subsets are classified and the mean of lambda values calculated for the classifications of these subsets is used as an estimation of classification stability. Furthermore, the mean of the lambda values based on different resampled subsets, calculated for each site of the data set separately, can be used as a measure of the influence of particular sites on classification stability. This method was tested on several artificial data sets classified by commonly used clustering methods and on a real data set of forest vegetation plots. Its strength lies in the ability to distinguish classifications which reflect robust patterns of community differentiation from unstable classifications of more continuous patterns. In addition, it can identify sites within each cluster which have a transitional species composition with respect to other clusters.
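Goodman-Kruskal's lambda for comparing two cluster assignments can be sketched directly; the bootstrap-resampling loop that the paper averages over is omitted:

```python
import numpy as np

def gk_lambda(a, b):
    """Goodman-Kruskal's lambda: proportional reduction in error when
    predicting cluster labels b from cluster labels a (integer labels).
    Undefined (division by zero) when b contains a single class."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(b)
    ct = np.zeros((a.max() + 1, b.max() + 1), dtype=int)  # contingency table
    for i, j in zip(a, b):
        ct[i, j] += 1
    e_without = n - ct.sum(axis=0).max()   # errors using the modal class of b only
    e_with = n - ct.max(axis=1).sum()      # errors using a to predict b
    return (e_without - e_with) / e_without

print(gk_lambda([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions up to relabeling → 1.0
```

Lambda is 1 when one partition perfectly predicts the other (regardless of label names) and 0 when it predicts no better than the modal class.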

13.
Forest site classification is an important foundation of forestry production. Countries with advanced forestry sectors have carried out thorough studies of it, and many of the results have played an important role in forestry practice. Research on forest site classification in China began in the 1950s, was interrupted several times, and has again received increasing attention in recent years; during the Seventh Five-Year Plan period it was listed as a national key science-and-technology project. This will undoubtedly promote the modernization of Chinese forestry. Methods of forest site classification and evaluation vary widely, with many schools of thought, which the authors have reviewed in detail elsewhere. As mathematics …

14.
Data transformations prior to analysis may be beneficial in classification tasks. In this article we investigate a set of such transformations on 2D graph-data derived from facial images and their effect on classification accuracy in a high-dimensional setting. These transformations are low-variance in the sense that each involves only a fixed small number of input features. We show that classification accuracy can be improved when penalized regression techniques are employed, as compared to a principal component analysis (PCA) pre-processing step. In our data example classification accuracy improves from 47% to 62% when switching from PCA to penalized regression. A second goal is to visualize the resulting classifiers. We develop importance plots highlighting the influence of coordinates in the original 2D space. Features used for classification are mapped to coordinates in the original images and combined into an importance measure for each pixel. These plots assist in assessing plausibility of classifiers, interpretation of classifiers, and determination of the relative importance of different features.

15.
The international validation study on alternative methods to replace the Draize rabbit eye irritation test, funded by the European Commission (EC) and the British Home Office (HO), took place during 1992-1994, and the results were published in 1995. The results of this EC/HO study are analysed by employing discriminant analysis, taking into account the classification of the in vivo data into eye irritation classes A (risk of serious damage to eyes), B (irritating to eyes) and NI (non-irritant). A data set for 59 test items was analysed, together with three subsets: surfactants, water-soluble chemicals, and water-insoluble chemicals. The new statistical methods of feature selection and estimation of the discriminant function's classification error were used. Normally distributed random numbers were added to the mean values of each in vitro endpoint, depending on the observed standard deviations. Thereafter, the reclassification error of the random observations was estimated by applying the fixed function of the mean values. Moreover, the leaving-one-out cross-classification method was applied to this random data set. Subsequently, random data were generated r times (for example, r = 1000) for a feature combination. Eighteen features were investigated in nine in vitro test systems to predict the effects of a chemical in the rabbit eye. 72.5% of the chemicals in the undivided sample were correctly classified when applying the in vitro endpoints lgNRU of the neutral red uptake test and lgBCOPo5 of the bovine opacity and permeability test. The accuracy increased to 80.9% when six in vitro features were used and the sample was subdivided. The subset of surfactants was correctly classified in more than 90% of cases, which is an excellent performance.

16.
In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon is given a firm theoretical foundation based on prediction and it is given a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
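The predictive criterion can be caricatured with independent binary test results under Beta priors: a strain is better explained by an existing taxon when its predictive log-probability there exceeds that under an empty candidate taxon. All numbers below are invented, and this is only a sketch of the general idea, not the paper's algorithm:

```python
import numpy as np

def log_predictive(n, pos, x, a=1.0):
    """Log predictive probability of binary test results x for a new strain,
    given a taxon with n member strains and per-test positive counts pos,
    under independent Beta(a, a) priors (Laplace smoothing for a = 1)."""
    pos, x = np.asarray(pos, float), np.asarray(x)
    p = (pos + a) / (n + 2 * a)            # predictive P(test positive)
    return np.sum(np.where(x == 1, np.log(p), np.log(1 - p)))

# Existing taxon: 20 strains, mostly positive on tests 0-2, negative on 3-5.
n, pos = 20, np.array([19, 18, 19, 1, 2, 1])
typical  = np.array([1, 1, 1, 0, 0, 0])
atypical = np.array([0, 0, 0, 1, 1, 1])
# An empty "new taxon": every binary outcome has predictive probability 1/2.
new_taxon = len(typical) * np.log(0.5)
print(log_predictive(n, pos, typical) > new_taxon)    # → True  (assign to taxon)
print(log_predictive(n, pos, atypical) < new_taxon)   # → True  (found a new taxon)
```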

17.
A study was undertaken to confirm earlier work on a smaller number of patients that had suggested that medium-resolution contextual analysis complements high-resolution individual cell analysis for cytomorphometric classification of fine needle aspirate smears of breast. The objectives of this study were to improve and verify the method. Sixty-one biopsy-confirmed hematoxylin and eosin-stained aspirate smears of breast were restained using the Feulgen technique. Individual nuclei were digitized at a resolution of 0.25 micron. Features describing size, shape, density and texture were extracted from the images. Individual cell analysis correctly classified 84% of cases, contextual analysis correctly classified 70% of cases, and the combined use of both techniques resulted in 87% classification accuracy. However, if fibroadenoma cases are excluded, the combined correct classification rate is 93%. Geometric and densitometric features contributed most to correct classification in individual cell analysis, while the most important contextual feature was the number of clusters per scene. We conclude that the addition of quantitative measures of smear patterns, termed "contextual analysis," improves automated classification schemes.

18.
A multidimensional slit-scan flow system has been developed to serve as an automated prescreening instrument for gynecological cytology. Specimens are classified abnormal based on the number of cells having elevated nuclear fluorescence (alarms). An alarm region in a bivariate histogram of nuclear fluorescence versus nuclear-to-cell-diameter ratio is defined. Alarm region probability arrays are calculated to estimate the probability that an alarm falling in a particular bin of the alarm region is either from a normal or an abnormal specimen. From these arrays, a weighted alarm index is generated. In addition, summary indices are derived that measure how the distribution of alarms in each specimen compares with the average distributions for the normal and abnormal specimen populations. These indices together with current features are evaluated with respect to their utility in specimen classification using a nonparametric classification technique known as recursive partitioning. Resulting classification trees are presented that suggest there is information in the distribution of alarms in the bivariate histogram. In addition, they validate the features and rules currently used for specimen classification. Recursive partitioning appears to be useful for multivariate classification and is seen as a promising technique for other applications.
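A single step of recursive partitioning, finding the best univariate split, can be sketched as follows (a toy stand-in for the full tree-growing procedure; the data are invented):

```python
import numpy as np

def best_split(x, y):
    """One step of recursive partitioning: the threshold on feature x that
    minimizes total misclassification of binary labels y, with each side
    of the split predicting its majority label."""
    best = (np.inf, None)
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        err = (min(left.sum(), len(left) - left.sum())
               + min(right.sum(), len(right) - right.sum()))
        if err < best[0]:
            best = (err, t)
    return best

x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])   # e.g. a weighted alarm index
y = np.array([0, 0, 0, 1, 1, 1])               # 0 = normal, 1 = abnormal
err, t = best_split(x, y)
print(err, t)  # → 0 0.3
```

A full recursive-partitioning tree applies this search recursively to each resulting subset until a stopping rule is met.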

19.
Mapping multiple Quantitative Trait Loci by Bayesian classification
Zhang M  Montooth KL  Wells MT  Clark AG  Zhang D 《Genetics》2005,169(4):2305-2318
We developed a classification approach to multiple quantitative trait loci (QTL) mapping built upon a Bayesian framework that incorporates the important prior information that most genotypic markers are not cotransmitted with a QTL or their QTL effects are negligible. The genetic effect of each marker is modeled using a three-component mixture prior with a class for markers having negligible effects and separate classes for markers having positive or negative effects on the trait. The posterior probability of a marker's classification provides a natural statistic for evaluating credibility of identified QTL. This approach performs well, especially with a large number of markers but a relatively small sample size. A heat map to visualize the results is proposed so as to allow investigators to be more or less conservative when identifying QTL. We validated the method using a well-characterized data set for barley heading values from the North American Barley Genome Mapping Project. Application of the method to a new data set revealed sex-specific QTL underlying differences in glucose-6-phosphate dehydrogenase enzyme activity between two Drosophila species. A simulation study demonstrated the power of this approach across levels of trait heritability and when marker data were sparse.

20.
We discuss the taxonomy of Enterobacteriaceae in the light of classification by minimization of stochastic complexity (SC). A classification which minimizes SC is optimal from the point of view of information theory. It was found that the SC-minimizing classification of a large database of strains of Enterobacteriaceae resulted in structures which correspond well to the conclusions of experts on the taxonomy of Enterobacteriaceae. The approach based on minimization of SC can therefore be considered as useful in bacterial taxonomy.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号