Similar articles
Found 20 similar articles (search time: 62 ms)
1.
MOTIVATION: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operating characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability, as required for example in the logistic regression method; (2) it accommodates case-control designs; and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique does not incorporate built-in biomarker selection. RESULTS: We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on V-fold cross validation and predictive performance evaluation are also investigated. The proposed approach is demonstrated with a simulation study, the Colon data and the Estrogen data. The proposed approach yields parsimonious models with excellent classification performance.
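
The sigmoid approximation to the area under the ROC curve mentioned in this abstract can be illustrated with a minimal sketch. It assumes a linear score with coefficients `beta` and a smoothing width `sigma`; all function and parameter names here are hypothetical, and the abstract does not give the authors' exact formulation. The threshold gradient descent step is not reproduced.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(beta, x):
    """Linear combination of biomarker values (assumed scoring rule)."""
    return sum(b * v for b, v in zip(beta, x))


def empirical_auc(beta, cases, controls):
    """Fraction of case/control pairs ranked correctly; ties count one half."""
    total = 0.0
    for xc in cases:
        for xn in controls:
            d = linear_score(beta, xc) - linear_score(beta, xn)
            total += 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
    return total / (len(cases) * len(controls))


def smoothed_auc(beta, cases, controls, sigma=0.1):
    """Replace the indicator 1[d > 0] by sigmoid(d / sigma), which is
    differentiable in beta and approaches the indicator as sigma -> 0."""
    total = 0.0
    for xc in cases:
        for xn in controls:
            d = linear_score(beta, xc) - linear_score(beta, xn)
            total += sigmoid(d / sigma)
    return total / (len(cases) * len(controls))


# Toy data (made up for illustration): two biomarkers per sample.
cases = [[2.0, 0.0], [3.0, 1.0]]
controls = [[0.0, 0.5], [1.0, 0.0]]
beta = [1.0, 0.0]
print(empirical_auc(beta, cases, controls), smoothed_auc(beta, cases, controls))
```

A small `sigma` tracks the empirical AUC closely but gives steep gradients, while a large `sigma` flattens the objective toward 0.5, so in practice the width would have to be tuned.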

2.
The huge number of elementary flux modes in genome-scale metabolic networks makes analysis based on elementary flux modes intrinsically difficult. However, it has been shown that the elementary flux modes with optimal yield often contain highly redundant information. The set of optimal-yield elementary flux modes can be compressed using modules. Up to now, this compression was only possible by first enumerating the whole set of all optimal-yield elementary flux modes. We present a direct method for computing modules of the thermodynamically constrained optimal flux space of a metabolic network. This method can be used to decompose the set of optimal-yield elementary flux modes in a modular way and to speed up their computation. In addition, it provides a new form of coupling information that is not obtained by classical flux coupling analysis. We illustrate our approach on a set of model organisms.

3.
One of the problems that arises in the development of a control system for functional electrical stimulation (FES) of the lower limbs is the accurate detection of specific events within the gait cycle. We present a method for the classification of phases of the gait cycle using the artificial intelligence technique of inductive learning. Both the terminology of inductive learning and the algorithm used for the analyses are fully explained. Given a set of examples of sensor data from the gait events that are to be detected, the inductive learning algorithm is able to produce a decision tree (or set of rules) that classifies the data using a minimum number of sensors. The nature of the redundancy of the sensor set is examined by progressively removing combinations of sensors and noting the effect on both the size of the decision trees produced and their classification accuracy on 'unseen' testing data. Since the algorithm is able to calculate which sensors are more important (informative), comparisons were made with the intuitive appreciation of sensor importance of five researchers in the field, revealing that those sensors which appear intuitively most informative may, in fact, provide the least information. Comparison results with the standard statistical classification technique of linear discriminant analysis are also presented, showing the relative simplicity of the inductively derived rules together with their good classification accuracy. In addition to the control of FES, such techniques are also applicable to automatic gait analysis and the construction of expert systems for diagnosis of gait pathologies.

4.
Most of the bioinformatics tools developed for predicting mutant protein stability appear as a black box, and the relationship between amino acid sequence/structure and stability is hidden from the users. We have addressed this problem and developed a human-readable rule generator for integrating the knowledge of amino acid sequence and experimental stability change upon single mutation. Using information about the original residue, substituted residue, and three neighboring residues, classification rules have been generated to discriminate the stabilizing and destabilizing mutants and explore the basis for experimental data. These rules are human readable, and hence, the method enhances the synergy between expert knowledge and computational systems. Furthermore, the performance of the rules has been assessed on a nonredundant data set of 1,859 mutants and we obtained an accuracy of 80 percent using cross validation. The results showed that the method could be effectively used as a tool for both knowledge discovery and predicting mutant protein stability. We have developed a Web-based classification rule generator, which is freely available at http://bioinformatics.myweb.hinet.net/irobot.htm.

5.
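Watershed ecological compensation is an important institutional tool for improving the watershed ecological environment and the way watershed water resources are used, and for advancing the construction of an ecological civilization. Institutions are made up of rules; which rules a watershed ecological compensation scheme should include, and what form those rules take, have an important influence on the final outcome of the compensation. This paper applies the rule classification of the Institutional Analysis and Development (IAD) framework to discuss the rule arrangements of watershed ecological compensation, with the aim of understanding more deeply the important role that rules play in watershed ecological compensation projects. It introduces the rules-in-use of the IAD framework as the theoretical basis, uses a systematic review to survey the literature on watershed ecological compensation cases outside China, and summarizes a set of specific rules shared by successful watershed ecological compensation institutions. Based on this set of rules, it offers reference and insights for establishing replicable and scalable watershed ecological compensation in China, in the hope that the institutional advantages of watershed ecological compensation can be translated into effective governance of watershed ecological resources.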

6.
Perdikis D, Huys R, Jirsa V. PLoS ONE 2011, 6(2): e16589
The idea that complex motor, perceptual, and cognitive behaviors are composed of smaller units, which are somehow brought into a meaningful relation, permeates the biological and life sciences. However, no principled framework defining the constituent elementary processes has been developed to this date. Consequently, functional configurations (or architectures) relating elementary processes and external influences are mostly piecemeal formulations suitable to particular instances only. Here, we develop a general dynamical framework for distinct functional architectures characterized by the time-scale separation of their constituents and evaluate their efficiency. Thereto, we build on the (phase) flow of a system, which prescribes the temporal evolution of its state variables. The phase flow topology allows for the unambiguous classification of qualitatively distinct processes, which we consider to represent the functional units or modes within the dynamical architecture. Using the example of a composite movement we illustrate how different architectures can be characterized by their degree of time scale separation between the internal elements of the architecture (i.e. the functional modes) and external interventions. We reveal a tradeoff of the interactions between internal and external influences, which offers a theoretical justification for the efficient composition of complex processes out of non-trivial elementary processes or functional modes.

7.
Kinjo AR, Nakamura H. PLoS ONE 2012, 7(2): e31437
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.

8.
Goh KI, Kahng B, Cho KH. Biophysical Journal 2008, 94(11): 4270-4276
Various dynamic cellular behaviors have been successfully modeled in terms of elementary circuitries showing particular characteristics such as negative feedback loops for sustained oscillations. Given, however, the increasing evidence indicating that cellular components do not function in isolation but form a complex interwoven network, it is still unclear to what extent the conclusions drawn from the elementary circuit analogy hold for systems that are highly interacting with surrounding environments. In this article, we consider a specific example of genetic oscillator systems, the so-called repressilator, as a starting point toward a systematic investigation into the dynamic consequences of the extension through interlocking of elementary biological circuits. From in silico analyses with both continuous and Boolean dynamics approaches to the four-node extension of the repressilator, we found that (1) the capability of sustained oscillation depends on the topology of the extended systems and (2) the stability of oscillation under the extension also depends on the coupling topology. We then deduce two empirical rules favoring sustained oscillations, termed the coherent coupling and the homogeneous regulation. These simple rules will help us prioritize candidate patterns of network wiring, guiding both the experimental investigations for further physiological verification and the synthetic designs for bioengineering.
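
As a rough illustration of the Boolean dynamics approach named in this abstract, the sketch below simulates the basic three-gene repressilator only. It assumes synchronous updates in which each gene is ON exactly when its repressor was OFF; the update scheme and the function names are assumptions, and the paper's four-node extensions and coupling rules are not reproduced.

```python
def step(state):
    """One synchronous update of a three-gene ring of repressors.

    state = (a, b, c) with values 0 (OFF) or 1 (ON).
    Assumed wiring: c represses a, a represses b, b represses c.
    """
    a, b, c = state
    return (1 - c, 1 - a, 1 - b)


def attractor_from(state):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


# Starting with a single gene ON, the system settles into a 6-state cycle.
for s in attractor_from((1, 0, 0)):
    print(s)
```

Under this update rule the uniform states (0, 0, 0) and (1, 1, 1) simply alternate with period 2, while the remaining six states form the sustained oscillation.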

9.
Optimal classification rules based on linear functions which maximize the Chernoff distance, the Morisita distance, or the Kullback-Leibler distance are studied here. We obtain an expression for the optimal linear discriminant function and show that the resulting linear procedure belongs to the Anderson-Bahadur admissible class. For the comparison of discriminant rules we use an index that measures the accuracy of a given class of discriminant procedures. The asymptotic form of the discriminant function is also studied.

10.
Chen YD. Biopolymers 1990, 30(11-12): 1113-1121
The binding of n-mer ligands to a one-dimensional lattice involving many ligand species and complex multiple-binding mechanisms is studied. We show that, when derived using the sequence-generating function method of Lifson, the secular equation of any binding system with a finite number of "elementary units" can be expressed in a matrix determinant form that is very symmetric and easy to construct. In other words, for any binding system whose elementary units are known, the secular equation of the system can be obtained readily without going through the formal derivation of the equation. We also show that the "determinant" secular equation obtained using the present procedure can be applied directly to the calculation of binding isotherms.

11.
In longitudinal randomised trials and observational studies within a medical context, a composite outcome (which is a function of several individual patient-specific outcomes) may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has been seldom discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative as assumptions concerning both the missingness structure within the data and the imputation models are different from the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that for each, separate imputation offers some improvement on the direct imputation of a composite outcome.

12.
In this paper, a method for automatic construction of a fuzzy rule-based system from numerical data using the Incremental Learning Fuzzy Neural (ILFN) network and the Genetic Algorithm is presented. The ILFN network was developed for pattern classification applications. The ILFN network, which employs fuzzy sets and neural network theory, is equipped with a fast, one-pass, on-line, and incremental learning algorithm. After training, the ILFN network stores numerical knowledge in hidden units, which can then be directly interpreted as if-then rule bases. However, the rules extracted from the ILFN network are not in an optimized fuzzy linguistic form. In this paper, a knowledge base for a fuzzy expert system is extracted from the hidden units of the ILFN classifier. A genetic algorithm is then invoked, in an iterative manner, to reduce the number of rules and select only the discriminative features from input patterns needed to provide a fuzzy rule-based system. Three computer simulations using a simulated 2-D, 3-class data set, the well-known Fisher's Iris data set, and the Wisconsin breast cancer data set were performed. The fuzzy rule-based system derived from the proposed method achieved 100% and 97.33% correct classification on the 75 training patterns and the 75 test patterns, respectively. For the Wisconsin breast cancer data set, using 400 patterns for training and 299 patterns for testing, the derived fuzzy rule-based system achieved 99.5% and 98.33% correct classification on the training set and the test set, respectively.

13.
MOTIVATION: The classification of samples using gene expression profiles is an important application in areas such as cancer research and environmental health studies. However, the classification is usually based on a small number of samples, and each sample is a long vector of thousands of gene expression levels. An important issue in parametric modeling for so many gene expression levels is the control of the number of nuisance parameters in the model. Large models often lead to intensive or even intractable computation, while small models may be inadequate for complex data. METHODOLOGY: We propose a two-step empirical Bayes classification method as a solution to this issue. At the first step, we use the model-based cluster algorithm with a non-traditional purpose of assigning gene expression levels to form abundance groups. At the second step, by assuming the same variance for all the genes in the same group, we substantially reduce the number of nuisance parameters in our statistical model. RESULTS: The proposed model is more parsimonious, which leads to efficient computation under an empirical Bayes estimation procedure. We consider two real examples and simulate data using our method. Desired low classification error rates are obtained even when a large number of genes are pre-selected for class prediction.

14.
The process of knowledge discovery from big and high-dimensional datasets has become a popular research topic. The classification problem is a key task in bioinformatics, business intelligence, decision science, astronomy, physics, etc. Building associative classifiers has been a notable research interest in recent years because of their superior accuracy. In associative classifiers, using under-sampling or over-sampling methods for imbalanced big datasets reduces accuracy or increases running time, respectively. Hence, there is a significant need to create efficient associative classifiers for imbalanced big data problems. These classifiers should be able to handle challenges such as memory usage, running time and efficient exploration of the search space. To this end, efficient calculation of measures is a primary objective for associative classifiers. In this paper, we propose a new efficient associative classifier for big imbalanced datasets. The proposed method is based on Rare-PEARs (a multi-objective evolutionary algorithm that efficiently discovers rare and reliable association rules) and is able to evaluate rules in a distributed manner by using a new data storage format. This format simplifies the calculation of measures and is fully compatible with the MapReduce programming model. We have applied the proposed method (RPII) to a well-known big dataset (ECBDL'14) and have compared our results with seven other learning methods. The experimental results show that RPII outperforms the other methods in the sensitivity and final score measures (approximately 0.74 and 0.54, respectively). The results demonstrate that the proposed method is a good candidate for large-scale classification problems; furthermore, it achieves reasonable execution time when the target platform is a typical computer cluster.

15.
This paper is concerned with biological regulatory mechanisms in response to the simultaneous occurrence of a huge number of environmental changes. The restricted resources of cells strictly limit the number of their regulatory methods; hence, cells must adopt, as compensation, special mechanisms to deal with the simultaneous occurrence of environmental changes. We hypothesize that cells use various control logics to integrate information about independent environmental changes related to a cell task and represent the resulting effects of the different ways of integration by logical functions. Using the notion of equivalence classes in set theory, we describe the mathematical classification of the effects into biologically inequivalent ones realized by different control logics. Our purely mathematical and systematic classification of logical functions reveals three elementary control logics with different biological relevance. To better understand their biological significance, we consider examples of biological systems that use these elementary control logics.

16.
The most widely used statistical methods for finding differentially expressed genes (DEGs) are essentially univariate. In this study, we present a new T² statistic for analyzing microarray data. We implemented our method using a multiple forward search (MFS) algorithm that is designed for selecting a subset of feature vectors in high-dimensional microarray datasets. The proposed T² statistic is a corollary to that originally developed for multivariate analyses and possesses two prominent statistical properties. First, our method takes into account the multidimensional structure of microarray data. The utilization of the information hidden in gene interactions allows for finding genes whose differential expressions are not marginally detectable in univariate testing methods. Second, the statistic has a close relationship to discriminant analyses for classification of gene expression patterns. Our search algorithm sequentially maximizes gene expression difference/distance between two groups of genes. Including such a set of DEGs among the initial feature variables may increase the power of classification rules. We validated our method by using a spike-in HGU95 dataset from Affymetrix. The utility of the new method was demonstrated by application to the analyses of gene expression patterns in human liver cancers and breast cancers. Extensive bioinformatics analyses and cross-validation of DEGs identified in the application datasets showed the significant advantages of our new algorithm.
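
To make the multivariate idea concrete, here is a minimal sketch of the classical two-sample Hotelling T² statistic for a pair of genes, T² = n1·n2/(n1+n2) · d' S⁻¹ d, where d is the difference of the group mean vectors and S is the pooled covariance matrix. This is the textbook statistic, used here as an assumption; the abstract does not give the authors' exact variant, and the MFS search is not reproduced. Function names and the toy data are hypothetical.

```python
def mean_vec(rows):
    """Mean expression of each of the two genes across samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """Sums of squares and cross-products of deviations: (s00, s01, s11)."""
    s00 = sum((r[0] - mu[0]) ** 2 for r in rows)
    s01 = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
    s11 = sum((r[1] - mu[1]) ** 2 for r in rows)
    return s00, s01, s11


def hotelling_t2(group1, group2):
    """Two-sample Hotelling T^2 for samples measured on two genes."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vec(group1), mean_vec(group2)
    # Pooled covariance: combined scatter divided by n1 + n2 - 2.
    dof = n1 + n2 - 2
    pooled = [(a + b) / dof
              for a, b in zip(scatter(group1, m1), scatter(group2, m2))]
    s00, s01, s11 = pooled
    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    # d' S^{-1} d written out with the explicit 2x2 inverse.
    quad = (d0 * d0 * s11 - 2 * d0 * d1 * s01 + d1 * d1 * s00) / det
    return n1 * n2 / (n1 + n2) * quad


# Toy expression values (made up): rows are samples, columns are two genes.
normal = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
tumor = [(4.0, 5.0), (5.0, 4.0), (6.0, 6.0)]
print(hotelling_t2(normal, tumor))
```

Because the off-diagonal covariance term enters the quadratic form, a gene pair can score highly even when neither gene stands out in a univariate test, which is the property the abstract emphasizes.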

17.
The life form is a generalized morphoecological characteristic of an animal giving an idea of the organism as a whole, its position and its functional role in the ecosystem. This characteristic is inherent to a species or to a group of congeneric species (for applied purposes it is better to use the genus rather than the species) considered in the framework of a higher taxon (from family to type). The principal contradiction of the life form concept is determined by the existence of ontogenetic stages and changes of life forms during the whole life of an individual. It is usually assumed that the concept of life form should be applied only to the adult stage, thus ignoring the integral character of the life cycle as an indivisible unit of selection, evolution and functioning in the ecosystem. We propose that a morphologically specific ontogenetic stage of a given species should be used as the elementary, lowest unit in the classification of life forms. Thus it can be considered as an integrated, internally structured morphoecological unit in time and in the multidimensional space of abiotic and biotic environmental factors. As an example we describe the types of reproductive strategies and the classification of elementary (ontogenetic) life forms in cephalopods. We present characteristics of the life cycle of some typical cephalopod species inhabiting different biotopes and having different modes of locomotion, feeding, reproduction and development.

18.
Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest (where figure captions proved to be an invaluable feature for identifying documents of interest), images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them, both alone and in combination with text, to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results, demonstrating that the method has a strong potential to enhance and complement traditional text-based categorization methods.

19.
20.
We present a method for building systematics when new knowledge is continuously accumulated. The resulting classification is self-correcting and improves itself by sorting new items as they are added to the material and studied. The formulation is based on Bayesian predictive probability distributions. A new item that has not yet been classified is assigned to the class that has maximal posterior probability or is made to form a group of its own. Such a cumulative classification depends on the order in which the items are classified. The introduction of an already classified training set considerably improves the repeatability of the method. As a case study we applied the method to a large data set for the Enterobacteriaceae. The resulting classifications corresponded well to the general structure of the prevailing taxonomy of Enterobacteriaceae.
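
The assignment rule described in this abstract (join the class with maximal posterior probability, or found a new group) can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the paper: items are binary feature vectors, features are independent with a Beta(1, 1) prior per class, an existing class is weighted by its size, and a new class gets a fixed weight `alpha`.

```python
def predictive(item, members):
    """Posterior predictive probability of a binary item given a class.

    With a Beta(1, 1) prior per feature, the predictive probability of a 1
    is (ones + 1) / (n + 2); an empty class gives 1/2 for every feature.
    """
    n = len(members)
    p = 1.0
    for j, x in enumerate(item):
        ones = sum(m[j] for m in members)
        p_one = (ones + 1) / (n + 2)
        p *= p_one if x == 1 else 1.0 - p_one
    return p


def classify_cumulatively(items, alpha=1.0):
    """Assign items one at a time; the result depends on their order."""
    classes = []
    for item in items:
        # Unnormalized posterior score for each existing class.
        scores = [len(c) * predictive(item, c) for c in classes]
        # Score for opening a new class (hypothetical weight alpha).
        scores.append(alpha * predictive(item, []))
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(classes):
            classes.append([item])
        else:
            classes[best].append(item)
    return classes


# Toy binary profiles (made up), e.g. presence/absence of four traits.
strains = [(1, 1, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 1, 0)]
for group in classify_cumulatively(strains):
    print(group)
```

Seeding `classes` with an already classified training set, rather than starting from an empty list, is one way to reduce the order dependence the abstract mentions.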
